Develop a revised version of GraphSON that provides better support for non-JVM languages that consume it.
GitHub user newkek opened a pull request:
TINKERPOP-1274: GraphSON 2.0.
- Summary of the changes :
Implements a uniform, non-Java-centric format for serializing typed values. As a reminder, the new format is as follows:
- A value not typed : `value`
- A value typed : `[
, value]`
The content of `value` can be either a simple value or a more complex structure. The type prefix lets the right Deserializer be called to deserialize `value`, whatever its content is.
The default GraphSON 2.0 format, called `PARTIAL_TYPES`, includes types for every `value` that is not a JSON native type (`String/int/double/null/boolean/Map/Collection`). This significantly reduces the size of the JSON payload, since types are omitted wherever the Jackson library does not need them.
GraphSON serialization without types is not affected. To enable it, use `NO_TYPES`.
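A minimal sketch of the `PARTIAL_TYPES` rule described above, written in plain Java without Jackson. The names `PartialTypesSketch`, `isJsonNative`, `wrap` and the `"@class"` key are assumptions made for illustration (the real token is `GraphSONTokens.CLASS`), and the sketch keeps the raw Java value in the second slot instead of serializing it:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Map;

final class PartialTypesSketch {
    // Hypothetical stand-in for the value of GraphSONTokens.CLASS.
    static final String TYPE_KEY = "@class";

    // True for values that JSON represents natively, so no type prefix is needed.
    static boolean isJsonNative(final Object value) {
        return value == null
                || value instanceof String
                || value instanceof Integer
                || value instanceof Double
                || value instanceof Boolean
                || value instanceof Map
                || value instanceof Collection;
    }

    // Returns JSON-native values untouched; any other value becomes a
    // two-element list of the form [typeDescriptor, value].
    static Object wrap(final Object value) {
        if (isJsonNative(value)) {
            return value;
        }
        final Map<String, String> descriptor =
                Collections.singletonMap(TYPE_KEY, value.getClass().getSimpleName());
        return Arrays.asList(descriptor, value);
    }
}
```

Under these assumptions, `wrap("abc")` returns the string unchanged, while `wrap(someUuid)` returns a list whose first element carries the simple class name `UUID`.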
- Quick walkthrough for the review
There are new components involved in the ser/de process that are extended from the Jackson library :
- TypeIdResolver : performs the conversion of an object's `Class` -> `typeID`, and from a `typeID` -> `Class`.
- TypeResolverBuilder : creates a TypeIdResolver and instantiates a TypeDeserializer or TypeSerializer with the TypeIdResolver as a param.
- TypeSerializer : writes the prefix and suffix for a typeID. The TypeSerializer is provided to the Serializers and handles the type serialization that respects the format put in place.
- TypeDeserializer : is called before the deserializers are called. Its role is to detect a type, and call the right Deserializer to deserialize the value.
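To make the TypeIdResolver role concrete, here is a rough sketch of a two-way mapping between classes and type IDs based on simple class names. It does not use the Jackson interfaces; `SimpleNameTypeIndex`, `register`, `idFromClass` and `classFromId` are hypothetical names chosen for this example only:

```java
import java.util.HashMap;
import java.util.Map;

final class SimpleNameTypeIndex {
    // typeID -> Class, filled explicitly because Java cannot look up a class by simple name.
    private final Map<String, Class<?>> byTypeId = new HashMap<>();

    // Registers a class under its simple name and rejects a different class
    // that would map to the same type ID.
    void register(final Class<?> type) {
        final Class<?> previous = byTypeId.putIfAbsent(idFromClass(type), type);
        if (previous != null && previous != type) {
            throw new IllegalStateException(
                    "Type id already used by " + previous.getName());
        }
    }

    // Class -> typeID: the package name is dropped.
    String idFromClass(final Class<?> type) {
        return type.getSimpleName();
    }

    // typeID -> Class: only works for classes that were registered beforehand.
    Class<?> classFromId(final String typeId) {
        final Class<?> type = byTypeId.get(typeId);
        if (type == null) {
            throw new IllegalArgumentException("No class registered for type id: " + typeId);
        }
        return type;
    }
}
```

The explicit `register` step mirrors why the index has to be filled before deserialization can resolve a type ID.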
Most of the existing serializers for Graph had to be modified, because most of them hardcoded types manually without calling the TypeSerializer and hence did not respect the format. Now all Serializers respect the format because they call the TypeSerializer given as a parameter. I followed the frame put in place by @spmallette to implement those new serializers (prefixed with `V2d0`) without breaking existing client code.
In TypeDeserializer, `baseType` represents the class the user passes as a parameter for the ser/de. If the user calls `mapper.readValue(jsonString, UUID.class)`, the `baseType` in TypeDeserializer will be a JavaType (Jackson's own representation of a Java class) that represents the UUID class. The wildcard in our deserialization mechanism is `Object.class`.
The deserialization path is as follows :
- TypeDeserializer is called only for non simple JSON values (non `String/int/double/null/boolean`).
- When called, if a type is detected, the TypeDeserializer will read the typeID, convert it to a JavaType (thanks to the TypeIdResolver), check that the `baseType` is the same as what was read in the payload (only if `baseType` is not the wildcard), and call the `deserialize()` method of the JsonDeserializer registered for that type.
- If no type is detected, it detects Maps or Arrays and calls the appropriate Deserializer.
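The path above can be sketched roughly as follows, operating on already-parsed JSON (`List`, `Map`, simple values) rather than on a Jackson parser. Everything here is an assumption for illustration: the class `TypedValueReaderSketch`, the `"@class"` key standing in for `GraphSONTokens.CLASS`, and the use of `java.util.function.Function` in place of a JsonDeserializer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class TypedValueReaderSketch {
    // Hypothetical stand-in for the value of GraphSONTokens.CLASS.
    static final String TYPE_KEY = "@class";

    private final Map<String, Class<?>> classesById = new HashMap<>();
    private final Map<Class<?>, Function<Object, ?>> deserializers = new HashMap<>();

    <T> void register(final Class<T> type, final Function<Object, T> deserializer) {
        classesById.put(type.getSimpleName(), type);
        deserializers.put(type, deserializer);
    }

    // baseType is the class requested by the caller; Object.class is the wildcard.
    Object read(final Object parsed, final Class<?> baseType) {
        final String typeId = detectTypeId(parsed);
        if (typeId != null) {
            final Class<?> detected = classesById.get(typeId);
            if (detected == null) {
                throw new IllegalArgumentException("Unknown type id: " + typeId);
            }
            if (baseType != Object.class && baseType != detected) {
                throw new IllegalArgumentException("Expected " + baseType.getName()
                        + " but payload declares " + detected.getName());
            }
            // The value sits in the second element of the typed array.
            return deserializers.get(detected).apply(((List<?>) parsed).get(1));
        }
        if (parsed instanceof Map) {
            final Map<Object, Object> result = new LinkedHashMap<>();
            for (final Map.Entry<?, ?> entry : ((Map<?, ?>) parsed).entrySet()) {
                result.put(entry.getKey(), read(entry.getValue(), Object.class));
            }
            return result;
        }
        if (parsed instanceof List) {
            final List<Object> result = new ArrayList<>();
            for (final Object element : (List<?>) parsed) {
                result.add(read(element, Object.class));
            }
            return result;
        }
        return parsed; // simple JSON value
    }

    // Returns the type id if the value has the [typeDescriptor, value] shape, else null.
    static String detectTypeId(final Object parsed) {
        if (!(parsed instanceof List)) {
            return null;
        }
        final List<?> list = (List<?>) parsed;
        if (list.size() != 2 || !(list.get(0) instanceof Map)) {
            return null;
        }
        final Object typeId = ((Map<?, ?>) list.get(0)).get(TYPE_KEY);
        return typeId instanceof String ? (String) typeId : null;
    }
}
```

Passing `Object.class` as `baseType` skips the consistency check, which is the wildcard behaviour described above; passing a concrete class that differs from the type in the payload raises an error.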
- Some results
GraphSON 2.0 shows a significant reduction in payload size for typed serialization, and since there is less to process, the ser/de is faster. Results show a reduction of at least 50% in payload size, and the gap grows linearly (the bigger the payload, the bigger the difference) :
➜ ls -lh tinkergraph-gremlin/target/test-case-data/TinkerGraphTest/tinkerpop-io/
-rw-r--r-- 1 kevingallardo staff 890K 28 Jun 21:50 grateful-dead-V2d0-typed.json
-rw-r--r-- 1 kevingallardo staff 1.9M 28 Jun 21:50 grateful-dead-typed.json
-rw-r--r-- 1 kevingallardo staff 851K 28 Jun 21:50 grateful-dead.json
-rw-r--r-- 1 kevingallardo staff 1.5K 28 Jun 21:50 tinkerpop-classic-V2d0-typed.json
-rw-r--r-- 1 kevingallardo staff 3.6K 28 Jun 21:50 tinkerpop-classic-typed.json
-rw-r--r-- 1 kevingallardo staff 1.3K 28 Jun 21:50 tinkerpop-classic.json
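As a quick check of the "at least 50%" figure against the listing, the reduction can be computed from the rounded sizes that `ls -lh` reports. This is only arithmetic on the numbers shown above; the helper name `PayloadReduction` is made up for the example, and 1M is taken as 1024K:

```java
final class PayloadReduction {
    // Fraction by which `after` is smaller than `before`; both must use the same unit.
    static double reduction(final double before, final double after) {
        return 1.0 - (after / before);
    }

    public static void main(final String[] args) {
        // grateful-dead: 1.9M typed (1.0) versus 890K typed (2.0)
        System.out.printf("grateful-dead: %.1f%%%n", 100 * reduction(1.9 * 1024.0, 890.0));
        // tinkerpop-classic: 3.6K typed (1.0) versus 1.5K typed (2.0)
        System.out.printf("tinkerpop-classic: %.1f%%%n", 100 * reduction(3.6, 1.5));
    }
}
```

With these rounded inputs the typed payload shrinks by roughly 54% for grateful-dead and roughly 58% for tinkerpop-classic.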
- Tests
Tests cover the same functionality as the typed GraphSON 1.0 tests, plus the additional features brought by GraphSON 2.0.
- Tradeoffs
- Some tricks were implemented in order to provide the types without package names. Since Java does not provide a way to look up a class by its simple name, the `TypeIdResolver` needs an index that it can refer to and that has been correctly filled. The TinkerpopJacksonModule class handles providing those new index entries to the TypeIdResolver when custom Deserializers are added; without that, we have no way to properly convert a type. We therefore require that a user writing a module extend TinkerpopJacksonModule, which is an extension of Jackson's SimpleModule, instead of extending SimpleModule directly. We can discuss whether this is a showstopper or whether it is acceptable. Alternatives would be to use an external library that allows searching for a Class by its simple name, or to switch to writing typeIDs with the Class's full canonical name. I don't have a strong opinion, but the current solution seems simple and good enough.
- We lose some polymorphism potential with regard to POJOs without deserializers. Since the approach here is simpler and faster, it doesn't inspect JavaTypes to find potential subclasses. On the flip side, it allows fine-tuning of the Deserializers chosen to deserialize a type. I also thought about exposing the TypeIdResolver to users through the GraphSONMapper for full customization of the types ser/de. That could be done easily later.
- There is one situation that can lead to unexpected results : if somebody serializes a `value` that has exactly the same format as the type format, i.e. a List in which the first element is a Map in which the first entry's key is GraphSONTokens.CLASS. We may want to highlight that somewhere.
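The ambiguity in the last point can be illustrated with a small detection routine that follows the shape described there (a List whose first element is a Map whose first key is the type token). This is a hedged sketch, not the PR's code: `TypeFormatAmbiguity`, `looksTyped` and the `"@class"` key are assumptions for the example.

```java
import java.util.List;
import java.util.Map;

final class TypeFormatAmbiguity {
    // Hypothetical stand-in for the value of GraphSONTokens.CLASS.
    static final String TYPE_KEY = "@class";

    // True when the parsed value has the same shape as a typed value:
    // a List whose first element is a Map whose first key is the type token.
    static boolean looksTyped(final Object parsed) {
        if (!(parsed instanceof List)) {
            return false;
        }
        final List<?> list = (List<?>) parsed;
        if (list.isEmpty() || !(list.get(0) instanceof Map)) {
            return false;
        }
        final Map<?, ?> first = (Map<?, ?>) list.get(0);
        return !first.isEmpty() && TYPE_KEY.equals(first.keySet().iterator().next());
    }
}
```

A genuine typed value and a user-supplied list such as `[{"@class": "not a type"}, 42]` both satisfy `looksTyped`, so a reader relying on shape alone cannot tell them apart.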
- Extra facts
- The class JsonParserConcat allows some great perf improvements in the deser path. It is inspired by the JsonParserSequence class from Jackson; however, the Jackson class has an unexpected behaviour, which is corrected in JsonParserConcat. JsonParserSequence may be corrected in future Jackson versions, but in the meantime JsonParserConcat will do fine.
- `FULL_TYPES` has not been implemented. `FULL_TYPES` means writing types for JSON natively supported values as well. However, all the mechanisms are in place to implement it easily if we deem it necessary. I don't see the necessity for this now.
You can merge this pull request into a Git repository by running:
$ git pull TINKERPOP-1274
Alternatively you can review and apply these changes as the patch at:
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #351
commit f5b64fafb071fffc60fdb5113709c78f5dde334f
Author: Stephen Mallette <>
Date: 2016-05-18T12:41:26Z
Frame up for GraphSON 2.0
commit 0b805d600b9b11b0b36bb34ebfe4c880615e7764
Author: Kevin Gallardo <>
Date: 2016-06-28T18:38:32Z
TINKERPOP-1274: GraphSON 2.0.
Github user spmallette commented on a diff in the pull request:
— Diff: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphson/ —
@@ -22,11 +22,7 @@
— End diff –
sorry @newkek but your IDE introduced wildcards to our imports which isn't our code style. could you please fix those?
Github user spmallette commented on a diff in the pull request:
— Diff: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphson/ —
@@ -60,47 +61,78 @@
private final boolean normalize;
private final boolean embedTypes;
private final GraphSONVersion version;
+ private final TypeInfo typeInfo;
private GraphSONMapper(final List<SimpleModule> customModules, final boolean loadCustomSerializers,
— End diff –
Could you please make the constructor match this approach:
rather than pass the individual parameters we just pass the `Builder` object. Not sure why this wasn't changed already for `GraphSONMapper`.....
Github user spmallette commented on a diff in the pull request:
— Diff: gremlin-driver/src/test/java/org/apache/tinkerpop/gremlin/driver/ser/ —
@@ -0,0 +1,474 @@
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.tinkerpop.gremlin.driver.ser;
+import org.apache.tinkerpop.gremlin.driver.message.RequestMessage;
+import org.apache.tinkerpop.gremlin.driver.message.ResponseMessage;
+import org.apache.tinkerpop.gremlin.driver.message.ResponseStatusCode;
+import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
+import org.apache.tinkerpop.gremlin.process.traversal.step.util.Tree;
+import org.apache.tinkerpop.gremlin.structure.*;
+import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerFactory;
+import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;
+import org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils;
+import org.apache.tinkerpop.shaded.jackson.core.JsonGenerationException;
+import org.apache.tinkerpop.shaded.jackson.core.JsonGenerator;
+import org.apache.tinkerpop.shaded.jackson.databind.JsonNode;
+import org.apache.tinkerpop.shaded.jackson.databind.ObjectMapper;
+import org.apache.tinkerpop.shaded.jackson.databind.SerializerProvider;
+import org.apache.tinkerpop.shaded.jackson.databind.module.SimpleModule;
+import org.apache.tinkerpop.shaded.jackson.databind.node.NullNode;
+import org.apache.tinkerpop.shaded.jackson.databind.ser.std.StdSerializer;
+import org.apache.tinkerpop.shaded.jackson.databind.util.StdDateFormat;
+import org.junit.Test;
+import java.awt.*;
+import java.util.*;
+import static org.hamcrest.MatcherAssert.assertThat;
+import static;
+import static org.junit.Assert.*;
+ * These tests focus on message serialization and not "result" serialization as test specific to results (e.g.
+ * vertices, edges, annotated values, etc.) are handled in the IO packages.
+ *
+ * @author Stephen Mallette (
+ */
+public class GraphSONMessageSerializerV2d0Test {
— End diff –
Are the tests like this one exact copies of the v1 versions? Is it possible to use a:
here and parameterize on the serializer version?
Github user spmallette commented on a diff in the pull request:
— Diff: tinkergraph-gremlin/src/test/java/org/apache/tinkerpop/gremlin/tinkergraph/structure/ —
@@ -275,4 +290,78 @@ else if (e.label().equals("writtenBy")), g);
+ @Test
+ public void shouldWriteGratefulDeadGraphSONV2d0() throws IOException {
+ final TinkerGraph g =;
+ final TinkerGraph readG =;
+ final GraphReader reader =;
+ try (final InputStream stream = AbstractGremlinTest.class.getResourceAsStream("/org/apache/tinkerpop/gremlin/structure/io/gryo/grateful-dead.kryo"))
+ final OutputStream os2 = new FileOutputStream(tempPath + "grateful-dead-V2d0-typed.json");
+, g);
+ os2.close();
+ final InputStream is = new FileInputStream(tempPath + "grateful-dead-V2d0-typed.json");
+, readG);
+ is.close();
+ assertEquals(approximateGraphsCheck(g, readG), true);
+ }
+ /**
+ * Checks sequentially vertices and edges of both graphs. Will check sequentially Vertex IDs, Vertex Properties IDs
+ * and values and classes. Then same for edges. To use when serializing a Graph and deserializing the supposedly
+ * same Graph.
+ */
+ private boolean approximateGraphsCheck(Graph g1, Graph g2) {
— End diff –
These are good tests, but they don't belong in `IoDataGenerationTest`. I think you could move them somewhere else. The point of this test is to generate the sample data that we ship with our distributions and use in tests. You should include "tests" that generate 2.0 versions of grateful-dead, classic, modern and crew graphs. Then, to generate the data, just run:
cd tinkergraph-gremlin
mvn clean install -Dio
You should see the new files in the appropriate places. I guess for now they should be named with a suffix like:
Let's leave the 1.0 naming as-is for now so as not to break anything existing that relies on the file names being what they are. I think we had also said that we would keep 1.0 as the default for now and then look to 3.3.x to make 2.0 the default - that would produce the least amount of breaking change, so i guess that approach would be in line with that thinking. Does that make sense?
Github user spmallette commented on the issue:
Note that travis isn't happy - tests are failing:
Failed tests:
GraphSONMessageSerializerV2d0Test.shouldSerializeToJsonIteratorNullElement:147 expected:<[x]> but was:<[]>
GraphSONMessageSerializerV2d0Test.shouldSerializeToJsonMap:171 expected:<[x]> but was:<[]>
GraphSONMessageSerializerV2d0Test.shouldSerializeToJsonIterable:107 expected:<[x]> but was:<[]>
GraphSONMessageSerializerV2d0Test.shouldSerializeToJsonIterator:126 expected:<[x]> but was:<[]>
Also - some documentation odds and ends:
- Please update CHANGELOG with an entry for "Introduced GraphSON 2.0".
- We probably need to include some reference documentation [here]( - not sure what that should look like at the moment. We default to 1.0 still for now, so we probably don't need to change too much. Perhaps we need a section that shows what 2.0 format looks like and how to generate it? Maybe that is enough?
- We need upgrade docs for users so they are aware of this new feature - [here](
- We need upgrade docs for driver devs (need to add a section - no changes in 3.2.1 have affected drivers yet) so they are aware of how they might benefit from this change - [here](
Github user newkek commented on the issue:
> tests are failing
Ah, I changed something quickly this morning and did not rerun those tests. Will correct that.
Ok for the docs, will do.
Github user newkek commented on a diff in the pull request:
— Diff: gremlin-driver/src/test/java/org/apache/tinkerpop/gremlin/driver/ser/ —
They are not all exactly the same, because some of them inspect the generated JSON payload manually (`shouldSerializeEdge()`). Since the format is different and types are now introduced, they don't find the expected property in the same place. [see here]( (l.230) since the payload contains the type, we have to go into the Array's second element to get the value.
Github user newkek commented on a diff in the pull request:
— Diff: gremlin-driver/src/test/java/org/apache/tinkerpop/gremlin/driver/ser/ —
I could change some of the tests to take into account the version otherwise
Github user spmallette commented on a diff in the pull request:
— Diff: gremlin-driver/src/test/java/org/apache/tinkerpop/gremlin/driver/ser/ —
ok - fair enough. if they aren't the same then no worries - we just have more tests to maintain. thanks
Github user newkek commented on a diff in the pull request:
— Diff: tinkergraph-gremlin/src/test/java/org/apache/tinkerpop/gremlin/tinkergraph/structure/ —
So, write new tests in IoDataGenerationTest class that will generate V2.0 versions of the classic/modern/etc.. graphs in json serialized format (with the files named `[graph-name]-v2d0.json`).
And put the Ser/deser test of the grateful-dead graph in another class.
Correct ?
Github user spmallette commented on a diff in the pull request:
— Diff: tinkergraph-gremlin/src/test/java/org/apache/tinkerpop/gremlin/tinkergraph/structure/ —
yes - that's correct - thanks. you wrote it more clearly than i did
Github user newkek commented on the issue:
Yes that was introduced by the ce19704 (reference docs and Changelog)
Github user newkek commented on the issue:
- I've written all the docs, please don't hesitate to correct them if they're not written well.
- Updated the tests, parameterized them as much as possible, and focused the 2.0 tests on the 2.0 specific functionalities.
- Rebased on current `master`.
Github user newkek commented on the issue:
Re-conflicts with Changelog against master..
Github user spmallette commented on the issue:
yeah - we tend to bump heads on master a little bit on the CHANGELOG - not a big deal - i can resolve that conflict when the time comes.
Github user spmallette commented on a diff in the pull request:
— Diff: tinkergraph-gremlin/src/test/java/org/apache/tinkerpop/gremlin/tinkergraph/structure/ —
did you run
cd tinkergraph-gremlin
mvn clean install -Dio
I don't see the newly generated data files in the various data directories.
Github user newkek commented on a diff in the pull request:
— Diff: tinkergraph-gremlin/src/test/java/org/apache/tinkerpop/gremlin/tinkergraph/structure/ —
Where are they supposed to be located ? I see them in the `tinkergraph-gremlin/target/test-case-data/TinkerGraphTest/tinkerpop-io/`. Should they be somewhere else?
Github user newkek commented on a diff in the pull request:
— Diff: tinkergraph-gremlin/src/test/java/org/apache/tinkerpop/gremlin/tinkergraph/structure/ —
Oh I think I see what you mean, I don't see them in the root's `data/` dir, I don't know why
Github user spmallette commented on a diff in the pull request:
— Diff: tinkergraph-gremlin/src/test/java/org/apache/tinkerpop/gremlin/tinkergraph/structure/ —
When you run with `-Dio` it's supposed to copy the files to all the right places. See the tinkergraph-gremlin pom.xml... not sure why that's not happening.
Github user newkek commented on a diff in the pull request:
— Diff: tinkergraph-gremlin/src/test/java/org/apache/tinkerpop/gremlin/tinkergraph/structure/ —
Ah, looking at the pom, I'm getting this warning during the build :
[INFO] — maven-resources-plugin:2.6:copy-resources (copy-graphson-from-tmp-to-resources) @ tinkergraph-gremlin —
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /Users/kevingallardo/Documents/workspace/newkek-incubator-tinkerpop/tinkergraph-gremlin/target/tinkerpop-io
Seems like the pom is going to search for the resources in `<io.tmp.dir>${}/tinkerpop-io</io.tmp.dir>` and `${}` is defined as `<directory>${basedir}/target</directory>`.
But the `tempPath` in `IoDataGenerationTest` is `tempPath = TestHelper.makeTestDataPath(TinkerGraphTest.class, "tinkerpop-io").getPath() + File.separator;`
If I change the pom to search in the right directory: `<io.tmp.dir>${}/test-case-data/TinkerGraphTest/tinkerpop-io</io.tmp.dir>`, it works.
Github user newkek commented on a diff in the pull request:
— Diff: tinkergraph-gremlin/src/test/java/org/apache/tinkerpop/gremlin/tinkergraph/structure/ —
This is also broken on master
Github user spmallette commented on a diff in the pull request:
— Diff: tinkergraph-gremlin/src/test/java/org/apache/tinkerpop/gremlin/tinkergraph/structure/ —
hmm - strange that never was a problem before. something must have changed (we haven't run with -Dio in a long long time) feel free to fix it in this branch. thanks
Github user newkek commented on a diff in the pull request:
— Diff: tinkergraph-gremlin/src/test/java/org/apache/tinkerpop/gremlin/tinkergraph/structure/ —
Apparently the TestHelper behaviour has changed in commit da2eb7e, but the pom wasn't changed with regards to that change.
Ok I'll change the pom as explained earlier
Github user newkek commented on a diff in the pull request:
— Diff: tinkergraph-gremlin/src/test/java/org/apache/tinkerpop/gremlin/tinkergraph/structure/ —
So, since the build now creates the new v2d0 graphs, and it seems like the other ones are pushed in the repo, should I push the new ones too? Also, it seems that the already pushed ones have modifications. Should I push all of them as well? Here's the diff:
Github user spmallette commented on a diff in the pull request:
— Diff: tinkergraph-gremlin/src/test/java/org/apache/tinkerpop/gremlin/tinkergraph/structure/ —
yes - you should push the new ones. not sure what changed on the existing ones. can you tell what the changes are? if it is just the order of keys changing or the order of vertices or something like that then i wouldn't bother to push it, but if it's something else i would wonder if there is something wrong somewhere because we should be 100% backward compatible with GraphSON 1.0..........nothing should have changed in that sense.
Github user newkek commented on a diff in the pull request:
— Diff: tinkergraph-gremlin/src/test/java/org/apache/tinkerpop/gremlin/tinkergraph/structure/ —
I'll try to find the differences for the grateful-dead in json, but I'm not sure I'd be able to tell concerning the diff for the `sample.kryo` as I'm not fluent in binary (yet)
Github user spmallette commented on a diff in the pull request:
— Diff: tinkergraph-gremlin/src/test/java/org/apache/tinkerpop/gremlin/tinkergraph/structure/ —
If you re-run the tests after you generate and they all pass, i think we should probably be safe.
Github user PommeVerte commented on the issue:
Hey guys,
I've been super busy lately but I definitely plan on diving deep into this PR over the weekend. One quick remark though.
1. Even the JSON supported types are prone to lossiness in multi language settings. They should also be typed.
2. In a multi language setting, having type names without their java classes is not helpful.
I can illustrate both of these points with the following JSON : `
The client assumes `id` is `int` but what exactly is an int? is it `16bit`, `32bit`, or `64bit`? Languages will have their own definition here. Actually some languages will even have different values of `int` depending on how they were compiled.
Changing it to `{"id":[
, 1]}` is not helpful in this case either. However the following is explicit and is something you can work with: `{"id":[{"@class":"java.lang.Integer"}, 1]}`. It's immediately identifiable and well documented. The client knows this is a 32bit Int and can work accordingly. Without this you would have to go through documentation or code to figure out what you were dealing with.
In conclusion:
1. Thinking about it some more, it's possible that Integer is the only special case that would need typing in the JSON supported types. I'll give it some more thought. We could possibly have a "verbose" option for those who require typing of everything.
2. Type names should refer to the java class. This also seems to make sense when dealing with custom objects.
PS: I would also like to point out that this format `[{"@class":"java.lang.Integer"}, 1]` can be a pain in systems that do not necessarily order lists. With these systems you need to check that your list has two elements, that one is a map, and that the map contains a `@class` key. Costly operation.
Perhaps `{"@class":"java.lang.Integer", "value": 1}` is a better option.
Github user newkek commented on a diff in the pull request:
— Diff: tinkergraph-gremlin/src/test/java/org/apache/tinkerpop/gremlin/tinkergraph/structure/ —
So to sum up on that samples issue :
- the grateful-dead V1 samples have changed because, for some reason, some of the `inE` of some vertices were not written in the same order. I'm almost sure the fix here has nothing to do with that order change, so I pushed the changed ones. It also doesn't concern the `normalize` option of the GraphSON mapper, since the `inE` are generally not ordered. So, quite a mystery, but I definitely don't think that's something introduced by this branch.
- The `data/` folder now has the new `-v2d0` and `-v2d0-typed` graphs, but not the `normalized` ones.
- The `gremlin-test/src/main/resources/etc...` has all the new v2 graphs and the normalized ones.
- I don't know what's up with the `sample.kryo` but there's near to 0.00001% chances it's related to that branch. But I pushed the change though.
Github user spmallette commented on the issue:
I thought that java integer would be implied. I believe that if we wrote a java short for example it would type to a "Short" then you would know what kind of number it is.
We purposely went away from the fully qualified class name which really doesn't mean much to non-JVM languages in favor of the more brief and less scary simple name - I thought we had decided that a smaller byte size for the network outweighed the downside of being less specific about the type. no?
> I would also like to point out that this format `[{"@class":"java.lang.Integer"}, 1]` can be a pain in systems that do not necessarily order lists.
I seem to remember that we discussed that before but don't remember the outcome - @newkek do you remember what was said?
Github user newkek commented on the issue:
> I thought that java integer would be implied.
Yes, with the current PR, serializing an Integer results in no type being added, while serializing a Long or a Short results in an explicit type being added. To be very explicit: in JSON every number is effectively a Double or a Long, because those are the larger container formats that hold the less precise ones. Jackson, however, considers that when an Integer is serialized there is no need for an explicit type, because the precision is kept since for JSON it fits in a Long. When a Long is serialized there is still no precision loss, but the format explicitly tells the Deserializer that what was serialized initially was a Long. So everything is as well defined as if the Integer were typed: by default the reader assumes every whole number is an Integer, and if it is not, there will be a type to say what it is. We also save a lot of payload size by not typing the Integer. The same concept applies to Float vs. Double: for JSON everything is a Double, so if a value is a Float it will be typed, and if not, by default it's a Double.
So as I said in the description, I think the outcome of the mail conversation was to not type Simple values, mostly because we're not sure if this is going to be useful or not. However, adding those types in the future, for Simple values, can be very easily done and I've left detailed comments in the `TypeSerializer` on how to add them when we deem it necessary. Also, adding them would not break existing code since the format would be the same; it's just that every value would follow the format for typed values. Since there's no possibility of a mix-up for the Numeric values, as explained above with the Shorts and Longs etc., I still think we should wait until somebody explicitly requires it.
> Perhaps `{"@class":"java.lang.Integer", "value": 1}` is a better option.
As I see it, given how I implemented the TypeDeserializer, it acts as a meta deserializer that reads the raw text JSON sequentially, so there's no chance there can be a mix-up in the order: it does not deserialize the whole structure and create a `List<Map<>>` object. Same for the TypeSerializer: it does not create a Java `List` object to write the type, but writes directly in JSON "write a start_array token, write a start_map token, write a property name, write a text field (the type name), write an end_array token, etc", so same thing, there is no chance there can be a mix-up in the order.
I don't know what it could look like for parsers of other languages, but it would seem like doing something else than that would be quite inefficient in terms of performance, because it would mean that for every typed value you would instantiate a `new List<Map<>>` just to read a simple value. Does that make sense?
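To illustrate the sequential writing described above, here is a minimal sketch that is not taken from the PR. It uses a plain `StringBuilder` in place of Jackson's generator, and the class name `TypedValueWriter`, the method `writeTyped`, the `@class` key and the exact spacing are all assumptions made for the example. It also assumes the type name contains no characters that need JSON escaping.

```java
final class TypedValueWriter {
    // Hypothetical helper: appends the typed-value tokens one after another,
    // without ever building an intermediate List or Map in memory.
    static void writeTyped(StringBuilder out, String typeName, String rawJsonValue) {
        out.append('[');                                // start of the wrapper array
        out.append('{');                                // start of the type object
        out.append("\"@class\"").append(':');          // property name
        out.append('"').append(typeName).append('"');  // the type name as text
        out.append('}');                                // end of the type object
        out.append(',');
        out.append(rawJsonValue);                       // the value, already valid JSON
        out.append(']');                                // end of the wrapper array
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        out.append("{\"helo\":");
        writeTyped(out, "Long", "2");
        out.append('}');
        // prints {"helo":[{"@class":"Long"},2]}
        System.out.println(out);
    }
}
```

Because the tokens are emitted in a fixed order, the writer itself cannot reorder the type and the value. Whether a consumer preserves that order after parsing is a separate question.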
Github user spmallette commented on the issue:
If i understand right: Serializing integer without a type tag would result in integer. You would have to specify "Long" as the type for it to be interpreted as a Long in java. This is generally just a problem for numerics so the conversion is:
- number with no decimal and no type = Java integer
- number with decimal and no type = Java Float
- all other numerics will have a java type present (simple name)
@newkek is that an appropriate description of what's happening in conversion?
Github user newkek commented on the issue:
@spmallette yes, maybe some code will help explain.
ObjectMapper mapper =;
Map<String, Integer> map = new HashMap<>();
map.put("helo", 2);
String s = mapper.writeValueAsString(map);
// prints 's =
System.out.println("s = " + s);
Map read = mapper.readValue(s, Map.class);
// prints 'read.get("helo") = class java.lang.Integer'
System.out.println("read.get(\"helo\") = " + read.get("helo").getClass());
Map<String, Long> map3 = new HashMap<>();
map3.put("helo", 2L);
String s2 = mapper.writeValueAsString(map3);
// prints 's2 = {"helo":[
System.out.println("s2 = " + s2);
Map read2 = mapper.readValue(s2, Map.class);
// prints 'read2.get("helo") = class java.lang.Long'
System.out.println("read2.get(\"helo\") = " + read2.get("helo").getClass());
> number with decimal and no type = Java Float
Almost, by default decimals are Double, not Floats.
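As a rough summary of the numeric rule discussed here, the following sketch (not the PR's code) decides whether a value would need a type tag. The class `NumericTypeTags` and the method `typeTagFor` are hypothetical names. The only behaviour assumed is what the comments above state: Integer and Double are the reader's defaults and stay untagged, and other numeric classes are tagged with their simple name.

```java
import java.util.Optional;

final class NumericTypeTags {
    // Returns the simple-name tag a writer would attach, or empty when the
    // reader's default interpretation already matches the value's class.
    static Optional<String> typeTagFor(Object value) {
        if (value instanceof Integer || value instanceof Double) {
            return Optional.empty(); // reader defaults: Integer and Double
        }
        if (value instanceof Number) {
            return Optional.of(value.getClass().getSimpleName());
        }
        return Optional.empty(); // non-numeric values are out of scope for this sketch
    }

    public static void main(String[] args) {
        Object[] samples = {2, 2L, (short) 2, 2.5d, 2.5f};
        for (Object sample : samples) {
            System.out.println(sample.getClass().getSimpleName() + " -> "
                    + typeTagFor(sample).orElse("(no tag)"));
        }
    }
}
```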
Github user spmallette commented on the issue:
imo, I think that if we document GraphSON this way, there should not be too much confusion as to how the type system works.
Github user dkuppitz commented on the issue:
I hope you guys figured it all out yesterday. I didn't follow all the comments, but started a `docker/ -t -i -n` over-night job. It succeeded, thus
VOTE: +1
Github user robertdale commented on the issue:
Some suggestions:
1. A type is not a class. Call types 'type' or '@type' (not clear the @ is necessary since it's in the metadata payload anyway); Not @class. (Otherwise be consistent and rename all the Type* classes to be Class*. e.g. ClassDeserializer)
2. Don't use Java types. Use BSON as a reference. It has a nice type system and solved most if not all of the type concerns.
3. If space and processing efficiency is a priority, then consider actually using BSON.
4. Alternatively, use an external schema to define types. It could even be appended to and a part of the output.
Github user spmallette commented on the issue:
> 1. A type is not a class. Call types 'type' or '@type' (not clear the @ is necessary since it's in the metadata payload anyway); Not @class. (Otherwise be consistent and rename all the Type* classes to be Class*. e.g. ClassDeserializer)
Changing "@class" to "@type" is ok by me if others like it. I'm not tied to one or the other - "@type" does seem a little better to me the more i think on it, but I'll let @newkek (or others) weigh in on it.
> 2. Don't use Java types. Use BSON as a reference. It has a nice type system and solved most if not all of the type concerns.
Not sure about BSON typing as a solution. Ultimately we want to know if something is a `Vertex`, `Edge`, `Duration`, `GeoPoint`, etc. In fact we don't always know the types ahead of time (like Titan's `GeoPoint`), so using the java class name is pretty convenient.
> 3. If space and processing efficiency is a priority, then consider actually using BSON.
imo, we're pretty deep into this approach having discussed it over multiple weeks in the community. making a big switch like that is probably something to reserve for the future especially since @newkek put a fair bit of effort into this work at this point and it delivers what took a while to get agreement on.
If however BSON (or some other format) could be proven a more efficient network serialization format that is truly programming language agnostic, with wide support and consistently performant parsers in the major languages we support (which is what had doomed MsgPack some time back), then I think we could consider that as an additional IO format. @robertdale if you have ideas there, it would be nice to hear them. Please consider sending a message on the dev mailing list if you do.
> 4. Alternatively, use an external schema to define types. It could even be appended to and a part of the output.
That would be an interesting option, however would mostly be good for network serialization and not so much for file storage. So far we haven't written a network only IO package, though we have written a file storage only one with GraphML. I think that we could consider a network serialization one only since dependence on Gremlin Server for non-JVM languages is going to be something we need to support in the face of GLVs.
Thanks for your thoughts @robertdale
Github user newkek commented on the issue:
Yes the `@class` bothered me a little as well, since now it doesn't really describe a "Class". But I did not want to change, for consistency for GraphSON v1.0 and I reused the `GraphSONTokens.CLASS`. So if it is ok, I'd be +1 for `@type` actually.
Concerning BSON (or MsgPack) it probably is indeed a more efficient serialization solution, however the goal of this PR is to optimize and improve JSON. Implementing Graph-BSON is probably another topic adjacent to this one, on which btw I'd be happy to collaborate as well.
Github user spmallette commented on the issue:
Just for reference:
Github user robertdale commented on the issue:
@newkek @spmallette Sorry, my context was only this thread. I agree with you on all accounts.
Github user newkek commented on the issue:
Just pushed the change for `@class` to `@type`. All tests pass.
'twas a 1 line fix, isn't that wonderful.
Github user PommeVerte commented on the issue:
Ok, I thought the main point here was robustness in typing for non-java languages. Hence why I suggested even typing things like Int and using java classes. Honestly, depending on how you compiled your PHP, your basic JSON int will be converted to `Long`. And that's the case I was highlighting.
It makes sense to ignore this if you're trying to reduce the payload, but we'll still be lacking on the non-lossiness end.
> As I see it for how I implemented the TypeDeserializer, it acts as a meta deserializer that will read the raw text JSON sequentially, so there's no chance there can be a mixup in the order
You can't guarantee that the order would be maintained in some languages, hence my previous comment. These languages will parse the JSON into `List<Map, ?>` then check the list and Map along the lines of what I said in my previous post. It would be much more efficient to simply have JSON cast to `Map` and run a simple check on keys.
Github user newkek commented on the issue:
Well, I agree we cannot prevent some parsers and drivers from reading the whole JSON content and checking the created object only afterward. I would not recommend it though, because it is not great for memory consumption, but it is maybe something we have the opportunity to avoid only thanks to Jackson, and some libraries may not offer that capability. So I guess I'll switch to the `{"@type":"...", "value": ...}` format instead of the current one. Does that sound good @spmallette @PommeVerte ?
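For a consumer that parses the whole document first, the difference between the two shapes can be sketched as follows. This is an illustration only: the class `TypedValueShapes`, both method names, and the use of `@type` and `value` as key names are assumptions based on the formats quoted in this thread.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TypedValueShapes {
    // Array shape: needs a size check, a type check on the first element,
    // and a key lookup inside that element.
    static boolean isTypedList(Object parsed) {
        if (!(parsed instanceof List)) {
            return false;
        }
        List<?> list = (List<?>) parsed;
        if (list.size() != 2) {
            return false;
        }
        Object first = list.get(0);
        return first instanceof Map && ((Map<?, ?>) first).containsKey("@type");
    }

    // Object shape: two key lookups on a single map, independent of ordering.
    static boolean isTypedMap(Object parsed) {
        if (!(parsed instanceof Map)) {
            return false;
        }
        Map<?, ?> map = (Map<?, ?>) parsed;
        return map.containsKey("@type") && map.containsKey("value");
    }

    public static void main(String[] args) {
        Map<String, Object> tag = new HashMap<>();
        tag.put("@type", "Long");
        Object listShape = Arrays.asList(tag, 2L);

        Map<String, Object> mapShape = new HashMap<>();
        mapShape.put("@type", "Long");
        mapShape.put("value", 2L);

        System.out.println("list shape recognised: " + isTypedList(listShape));
        System.out.println("map shape recognised: " + isTypedMap(mapShape));
    }
}
```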
Github user spmallette commented on the issue:
This change is largely to help the consumption of GraphSON by non-JVM languages. Since I'm largely familiar with the behaviors of java based parsers and such, i'm not a good judge of that so i have to rely on @PommeVerte and others in this area. If the switch to Map is helpful then I think we should go that route. @newkek I would not be hasty in the change though. Perhaps we give it a day or so to think about it, to be sure that everyone is happy with that approach before you make the change and commit to it.
Github user robertdale commented on the issue:
So I've caught up on the discussion and I'll offer some more food for thought since I haven't seen any other ideas. Embedding metadata is neither easy nor fun (not for me anyway). For any serious integration type work it's always best to have a well-defined schema up-front.
On types:
> @spmallette
> In fact we don't always know the types ahead of time (like Titan's GeoPoint), so using the java class name is pretty convenient
Convenience is not the same as using Java types. By "not using java types", we mean:
- not using java package names
- not using types specific to Java
- using primitives and other common types that are concise and portable
- should include domain-specific types. e.g. Vertex, Edge, etc.
- may include other standards. e.g. GeoJSON
Defining primitives, common types:
So if your Java implementation conveniently shares the same name as the type, then that's wonderful. But if you are to be truly language-agnostic, then at some point the types must be known ahead of time in order to be consumed. For instance, how can my X parser know how to handle a Titan GeoPoint if it's all dynamic? It can't. It must be able to handle this type ahead of time. And I can't imagine someone would want to manually read a graphson file to discover all the types that must be handled. Maybe I'm getting out of scope as this goes beyond language and steps into being database agnostic. @newkek, please correct me if I'm wrong, but it doesn't look like the code does any dynamic serializing. It looks like all types are registered anyway. So I'll argue again if you know your types ahead of time, then you may as well have a schema.
But let's continue with embedded metadata...
In JSON, the only unambiguous types are
- array (unless you want to disambiguate from list which may be very valid)
- string
- boolean (true, false)
- null
To avoid confusion on all other types, including numbers, they should be typed. Thus they are objects (and not lists of things). The metadata can be at the same level as the object and alleviates these concerns: @newkek " a List in which the first element is a Map in which the first entry's key" and @PommeVerte "can be a pain in systems that do not necessarily order lists". Metadata can be differentiated from member fields by a prefix (e.g. '@'). Primitive types (or objects) having only a single value would have a "value" key which maps to the actual value.
{ "id":
{ "id":
{ "@type":"Edge",
"properties":{ },
"outV":[ { } ]
I wouldn't concern myself with the additional payload size for metadata. I wouldn't sacrifice conciseness for size. One could always compress the file if size is a concern. Also, the reader/writer could be easily enhanced to support zip. I would take the pragmatic approach and address it when it's no longer working for people.
Anyway, maybe this is all GraphSON 3.0 stuffs. HTH.
Github user newkek commented on the issue:
@robertdale the format you suggest would lead to the same inconsistencies as in GraphSON 1.0. Since the type is at the same level as the data itself, the type format would not be the same depending on whether the container is an Array or an Object. I just pushed a change in the format that is the one @PommeVerte suggested, which gives a consistent format, without the concern of unordered Lists (for reference, the new format is `{"@type" : "typeName", "@value" : value}`).
> please correct me if I'm wrong, but it doesn't look like the code does any dynamic serializing.
The `TypeIdResolver`, which is the object that the serializers will call to get a TypeID from a java `Object`, is dynamic in a way, in the sense that it returns `o.getClass().getSimpleName()`. So there is no `object` -> typeID index reference. However for the Deser, as explained in the description, Java by default doesn't offer a way to get a Class by its simple name, so the `TypeIdResolver` needs to keep a reference index of typeID (which is a class's simple name) -> Java `Class`. Don't know if that answers your question.
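The asymmetry described above can be sketched like this. It is a simplified illustration, not the PR's `TypeIdResolver`: the class `SimpleNameTypeIds` and its methods are hypothetical. The write side derives the id from the object itself, while the read side can only resolve ids that were registered beforehand.

```java
import java.util.HashMap;
import java.util.Map;

final class SimpleNameTypeIds {
    // Read-side index: type id (a class's simple name) -> Java class.
    private final Map<String, Class<?>> idToClass = new HashMap<>();

    // Registration is needed because Java cannot look a class up by simple name alone.
    void register(Class<?> type) {
        idToClass.put(type.getSimpleName(), type);
    }

    // Write side: no index needed, the id comes straight from the object.
    static String idFromValue(Object value) {
        return value.getClass().getSimpleName();
    }

    // Read side: only previously registered ids can be resolved.
    Class<?> classFromId(String typeId) {
        Class<?> type = idToClass.get(typeId);
        if (type == null) {
            throw new IllegalArgumentException("Unregistered type id: " + typeId);
        }
        return type;
    }

    public static void main(String[] args) {
        SimpleNameTypeIds ids = new SimpleNameTypeIds();
        ids.register(Long.class);
        ids.register(java.util.UUID.class);
        System.out.println(idFromValue(2L) + " resolves to " + ids.classFromId("Long").getName());
    }
}
```

One consequence of keying on simple names is that two registered classes with the same simple name in different packages would collide. The sketch does not handle that case.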
Github user spmallette commented on the issue:
I think we should try to merge this PR at this point (though we need one more +1 - @PommeVerte do you have a moment to do a final review?). There are a few little tweaks here and there, but I can quickly do those after we merge.
I configured Gremlin Server to use 2.0 and it works nicely with REST:
$ curl "http://localhost:8182?gremlin=g.V(1)"
I expect to do more tests around the server this week.
All tests pass with `docker/ -t -n -i`
Github user spmallette commented on the issue:
Note that I've started a new thread on the dev mailing list related to a new IO format with GLVs and Gremlin Server in mind:
If any of you have thoughts on the matter, please feel free to join the discussion there.
Github user PommeVerte commented on the issue:
ok I caught up and checked code. I think what @robertdale suggests on the typing end would be really nice (`int64`, `int32`, etc). But we can keep that for the thread stephen linked to.
I VOTE +1 on this. @spmallette if you could just add a mention in the documentation that the basic types are based off of the JVM types that would be a nice bonus to have.
Nice work @newkek, some nice effort went into this.
Github user okram commented on the issue:
Adding my email on dev@ to here.
I’m not following this PR too closely so what I might be saying is already known/argued against/etc.
1. I think we should go with @robertdale proposal of int32, int64, Vertex, uuid, etc. instead of Java class names.
2. In Java we then have a `Map<String,Class>` for typecasting accordingly.
3. This would make GraphSON 2.0 perfect for Bytecode serialization in TINKERPOP-1278.
4. I think that if a Vertex, Edge, etc. doesn’t have properties, outV, etc. then don’t even have those fields in the representation.
5. Most of the serialization back and forth will be `ReferenceXXX` elements and thus, don’t create more Maps/lists for no reason. — less chars.
For me, my interest in this work is all about a language-agnostic way of sending Gremlin traversal bytecode between different languages. This work is exactly what I am looking for.
Github user spmallette commented on the issue:
While we have 3 +1s, I don't want to make a GraphSON 2.0 that immediately needs to be revised to a 3.0. I don't see clear consensus on what we do with types/schema and I don't think we should hold up release any further, so I think GraphSON 2.0 should be delayed for a future version when we can get everything clear.
I think the core of the work done here so far however is good. I've merged this PR to here:
for more collaboration and discussion.
GitHub user newkek opened a pull request:
TINKERPOP-1274: GraphSON 2.0 [revised]
For context, a precise description is provided in the PR for the first version of the fix: #351. Please see this first for initial context.
This PR provides the initial set of features defined in #351, plus the following:
- Types for Graph objects.
- Types for *all* numeric values.
- New type IDs format.
- Avoid serializing empty properties field.
As a reminder the format for types is the following:
- A value not typed : `value`
- A value typed : `{"@type":"typeName", "@value":value}`
- New type IDs format
A type ID is now composed of 2 parts, the "domain" and the type name. A "domain" can be used by any implementor to implement their own data type, avoiding collisions with the existing TinkerPop type IDs. The default domain for Graph object is "gremlin".
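As a rough illustration of the two-part type ID, the sketch below splits a `domain:typename` string into its parts. The `TypeId` class is hypothetical and is not part of the PR; it only assumes the `:` separator shown in the type IDs listed in this description.

```java
// Hypothetical helper, assuming the "domain:typename" layout described above.
final class TypeId {
    final String domain;
    final String name;

    private TypeId(String domain, String name) {
        this.domain = domain;
        this.name = name;
    }

    // Splits on the first ':'; both parts must be non-empty.
    static TypeId parse(String typeId) {
        int sep = typeId.indexOf(':');
        if (sep <= 0 || sep == typeId.length() - 1) {
            throw new IllegalArgumentException("Expected \"domain:typename\", got: " + typeId);
        }
        return new TypeId(typeId.substring(0, sep), typeId.substring(sep + 1));
    }

    @Override
    public String toString() {
        return domain + ":" + name;
    }
}
```

With this, `TypeId.parse("gremlin:vertex")` yields the domain `gremlin` and the name `vertex`, and an implementor's own `myapp:vertex` would not collide with it.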
- Types for all numeric values
A type is now present for every numeric value, the types have been renamed to be more understandable with regards to their memory sizes or kinds. As a reference, here is a description of all currently existing types and their corresponding Java implementation:
- Java `Integer`: "gremlin:int32"
- Java `Long`: "gremlin:int64"
- Java `Short`: "gremlin:int16"
- Java `Float`: "gremlin:float"
- Java `Double`: "gremlin:double"
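The mapping listed above can be sketched as a plain lookup table. This is only an illustration with the standard library; `NumericTypeIds` is a hypothetical name and is not how the PR wires these IDs into Jackson.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lookup table mirroring the numeric type list above.
final class NumericTypeIds {
    private static final Map<Class<? extends Number>, String> IDS = new LinkedHashMap<>();

    static {
        IDS.put(Integer.class, "gremlin:int32");
        IDS.put(Long.class, "gremlin:int64");
        IDS.put(Short.class, "gremlin:int16");
        IDS.put(Float.class, "gremlin:float");
        IDS.put(Double.class, "gremlin:double");
    }

    // Returns the type ID for a boxed numeric value, or fails for unlisted kinds.
    static String typeIdOf(Number value) {
        String id = IDS.get(value.getClass());
        if (id == null) {
            throw new IllegalArgumentException("No type ID for " + value.getClass().getName());
        }
        return id;
    }
}
```

A non-JVM consumer can use the same table in reverse to choose a native numeric type of the right width for each `@type` it reads.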
- Types for all Graph objects.
New typeIDs introduced in #351 (time types, UUIDs, etc..) now follow the type format defined here: "domain:typename".
Types have now been included for Graph-specific objects, here is an exhaustive list of the existing types handled so far and their corresponding IDs:
- `Vertex` -> "gremlin:vertex"
- `Edge` -> "gremlin:edge"
- `VertexProperty` -> "gremlin:vertexproperty"
- `Property` -> "gremlin:property"
- `Path` -> "gremlin:path"
- `Tree` -> "gremlin:tree"
- `Graph` -> "gremlin:graph"
- `Metrics` -> "gremlin:metrics"
- `TraversalMetrics` -> "gremlin:traversalmetrics"
This improvement imposes a requirement on the serialization format: every type must have a Jackson serializer and deserializer defined on the source `ObjectMapper`. Serializers and deserializers that were previously missing have been added in this PR.
Code-wise, it's pretty much the same as for #351. The main additions here are the deserializers for all Graph objects, a big simplification of the serializers (`GraphSONSerializersV2d0`), the addition of the "domain" to the type system, and making the new typeID format configurable by users through the `TinkerPopJacksonModule`.
`mvn clean install` test suite passes, and it's rebased on top of current `master`.
You can merge this pull request into a Git repository by running:
$ git pull TINKERPOP-1274-rev
Alternatively you can review and apply these changes as the patch at:
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #386
commit 635be59cf6505a47a1ca1d27519d07e918fdc1ef
Author: Stephen Mallette <>
Date: 2016-05-18T12:41:26Z
TINKERPOP-1274: GraphSON 2.0.
Github user newkek commented on the issue:
Closing this in favor of #386.
Github user okram commented on the issue:
First off. Thank you for doing this. Here are some notes.
1. The term is "namespace" not "domain." In case that terminology is used in the documentation and not just in the PR notes.
2. Why do we have `@class` and `@type`. Seems we should just pick one. And if the answer is cause `@class` is for Java object mapping infrastructure, that isn't good and we will also use GraphSON to create `Vertex` objects in Python and it might use UUID for its ID as well. Thus, `@class=UUID` is perhaps `@type=UUID`. That is, if an object is "typed" its `@type` and the resolver (for that language) will have to map the correct class.
3. I read your notes, but its still not clear to me. Why do we have `@value`? I think we only need `@type` (or `@class`) and then the rest of the JSON object data is up to the type serializer/deserializer. For instance, in `Bytecode` (TINKERPOP-1278) we have `@type=Lambda` and then `value`, `language`, `arguments`. That is, there are 3 "values" we need to make the object and thus `@value` shouldn't be special with a `@`. Plus, we save a char.
4. I think we can save some bytes and still be readable if we make our namespace `g` instead of `gremlin`. Moreover, we should be consistent in our naming convention and either use some standard JSON naming or default to Java naming – e.g. `g:vertexProperty`. Or better yet, if its NOT namespaced, its assumed to be "Gremlin". That is, TinkerPop owns `vertex`, `vertexProperty`, `edge`. You want to do something else – its `myapp:vertex`. This also means we own `uuid`, `int64`, etc. etc.
Thanks again for the extensive work.
Github user newkek commented on the issue:
Hey @okram ,
> 1. The term is "namespace" not "domain." In case that terminology is used in the documentation and not just in the PR notes.
Ok, it's some places here and there in the code, I'll correct it.
> 2. Why do we have `@class` and `@type`.
`@class` shouldn't be anywhere near GraphSON 2.0 anymore, everything is `@type`, if you've seen it somewhere, that should be corrected. While writing that I realise I might have forgotten to update the "the-graph.asciidoc" doc... so I'll correct it. But in the actual code it shouldn't be there anymore.
> 3. I read your notes, but its still not clear to me. Why do we have `@value`? I think we [...]
So, taking a step back, I'm going to use the example of ByteCode. As you can see in `GraphSONSerializersV2d0`, serializers don't have to care about types anymore; serializers are put into the context of "You're in a place where you can write whatever you want, type is handled somewhere else for you". For the ByteCode serializer, it means it would open a Map, put a field "language" and its value, put a field "argument" and its value, put a field "value" (or call it "bytecode") and its value, and close the map. When deserializing: read the map, get the field values, create the ByteCode object.
Let's suppose that, for whatever reason, the ByteCode serializer needs to change in the future and, instead of writing a Map, it needs to start with an Array where the first element is 'something' and the rest is the Map described before, then close the array. Doing this is possible with the current format `{"@type":"typeName", "@value":value}`, because the serializer is in a place where it can write whatever it wants to.
Now let's consider another format: `
`. There, the serializer is put in the context "You are currently in a *Map*, whatever you write, it must be in the form of a key/value pair". The "potential future" evolution of the ByteCode serializer (writing an array) *wouldn't* be possible. For the sake of consistency, I think it is important to keep the format as is.
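The argument for keeping `@value` can be sketched as follows. This is a string-building illustration only, not the Jackson `TypeSerializer` from the PR; `TypedEnvelope` and the type ID `example:bytecode` are hypothetical. The wrapper writes the prefix and suffix, and the value serializer is free to emit any JSON value in between.

```java
import java.util.function.Consumer;

// Hypothetical envelope writer: the type prefix and suffix are handled here,
// so the value serializer never needs to know about "@type".
final class TypedEnvelope {
    static String write(String typeId, Consumer<StringBuilder> valueSerializer) {
        StringBuilder out = new StringBuilder();
        out.append("{\"@type\":\"").append(typeId).append("\",\"@value\":"); // prefix
        valueSerializer.accept(out); // may write an object, an array, or a scalar
        out.append('}'); // suffix
        return out.toString();
    }
}
```

A serializer that writes a Map today, as in `TypedEnvelope.write("example:bytecode", out -> out.append("{\"language\":\"x\"}"))`, could later switch to writing an Array, as in `TypedEnvelope.write("example:bytecode", out -> out.append("[1,{\"language\":\"x\"}]"))`, without any change to the envelope.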
> 4. I think we can save some bytes and still be readable if we make our namespace g
Totally fine with `g`, again for the sake of consistency, I think it'd be valuable to have a 'name space' all the time.
Github user newkek commented on the issue:
NB: the conflicts for merge are caused by the CHANGELOG
Github user newkek commented on the issue:
A remark: @okram requested `TraversalExplanation` to be included in the list of Graph objects. I've noticed that the current `TraversalExplanation` GraphSON serializer serializes a `Traversal`. In order to follow the requirement of this GraphSON 2.0 type system, we would have to write a deserializer for `TraversalExplanation` (there isn't one currently). However, the current 1.0 serializer serializes the Traversal as the `toString()` of the Traversal's step list, which means that it is not possible to deserialize to a Traversal with only that information. So I believe it would be more appropriate to wait for TINKERPOP-1278 before implementing the `TraversalExplanation` ser/de and to leave it out of the scope of this PR.
Github user spmallette commented on the issue:
I think I agree with @newkek to push that off until we can get his massive PR reviewed/merged. Let's see if we can solve that separately.
Github user PommeVerte commented on the issue:
I'm liking how this has turned out.
I personally don't have much of an opinion in regards to assuming gremlin is the default namespace and getting rid of `g:` though it does have the merit of being self documenting. I don't know how much of a size difference this would represent.
Github user spmallette commented on the issue:
is that an official +1 from you @PommeVerte or are you still reviewing ?
Github user PommeVerte commented on the issue:
Still reviewing I would like to finish going through the code will try and do this later in the day. But looking good so far
Github user newkek commented on the issue:
I just pushed corrected docs, and renamed everything "domain" to "namespace". Waiting for more consensus to change the default gremlin namespace.
Github user spmallette commented on the issue:
All tests pass with `docker/ -t -n -i` - still going through code though.
Github user PommeVerte commented on the issue:
I'm waiting to hear back from marko on the points he brought up to cast my vote
Github user okram commented on the issue:
@newkek I understand. I like the `@value` concept – its the JSON sub-object for creating the object (defined by `@type`).
VOTE +1.
Github user PommeVerte commented on the issue:
Ok cool if we're all ok with the new features then I'm satisfied. Code looked good as well VOTE: +1
Github user newkek commented on a diff in the pull request:
— Diff: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/util/ —
@@ -54,6 +54,14 @@
public DefaultTraversalMetrics() {
+ // This is only a convenient constructor needed for GraphSON deserialization.
+ // TODO: see if that's ok to add that.
+ public DefaultTraversalMetrics(long totalStepDurationNs, List<MutableMetrics> metricsMap) {
— End diff –
Not sure who I should ask about that. I decided to add a public constructor here for convenience during deserialization; it takes the total duration and a list of all the Metrics it's composed of.
Github user newkek commented on a diff in the pull request:
— Diff: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/util/ —
@@ -42,6 +42,18 @@ public MutableMetrics(final String id, final String name)
+ // create a MutableMetrics from an Immutable one.
+ // needed that for tests, don't know if it is worth keeping it public.
+ // TODO: see if it's ok to add this
+ public MutableMetrics(Metrics other) {
— End diff –
Same here I added a convenience constructor that takes any kind of Metrics and returns a `MutableMetrics`. It was convenient for tests, as I wanted to serialize Metrics that I am sure have nested Metrics, and make sure the ser/de works for those.
Github user spmallette commented on a diff in the pull request:
— Diff: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/util/ —
@@ -54,6 +54,14 @@
public DefaultTraversalMetrics() {
+ // This is only a convenient constructor needed for GraphSON deserialization.
+ // TODO: see if that's ok to add that.
+ public DefaultTraversalMetrics(long totalStepDurationNs, List<MutableMetrics> metricsMap) {
— End diff –
I noticed that - I'll be making some tweaks once we get it over to master. I'll take a look at those odds and ends.
Renamed to GraphSON 2.0 - I sense it can still be done without breaking any code we already had with full support for 1.0, but calling it 1.1 didn't seem like a big enough departure for the changes entailed.