Details
-
Improvement
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
3.1.1-incubating
Description
As discussed in this topic: https://groups.google.com/forum/#!topic/gremlin-users/Kz2bOeAeqh4
It will ensure the same behavior between g.V().hasId(id) vs. g.V(id) and graph.vertices(id)
Attachments
Issue Links
- requires
-
TINKERPOP-1048 Vertex lookups by id are inconsistent
- Open
Activity
Github user okram commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/267#issuecomment-197391213
@spmallette – please review this carefully as it deals with ID handling and I know that is a rats nest that you maintain.
Github user okram commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/267#issuecomment-197409086
```
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache TinkerPop .................................. SUCCESS [5.475s]
[INFO] Apache TinkerPop :: Gremlin Shaded ................ SUCCESS [2.730s]
[INFO] Apache TinkerPop :: Gremlin Core .................. SUCCESS [45.587s]
[INFO] Apache TinkerPop :: Gremlin Test .................. SUCCESS [13.455s]
[INFO] Apache TinkerPop :: Gremlin Groovy ................ SUCCESS [43.580s]
[INFO] Apache TinkerPop :: Gremlin Groovy Test ........... SUCCESS [6.946s]
[INFO] Apache TinkerPop :: TinkerGraph Gremlin ........... SUCCESS [2:08.320s]
[INFO] Apache TinkerPop :: Hadoop Gremlin ................ SUCCESS [6:23.360s]
[INFO] Apache TinkerPop :: Spark Gremlin ................. SUCCESS [5:40.072s]
[INFO] Apache TinkerPop :: Giraph Gremlin ................ SUCCESS [9.542s]
[INFO] Apache TinkerPop :: Neo4j Gremlin ................. SUCCESS [20:00.173s]
[INFO] Apache TinkerPop :: Gremlin Driver ................ SUCCESS [9.509s]
[INFO] Apache TinkerPop :: Gremlin Server ................ SUCCESS [6.850s]
[INFO] Apache TinkerPop :: Gremlin Console ............... SUCCESS [16.411s]
[INFO] Apache TinkerPop :: Gremlin Archetype ............. SUCCESS [0.097s]
[INFO] Apache TinkerPop :: Archetype - TinkerGraph ....... SUCCESS [4.735s]
[INFO] Apache TinkerPop :: Archetype - Server ............ SUCCESS [10.764s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 37:08.169s
[INFO] Finished at: Wed Mar 16 10:17:59 MDT 2016
[INFO] Final Memory: 109M/759M
```
VOTE +1.
Github user dkuppitz commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/267#issuecomment-197458472
What about the behavior between different id types?
- `g.V(1)` should be the same as `g.V().hasId(1)`
- `g.V("1")` should be the same as `g.V().hasId("1")`
Github user okram commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/267#issuecomment-197471337
I believe that is all specific to `TinkerGraph` (or the vendor specifically). In short, I don't know. @spmallette ?
Why, are you getting results that don't make sense?
Github user mikedias commented on a diff in the pull request:
https://github.com/apache/incubator-tinkerpop/pull/267#discussion_r56389186
— Diff: gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/HasTest.java —
@@ -354,6 +365,33 @@ public void g_V_hasXlocationX()
+ @Test
+ @LoadGraphWith(MODERN)
+ @IgnoreEngine(TraversalEngine.Type.COMPUTER) // only validate for OLTP
+ public void g_V_hasId_compilationEquality() {
+ final Traversal<Vertex, Vertex> traversala1 = get_g_VX1X(convertToVertexId("marko"));
+ final Traversal<Vertex, Vertex> traversala2 = get_g_V_hasIdX1X(convertToVertexId("marko"));
+ final Traversal<Vertex, Vertex> traversalb1 = get_g_VX1_2X(convertToVertexId("marko"), convertToVertexId("vadas"));
+ final Traversal<Vertex, Vertex> traversalb2 = get_g_V_hasIdX1_2X(convertToVertexId("marko"), convertToVertexId("vadas"));
+ printTraversalForm(traversala1);
+ printTraversalForm(traversala2);
+ printTraversalForm(traversalb1);
+ printTraversalForm(traversala2);
— End diff –
small typo: traversala2 -> traversalb2
Github user dkuppitz commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/267#issuecomment-197511828
That's what bothers me:
```groovy
gremlin> g.V(1).values("name")
==>marko
gremlin> g.V().hasId(1).values("name")
==>marko
gremlin> g.V("1").values("name")
gremlin> g.V().hasId("1").values("name")
==>marko
```
Github user okram commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/267#issuecomment-197514987
@dkuppitz – yea, @spmallette should review this. Why `HasContainer` does casting, I don't know. I think its bad...
Github user spmallette commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/267#issuecomment-197536538
I won't deny confusion, but I thought that you only get weird stuff with TinkerGraph when you use it in the default mode where types matter for the ids. When you define an id manager, it returns expected results:
```text
gremlin> conf = new BaseConfiguration()
==>org.apache.commons.configuration.BaseConfiguration@6273c5a4
gremlin> conf.setProperty('gremlin.graph','org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph')
==>null
gremlin> conf.setProperty('gremlin.tinkergraph.vertexIdManager','LONG')
==>null
gremlin> graph = TinkerGraph.open(conf)
==>tinkergraph[vertices:0 edges:0]
gremlin> TinkerFactory.generateModern(graph)
==>null
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V(1).values("name")
==>marko
gremlin> g.V().hasId(1).values("name")
==>marko
gremlin> g.V("1").values("name")
==>marko
gremlin> g.V().hasId("1").values("name")
==>marko
```
I'm pretty sure that other graphs don't suffer this problem either - it's just the default type system of TinkerGraph that forces the expected type of the id to match..
Github user spmallette commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/267#issuecomment-197538277
it builds for me locally btw - i guess travis is in the minority and can be ignored. why do we even bother with CI this way - almost useless.
Github user okram commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/267#issuecomment-197591991
I think @spmallette needs to make a call on whether what I have done is still within convention and whether this PR is mergable.
Github user spmallette commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/267#issuecomment-197826594
I'm fine with the test cases and what they enforce - VOTE +1
Github user okram commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/267#issuecomment-197865254
@spmallette – the problem is that this is a breaking change in that `HasContainers` do id magic. Why do they do that? Seems like a really really bad idea to have something so "high level" do id manipulations. If you are aware of this and have thought it all through and still thing this PR is good, okay.
Github user spmallette commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/267#issuecomment-197869261
oops - I was voting +1 without the understanding that this constituted a breaking change. I figured the id magic issue might be sorted out separately, but if this somehow breaks that, then I guess +1 was the wrong answer.
so - with that, i still don't fully understand where the break is. All the old tests still pass don't they? Am I missing where you modified/removed the tests that check for appropriate id semantics?
Perhaps the id magic is bad - it's already not perfect as you saw from Daniel above - this issue details some of the problems with getting TinkerGraph completely compliant:
https://issues.apache.org/jira/browse/TINKERPOP-1048
Maybe there's a better way to get the id semantics tests to pass than what i'm doing in the `HasContainer`. I just don't know what it is offhand.
I just marked this ticket as being BLOCKED by TINKERPOP-1219. I think we need to solve that first before we touch this ticket.
Github user dkuppitz commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/267#issuecomment-197886964
> so - with that, i still don't fully understand where the break is.
My sample queries now look like this:
```
gremlin> g.V(1).values("name")
==>marko
gremlin> g.V().hasId(1).values("name")
==>marko
gremlin> g.V("1").values("name")
gremlin> g.V().hasId("1").values("name")
gremlin>
```
Github user spmallette commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/267#issuecomment-197925611
what's "g" in your examples - TinkerGraph? if so, when did those queries not look like that? that's how they are in master. That's the point of TINKERPOP-1048. note that my example above works when configured with the appropriate id manager, both on master and in this branch. that's why i say the behavior is unchanged to me...am i still missing the point?
Github user dkuppitz commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/267#issuecomment-197931504
`g` is `TinkerFactory.createModern().traversal()` using this PR. It looks good, since it is consistent, but it's different from what we've had in 3.1.x. I did not look into `master/`, but if you say that this is already the default behavior for 3.2.x, then cool...
VOTE: +1
Github user spmallette commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/267#issuecomment-197932809
the behavior is the same on tp31 branch afaik just tested 3.1.0 - it's been like this for a long time
Github user spmallette commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/267#issuecomment-197966824
After more discussions with @dkuppitz on the side, he showed me where the break was occurring. The difference is in:
```text
// This is TP31:
gremlin> g.V(1).values("name")
==>marko
gremlin> g.V().hasId(1).values("name")
==>marko
gremlin> g.V("1").values("name")
gremlin> g.V().hasId("1").values("name")
==>marko
// This is the PR:
gremlin> g.V(1).values("name")
==>marko
gremlin> g.V().hasId(1).values("name")
==>marko
gremlin> g.V("1").values("name")
gremlin> g.V().hasId("1").values("name")
gremlin>
````
He prefers this behavior in this PR, which is fine and should be the break documented explicitly for TinkerGraph in upgrade docs, with the note that configuration of an id manager allows things to work perfectly as a I showed above.
I also took a momen to test Neo4j directly - this confirmed it's just a TinkerGraph issue:
```text
gremlin> graph = Neo4jGraph.open('/tmp/neo4j-test')
==>neo4jgraph[Community [/tmp/neo4j-test]]
gremlin> graph.addVertex("name","stephen")
==>v[0]
gremlin> graph.addVertex("name","daniel")
==>v[1]
gremlin> g = graph.traversal()
==>graphtraversalsource[neo4jgraph[Community [/tmp/neo4j-test]], standard]
gremlin> g.V(1).values("name")
==>daniel
gremlin> g.V().hasId(1).values("name")
==>daniel
gremlin> g.V("1").values("name")
==>daniel
gremlin> g.V().hasId("1").values("name")
==>daniel
```
I would still love to solve TINKERPOP-1048 and get it all completely settled, but I think what we have here is ok especially since the existing tests didn't break or change. with all that in mind:
VOTE +1
Github user okram commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/267#issuecomment-197971920
@spmallette – Can you give me the upgrade comment blurb to add to the upgrade docs? I will add it on merge.
GitHub user okram opened a pull request:
https://github.com/apache/incubator-tinkerpop/pull/267
TINKERPOP-1219: Create a test case that ensures the provider's compilation of g.V and g.V().hasId is identicalhttps://issues.apache.org/jira/browse/TINKERPOP-1219
OLTP providers that have custom `GraphStep` implementations should have `g.V(1)` and `g.V().hasId(1)` compile to the same representation. Likewise for `g.V(1,2)` and `g.V().hasId(1,2)` and `g.V().has(T.id,within(1,2))`. This ensures that random-access databases are using not using linear scans with `g.V().hasId(…)`. I added `GraphStep.processHasContainerIds()` which makes it easy for providers to update their respective `XXXGraphStepStrategy` to `GraphStep.addIds()` is appropriate.
I found a few `hashCode()`-bugs in the process and fixed them up.
CHANGELOG
```
```
UPDATE
```
GraphStep Compilation Requirement
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OLTP graph providers that have a custom `GraphStep` implementation should ensure that `g.V().hasId` and `g.V` compile to the same representation. This ensures a consistent user experience around random access of elements based on ids (as opposed to potentially the former doing a linear scan). A static helper method called `GraphStep.processHasContainerIds()` has been added. `TinkerGraphStepStrategy` was updated as such:
```
((HasContainerHolder) currentStep).getHasContainers().forEach(tinkerGraphStep::addHasContainer);
```
is now
```
{ if (!GraphStep.processHasContainerIds(tinkerGraphStep, hasContainer)) tinkerGraphStep.addHasContainer(hasContainer); }((HasContainerHolder) currentStep).getHasContainers().forEach(hasContainer ->
);
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/incubator-tinkerpop
TINKERPOP-1219Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-tinkerpop/pull/267.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #267
commit 53ecadd022adf01b94fdbcfbe669c51427ebac35
Author: Marko A. Rodriguez <okrammarko@gmail.com>
Date: 2016-03-16T15:11:22Z
g.V and g.V().hasId should compile to the same representation for OLTP graph databases to ensure a consistent user experience and to ensure that linear scans are not being performend on the latter. A test case was added to ensure this in HasTest. TinkerGraphStepStrategy and Neo4jGraphStepStrategy had to be updated accordingly as they were, in fact, a big wacky in their compilation. Added GraphStep.addIds() to make easy to increase the id pool. Found a few hashCode() issues while testing and have fixed them up.
commit 90e5ca3f3b0d006d42894316db3c0839bd92c583
Author: Marko A. Rodriguez <okrammarko@gmail.com>
Date: 2016-03-16T15:29:41Z
Neo4jGraphStep needed a valid hashCode().