Description
When StarVertex adds a self-loop, it adds an instance of concrete type StarOutEdge to both its outEdges map and its inEdges map.
(line 329 adds a StarOutEdge to the inEdges map)
Having a StarOutEdge in the inEdges map mostly doesn't matter. However, this does matter in the applyGraphFilter method. This method uses instanceof StarOutEdge to guess whether an edge's original direction was in or out:
If you trace applyGraphFilter through, you see that a StarVertex with a self-loop can pop out of applyGraphFilter with two out edges corresponding to the self loop and no in edges. This leads to weird traversal results. One way to trigger this is to write a GraphFilterAware InputRDD that produces StarVertex instances and then run a g.V()....inE("somelabel"), so that TP pushes down the "somelabel" constraint, which is how I got here.
I think addEdge should continue putting a StarOutEdge into outEdges, like it already does. However, it should put a StarInEdge into inEdges. This would increase the object overhead associated with StarVertices that have self loops (two edge objs instead of one). If we wanted to be obsessive about optimizing this case we could probably pervert the inEdge/outEdge datastructure contents to do it, but IMHO that's not worth it. Actually, I'm no longer convinced that's safe, since I think it would alter some of StarVertex's other semantics. For one, I think retrieving an edge and setting a property on it would probably break.
I'll make a PR soon.
I don't know all of the versions this affects, but I do know it affects 3.2.1.
I opened https://github.com/apache/tinkerpop/pull/372.
This solves my immediate problem. It also preserves the following semantics:
Maybe broken semantics that I missed will come up in review. Right now, I can only see the following risk:
I ran mvn -pl tinkergraph-gremlin clean verify, which I think indirectly includes StarGraphTest.