Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.6.5
Fix Version/s: None
Component/s: None
Description
LocalStep is supposed to handle solutions locally, but what it actually does is unclear from the documentation.
What LocalStep actually does is the following:
- It processes the traversers of the TraverserSet as they are: bulked traversers stay bulked and are not split.
- So when the previous step's output contains equal elements, and as long as they have been bulked into a TraverserSet, they are processed in an "object-local" manner.
- How does the TraverserSet get bulked? That relies on LazyBarrierStrategy, which inserts a NoOpBarrierStep that performs the bulking.
- Alternatively, a barrier() step can be added explicitly to make the bulking happen.
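To make the bulking mechanism above concrete, here is a toy Python model. It is not the TinkerPop API, and the names `barrier`, `objects` and `stream` are hypothetical. It assumes a traverser is just an (object, bulk) pair and that a barrier merges traversers holding equal objects by summing their bulks, keeping first-seen order. Real traversers can carry additional state (for example a path) that this sketch ignores.

```python
# Toy model only, NOT the TinkerPop API: a traverser is an (object, bulk) pair.
# Real traversers can carry more state (e.g. a path) that this sketch ignores.

def barrier(traversers):
    """Merge traversers holding equal objects into one, summing their bulks.

    Loosely mimics how a NoOpBarrierStep collects traversers into a
    TraverserSet; a plain dict keeps first-seen order (Python 3.7+).
    """
    merged = {}
    for obj, bulk in traversers:
        merged[obj] = merged.get(obj, 0) + bulk
    return list(merged.items())


# The 14 results of g.V().in().out() from the console example, one traverser each.
objects = ["v[3]"] * 3 + ["v[2]"] * 3 + ["v[4]"] * 3 + ["v[5]"] * 2 + ["v[3]"] * 3
stream = [(obj, 1) for obj in objects]

bulked = barrier(stream)
print(bulked)  # [('v[3]', 6), ('v[2]', 3), ('v[4]', 3), ('v[5]', 2)]
print(sum(bulk for _, bulk in bulked))  # 14: bulking changes the grouping, not the total
```

The resulting bulks 6, 3, 3, 2 are the same numbers that g.V().in().out().local(count()) prints in the object-local example that follows, which is the point: local() only sees whatever grouping an upstream barrier produced.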
This creates discrepancies that are not obvious to users. As an illustration, here is the regular "object-local" behavior:
gremlin> g.V().in().out()
==>v[3]
==>v[3]
==>v[3]
==>v[2]
==>v[2]
==>v[2]
==>v[4]
==>v[4]
==>v[4]
==>v[5]
==>v[5]
==>v[3]
==>v[3]
==>v[3]
gremlin> g.V().in().out().local(count())
==>6
==>3
==>3
==>2
You can see that equal objects (vertices) are grouped and processed together, object-locally. However, there are cases where it does not work this way.
For example, if you disable the strategy,
gremlin> g.withoutStrategies(LazyBarrierStrategy.class).V().in().out().local(count())
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
then we see "solution-local" behavior: each individual solution is processed locally. Likewise, there are cases where LazyBarrierStrategy does not kick in at all:
gremlin> g.V(1,1,1).local(count())
==>1
==>1
==>1
So the object-local behavior depends on LazyBarrierStrategy, which is not apparent to users. Furthermore, graph providers are free to drop any of TinkerPop's strategies, so if LazyBarrierStrategy is dropped, local() always works in a solution-local manner unless barrier() is added explicitly.
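The console sessions above can be reproduced in the same kind of toy model. Again this is plain Python, not TinkerPop, and `count` and `local` here are hypothetical stand-ins for the steps. The assumptions are that local() hands each incoming traverser, with whatever bulk it already carries, to its inner traversal one at a time, and that count() adds up bulks. Under that model the result depends entirely on whether something bulked the input first.

```python
# Toy model only, NOT the TinkerPop API: a traverser is an (object, bulk) pair.

def count(traversers):
    """Stand-in for count(): adds up bulks instead of counting list entries."""
    return sum(bulk for _obj, bulk in traversers)


def local(inner, traversers):
    """Stand-in for local(): run `inner` once per incoming traverser,
    passing that traverser with whatever bulk it already carries."""
    return [inner([t]) for t in traversers]


# g.V().in().out() after a barrier has bulked equal vertices (default strategies).
bulked = [("v[3]", 6), ("v[2]", 3), ("v[4]", 3), ("v[5]", 2)]

# The same 14 results with LazyBarrierStrategy removed: one traverser per solution.
objects = ["v[3]"] * 3 + ["v[2]"] * 3 + ["v[4]"] * 3 + ["v[5]"] * 2 + ["v[3]"] * 3
unbulked = [(obj, 1) for obj in objects]

print(local(count, bulked))    # [6, 3, 3, 2]  (object-local)
print(local(count, unbulked))  # fourteen 1s   (solution-local)

# g.V(1,1,1): no barrier is inserted, so each v[1] arrives separately.
print(local(count, [("v[1]", 1)] * 3))  # [1, 1, 1]
# An explicit barrier() would first merge them into one traverser with bulk 3.
print(local(count, [("v[1]", 3)]))      # [3]
```

The design point the model isolates is that local() never regroups its input itself; any grouping comes from an upstream barrier, whether LazyBarrierStrategy inserted it or the user wrote it.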
The documentation suggests that users use map() or flatMap() instead. This can work, but many users may already be relying on local() for "solution-local" processing without noticing. There are also subtle differences among the three steps.
(1) {{map}} works solution-locally, but emits only one result per incoming traverser:
gremlin> g.V().map(out()).path()
==>[v[1],v[3]]
==>[v[4],v[5]]
==>[v[6],v[3]]
(2) {{flatMap}} streams all results and works solution-locally, but unlike {{local}} it leaves only the last element of the inner traversal in the Path:
gremlin> g.V().flatMap(out().out()).path()
==>[v[1],v[5]]
==>[v[1],v[3]]
This behavior of flatMap is not documented, but there are use cases where users intentionally rely on flatMap for exactly this feature.
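The Path differences between map, flatMap and local can be sketched with another toy model. It assumes a hard-coded out-adjacency for the "modern" graph, ordered so that the results match the console output above, and represents each traverser only by its path tuple; all function names are hypothetical. In this model map() and flatMap() run the inner traversal from the current object and append only its end object, while local() continues the same path, which is the behavior the description attributes to local.

```python
# Toy model only, NOT the TinkerPop API. A traverser is represented by its path tuple.
# Assumption: out-adjacency of the "modern" graph, ordered to match the console output.
OUT = {
    "v[1]": ["v[3]", "v[2]", "v[4]"],
    "v[2]": [],
    "v[3]": [],
    "v[4]": ["v[5]", "v[3]"],
    "v[5]": [],
    "v[6]": ["v[3]"],
}


def start():
    """g.V(): one single-element path per vertex."""
    return [(v,) for v in OUT]


def out(path):
    """out(): extend the path with each adjacent vertex."""
    return [path + (w,) for w in OUT[path[-1]]]


def out_out(path):
    """out().out(): two hops, keeping every intermediate vertex in the path."""
    return [p2 for p1 in out(path) for p2 in out(p1)]


def map_step(inner, paths):
    """map(): inner runs from the current object only; keep just its first
    result and append that result's end object, dropping inputs with none."""
    result = []
    for p in paths:
        inner_results = inner((p[-1],))
        if inner_results:
            result.append(p + (inner_results[0][-1],))
    return result


def flat_map_step(inner, paths):
    """flatMap(): stream every inner result, but append only its end object."""
    return [p + (r[-1],) for p in paths for r in inner((p[-1],))]


def local_step(inner, paths):
    """local(): inner continues the same path, so intermediate hops survive."""
    return [r for p in paths for r in inner(p)]


print(map_step(out, start()))           # [('v[1]', 'v[3]'), ('v[4]', 'v[5]'), ('v[6]', 'v[3]')]
print(flat_map_step(out_out, start()))  # [('v[1]', 'v[5]'), ('v[1]', 'v[3]')]
print(local_step(out_out, start()))     # [('v[1]', 'v[4]', 'v[5]'), ('v[1]', 'v[4]', 'v[3]')]
```

The last line shows the Path the description attributes to local(), keeping the intermediate v[4]; map() additionally discards every inner result but the first.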
So although the documentation recommends using these two steps instead of local, migrating is not easy in some cases. At this point, I think:
- We should clarify in the documentation:
  - what barrier() / NoOpBarrierStep does and how it affects local()
  - how {{LazyBarrierStrategy}} is related to barrier() / NoOpBarrierStep
  - what the differences are between map, flatMap and local, including Path handling
Also, instead of describing local() as intended for internal use when implementing strategies, the documentation should tell users they may use it whenever they understand how it works and what they are doing with it.