Description
As discussed on dev@...
Apache TinkerPop should provide, out-of-the-box, at least 3 Gremlin language variants. It would be cool if these were:
- Python (Mark Henderson)
- PHP (PommeVerte)
- Ruby (?okram)
I think each of these should be generated using the reflection-model presented in http://tinkerpop.apache.org/docs/3.2.1-SNAPSHOT/tutorials/gremlin-language-variants/. Moreover, on every mvn clean install, the code for these variants is generated.
Given the desire to separate language variants from language drivers, I think that a language driver for each variant above should be "plugable." Moreover, we should provide one driver implementation for each – simple GremlinServer REST.
gremlin-variants/
gremlin-ruby/
gremlin_ruby.rb
gremlin_ruby_rest_driver.rb
gremlin-php/
Gremlin_PHP.php
Gremlin_PHP_REST_Driver.php
gremlin-python/
gremlin-python.py
gremlin-python-rest-driver.py
Next, each variant implementation should be testable. This is PAINFUL if we have to implement each g_V_out_repeatXasXaXX test case in ProcessXXXSuite. Perhaps some RegEx transducer magic could be used to convert all those tests from Gremlin-Java to the respective host language? However, even if we do that, we still have the problem of how to test the returned results.
I think what we should test the returned results using the JVM. For instance, JRuby, Jython, JPHP (does it exist?). If we do this, we will save ourselves a massive headache. All we have to do is create a GraphProvider that uses TinkerGraph and whose TraversalSource is some sort of wrapper around reflection-generated Ruby (e.g.).
g.V.out_e("knows") // returns a Ruby iterator
That Ruby iterator should be converted to a Java iterator and then the ProcessXXXSuite can verify the results.
With this, most everything is reflectively constructed.
gremlin_ruby.rb // generated via Java reflection gremlin_ruby_rest_driver.rb // manually coded match_test.rb // generated via RegEx transducer has_test.rb // generated via RegEx transducer ... RubyGraphProvider.java // manually coded RubyProcessStandardSuite.java // manually coded RubyProcessComputerSuite.java // manually coded
Thus, the testing data flow would be:
MatchTest.Traversals.java --transducer-> match_test.rb match-test.rb --REST--> GremlinServer GremlinServer --GraphSON-->match-test.rb GraphSON --JRuby/GraphSONReader-->Java objects Java objects --JRuby-->MatchTest.java
Attachments
Issue Links
- is related to
-
TINKERPOP-1392 Remove support for java serialized Traversal
- Closed
- relates to
-
TINKERPOP-1328 Provide [gremlin-python] as an code executor in docs
- Closed
-
TINKERPOP-1347 RemoteConnection needs to provide TraversalSideEffects.
- Closed
-
TINKERPOP-1230 Serialising lambdas for RemoteGraph
- Closed
- links to
Activity
The problem of testing. I think I have an idea. Lets look at standard old Gremlin-Java SumTest.
public abstract class SumTest { // ... public static class Traversals extends SumTest { @Override public Traversal<Vertex, Double> get_g_V_valuesXageX_sum() { g.V().values("age").sum(); } // ... } }
We create an interface (or abstract class) called VariantConverter which does the following:
public class PythonVariantConverter extends VariantConverter { public String concat() { return "."; } public String source(final String source) { return source; } public String step(String stepName, final Object... arguments) { if(stepName.equals("values") && arguments.length == 1) return arguments[0].toString(); else if(stepName.equals("as") || stepName.equals("from") || ...) return "_" + stepName + "(" + Arrays.asStrings(arguments) + ")"; else return stepName + "(" + Arrays.asString(arguments) + ")"; } // probably more stuff.. like lambdas and P, Order, etc. stuffs. }
Then we will have a GraphTraversalSource and GraphTraversal implementation in Java that DOES NOT do all the addStep-stuff, but instead, uses the VariantConverter to generate a string representation of the traversal for that variant's host language. Thus, the Gremlin-Java test suite is used, but realize that the g.V().values("age").sum() call is creating a String in the host language, which is then evaluated to generate a Gremlin-Groovy string, which is then evaluated by Groovy to generate a Traversal. And that ultimate Traversal is then what is tested (i.e. returned from get_g_V_valuesXageX_sum()).
Thus, something like this...(and we only need to test these against TinkerGraph – both OLTP and OLAP).
public class TinkerGraphGremlinPythonProvider extends AbstractGraphProvider { private final VariantConverter converter = new PythonVariantConverter(); @Override public default GraphTraversalSource traversal(final Graph graph) { return new VariantGraphTraversalSource(graph, converter); } }
Again:
- VariantGraphTraversalSource and VariantGraphTraversal generate a Python string as instructed by PythonVariantConverter.
- The Python string is evaluated in Jython (or CPython with some System.eval()-magic) which generates a Gremlin-Groovy string.
- That Gremlin-Groovy string is evaluated by GremlinGroovyScriptEngine which generates a Traversal.
- That Traversal is returned by the respective XXXTest.g_V_outXknowsX_sum_xxxxxx()-test and thus, tested.
Thus, for us (TinkerPop) all we need to do is create:
VariantGraphTraversalSource // reusable for all variants VariantGraphTraversal // reusable for all variants Variant__ // reusable for all variants // probably can make this a Groovy class and just have it use meta-programming! easy peasy. -------------------------------- TinkerGraphGremlinPythonProvider PythonVariantConverter // manually coded but its not much gremlin-python.py // generated via reflection on each "mvn clean install" -------------------------------- TinkerGraphGremlinRubyProvider RubyVariantConverter // manually coded but its not much gremlin_ruby.rb // generated via reflection on each "mvn clean install" -------------------------------- TinkerGraphGremlinPHPProvider PHPVariantConverter // manually coded but its not much Gremlin_PHP.php // generated via reflection on each "mvn clean install"
Nice proposal. I think I understand where you are going. I would guess that this would work for the vast majority of languages. For any that don't fit that pattern, I guess we will just make a special case for. I would think the main trouble a language would run into is in this step:
> The Python string is evaluated in Jython (or CPython with some System.eval()-magic) which generates a Gremlin-Groovy string.
How often would we have to rely on the "magic" to make this pattern work. Not looking for an answer there - just thinking out loud.
I decided to test the idea. Check it out:
Assume this is PythonVariantConverter.
public class VariantConverter { public String source(final String source) { return source; } public String concat() { return "."; } public String step(final String stepName, final Object... args) { if (args.length == 0) return stepName + "()"; else return stepName + "(" + stringify(args) + ")"; } private static String stringify(final Object... args) { String temp = ""; for (final Object object : args) { if (object instanceof String) { temp = temp + "\"${object}\","; } } return temp.substring(0, temp.length() - 1); } }
class VariantTraversal implements GraphTraversal { private static GremlinGroovyScriptEngine scriptEngine = new GremlinGroovyScriptEngine(); private String variantString; private String traversalSourceName; private TraversalSource traversalSource; private VariantConverter converter = new VariantConverter(); private Traversal traversal; static { VariantTraversal.metaClass.invokeMethod = { String name, args -> if (name.equals("hasNext") || name.equals("next")) return delegate.metaClass.getMetaMethod(name, *args).invoke(delegate, args); variantString = variantString + converter.concat() + converter.step(name, args); return delegate; } } public VariantTraversal(final String traversalSourceName, final TraversalSource traversalSource) { this.traversalSourceName = traversalSourceName; this.traversalSource = traversalSource; this.variantString = traversalSourceName; } @Override public String toString() { return this.variantString; } @Override boolean hasNext() { if (null == this.traversal) this.compile(); return this.traversal.hasNext(); } @Override Object next() { if (null == this.traversal) this.compile(); return this.traversal.next(); } private void compile() { SimpleBindings bindings = new SimpleBindings(); bindings.put(this.traversalSourceName, this.traversalSource); this.traversal = (Traversal) scriptEngine.eval(this.variantString, bindings); } }
class VariantTraversalTest { @Test public void playTest() { final Graph graph = EmptyGraph.instance(); final Traversal traversal = new VariantTraversal("g", graph.traversal()).V().out("created").groupCount().by("name"); println traversal; println traversal.hasNext(); println traversal.next(); println traversal.hasNext(); } }
The output of the test above is (note its an EmptyGraph):
g.V().out("created").groupCount().by("name") true [:] false
There it is! ... All the language variant needs to do is implement their own VariantConverter... Groovy and introspection and ScriptEngine, etc. will take care of the rest.
I'm running into a bunch of meta-programming losing metaMethods stuff with TestSuite. I think we should do this:
1. Create a gremlin-variants package.
2. It will have VariantTraversalSource, VariantTraversal, VariantAnonymousTraversal, VariantConverter in main/.
3. It will have a version of all the tests, but only needed once for ALL variants.
public abstract class VariantSumTest { public static class Traversals extends SumTest { @Override public Traversal<Vertex, Double> get_g_V_valuesXageX_sum() { new VariantTraversalSource(converter,g).V().values("age").sum(); } @Override public Traversal<Vertex, Map<String, Number>> get_g_V_hasLabelXsoftwareX_group_byXnameX_byXbothE_weight_sumX() { new VariantTraversalSource(converter,g).V().hasLabel("software").group().by("name").by(bothE().values("weight").sum(); } } }
4. We will have TinkerGraphVariantProvider, TinkerGraphComputerVariantProvider ... in test/.
5. We will have XXXVariantProcessStandardSuite and XXXVariantProcessComputerSuite tests in test/.
6. All the XXXVariantProcessXXXSuites will have a static VariantConverter field. (see its use above)
The sucky thing is that we have to copy all the tests over and keep them in sync like we do with Gremlin-Groovy. The benefit is that things are "cleaner" and all this meta-program insanity can chill out a bit.
I think that sounds good. I was wondering if you were going to be able to implement it the way you initially proposed. Maintaining one more set of tests to cover many variants is an acceptable cost in my book.
So in branch TINKERPOP-1278 we have 2 different ways of doing this implemented and I think a third (yet unimplemented) way may be the best. I will describe all three models with their +/- so people can help decide which model we should go with. First, note that all three models make use of a new construct called a Translator. In short, a Translator wraps a StringBuilder and for each traversal.x(y,z) call, Translator.addStep(traversal, "x", y, z) is called. In essence the translator is able to take calls in one Gremlin variant and convert them to calls in another Gremlin variant.
1. ScriptXXXTraversal and ScriptXXXTraversalSource
ScriptGraphTraversal implements all the methods of GraphTraversal, but instead of addStep(new BlahStep()), it calls Translator.addStep(stepName, Object... stepArguments). When applyStrategies() is called (i.e. the traversal is compiled), the Translator.getScriptEngine() is used to compile the internal Translator.getTraversalScript() and the result of the compilation is added to the ScriptGraphTraversal and thus, the ScriptGraphTraversal has implemented steps and is ready to execute.
plus Translation is decoupled from DefaultGraphTraversal and thus, an entirely separate class space.
minus Every XXXTraversal will need to have a ScriptXXXTraversal implementation. For GraphTraversal, this was a copy/paste nightmare.
minus ScriptGraphTraversalSource is required and thus, this is NOT a TraversalStrategy but a new TraversalSource – graph.traversal(PythonTranslator.of("g")).
2. TranslationStrategy implements CreationStrategy
In this model, there is a new type of TraversalStrategy called a CreationStrategy whose sort order comes before DecorationStrategy. Along with apply(traversal) method, creation strategies also have a addStep(stepName,stepArguments...) method as well as a getTranslator() method. This means that every GraphTraversal method has the following form:
public default GraphTraversal<S, Vertex> out(final String... edgeLabels) { TraversalHelper.addStepToCreationStrategies(this.asAdmin(), "out", edgeLabels); return this.asAdmin().addStep(new VertexStep<>(this.asAdmin(), Vertex.class, Direction.OUT, edgeLabels)); }
Finally, TranslationStrategy.apply() does like RemoteStrategy does and simple compiles the script and inserts the steps into the traversal.
plus CreationStrategy is like an interceptor and we could reuse this construct in other areas.
minus If you are doing translation, you are also constructing the traversals in the host language (wasted memory/clock cycles).
minus CreationStrategy is sorta like a TraversalStrategy but has more methods.
plus Everyone's implementation of XXXTraversal has all the machinery for translation. No need for a parallel ScriptXXXTraversal implementation to maintain.
3. Traversal.setTranslator(translator)
If we make Translator core to Traversal then we would have the benefits of the two above without the drawbacks.
public default GraphTraversal<S, Vertex> out(final String... edgeLabels) { if(null != this.translator) { this.translator.addStep(this.asAdmin(), "out", edgeLabels); return this; } else return this.asAdmin().addStep(new VertexStep<>(this.asAdmin(), Vertex.class, Direction.OUT, edgeLabels)); }
plus You either translate or your construct – no wasted memory/clock cycles.
minus Translator is now a "new concept" parallel with TraversalStrategies and TraversalSideEffects.
Finally, I really do like CreationStrategy, but if we do it, then I think that instantiating steps and adding them to the traversal would actually be a CreationStrategy. For instance:
public default GraphTraversal<S, Vertex> out(final String... edgeLabels) { TraversalHelper.addStepToCreationStrategies(this.asAdmin(), "out", edgeLabels); return this; }
That is, there would be a AddStepStrategy which would traversal.addStep(new VertexStep<>(this.asAdmin(), Vertex.class, Direction.OUT, edgeLabels))! If you wanted to translate only, then you would remote AddStepStrategy and add TranslationStrategy.... CraZy. The drawback of this is that AddStepStrategy would look something like this:
switch stepName case "out": traversal.addStep(new VertexStep<>(this.asAdmin(), Vertex.class, Direction.OUT, (String[]) args[0])); break; case "map": ... case "fold": ... case "groupCount": ...
I think that we should basically rule option 1 because this minus:
> minus Every XXXTraversal will need to have a ScriptXXXTraversal implementation. For GraphTraversal, this was a copy/paste nightmare.
sound like a bad way to go. As I read the end of option 3 i was already thinking about your last bit where you would delegate traversal step adding to the strategy. We could probably do better than a giant switch embedded in the strategy to implement that. Maybe it's some form of Command Pattern where you would have this base class that lets you register a AddStepCommand for each available step name. Then you just lookup the AddStepCommand from a Map of commands? It leads to lots of small classes (which could probably just be inner classes of AddStepCommand interface) but at least then it's more modular and each individual command would be easily testable. Maybe we even get lucky and the same command might be useful across multiple step additions....
Github user okram commented on the issue:
https://github.com/apache/tinkerpop/pull/340
How do we use this again? This is just for the `GeoPoint`-style adding classes? Can you provide an example so we can add it to the docs. See:
https://github.com/apache/tinkerpop/blob/TINKERPOP-1278/docs/src/reference/gremlin-variants.asciidoc
Github user spmallette commented on the issue:
https://github.com/apache/tinkerpop/pull/340
ah - didn't realize that build was hosed in the TINKERPOP-1278 branch itself.
Github user okram commented on the issue:
https://github.com/apache/tinkerpop/pull/343
This is how the Python modules are loaded into Jython.
Recommendations to make it more "package like" ?
Github user okram commented on the issue:
https://github.com/apache/tinkerpop/pull/343
This just changed to here:
Long literals with a trailing 'L', as in the JythonTranslator (see github link below) are a syntax error in python3. All integers in python3 are of type 'int', which is equivalent to the 'long' type in python2. I believe that, at least in cpython, the L is not necessary in python2 because literals that are too long to be regular ints are automatically longs. But I don't know whether the situation is different for jython.
GitHub user leifurhauks opened a pull request:
https://github.com/apache/tinkerpop/pull/359
TINKERPOP-1278 : Removed long literal 'L' suffix
The 'L' suffix on numeric literals is a syntax error in python3. As far
as I can tell, the suffix on the literal in this expression in
JythonTranslator isn't necessary on python2, so removing it restores 2/3
compatibility.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/leifurhauks/incubator-tinkerpop fix_long_literal
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/tinkerpop/pull/359.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #359
commit 62af276e77e95ca21ac572a2f8a6fa8804489773
Author: Leifur Halldor Asgeirsson <lasgeirsson@zerofail.com>
Date: 2016-07-08T14:03:54Z
TINKERPOP-1278 : Removed long literal 'L' suffix
The 'L' suffix on numeric literals is a syntax error in python3. As far
as I can tell, the suffix on the literal in this expression in
JythonTranslator isn't necessary on python2, so removing it restores 2/3
compatibility.
I've been getting back up to speed on recent developments in gremlin-python. I was wondering what the purpose is of the TraversalStrategies.global_cache mapping.
Seems that in order to instantiate a GraphTraversalSource, the name of the graph class used must be a key in TraversalStrategies.global_cache. Is there a reason why the traversal strategies for a given graph class can't simply be defined directly on that graph class?
Github user okram commented on the issue:
https://github.com/apache/tinkerpop/pull/357
I tried to merge this but there were two problem:
1.) the PR references old code and thus, lots of conflicts.
2.) graph_traversal and traversal are auto-generated and thus editing the .py does nothing.
3.) Jython complains in JythonScriptEngineSetup (tests don't pass)
If you can fix this PR accordingly, I'd be happy to merge.
Github user leifurhauks commented on the issue:
https://github.com/apache/tinkerpop/pull/357
Brought this branch up to date and fixed the code generators.
Github user leifurhauks commented on the issue:
https://github.com/apache/tinkerpop/pull/357
Added WIP to the title of the PR while I sort out some silly mistakes I made with the conflict resolution...
Github user leifurhauks commented on the issue:
https://github.com/apache/tinkerpop/pull/357
Okay, should be good now
This branch is too big for normal review via PR. I've started a thread on dev to handle any discussion related to getting this branch merged back to master.
Github user newkek commented on the issue:
https://github.com/apache/tinkerpop/pull/390
Seems like it is only on my machine... I think Stephen tried on the docker image ?
Yes what I have seen so far is it happens at the moment we call `deserializationContext.findContextualValueDeserializer(typeFromId)` in `GraphSONTypeDeserializer` where the `typeFromId` is the type that has been constructed at init time with `TypeFactory.defaultInstance().constructType(Tree.class)`. I also see that for some reason I don't get, it happens to me only when the [Enums types](https://github.com/apache/tinkerpop/blob/TINKERPOP-1278/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphson/GraphSONModule.java#L131-L140) from the Traversal serialzers have been constructed as well. If I comment those lines, I've got no problem when doing constructType(Tree.class) like all the rest..
I understand the desire to leverage existing testing infrastructure to exercise various language variants, but my impression from this thread is that the tests are actually testing a mix of variant and driver, since you're testing the results that come back from the Gremlin server.
If you want to focus on testing the variant itself, I’d think you’d want a bunch of queries written in the Ruby variant along with expected “byte code” (which may be a string) that will be sent. In this simple form, you’d have to duplicate each test for every variant. To avoid that, you could write the queries in a DSL and have a test tool convert it to the right query in the desired language, run it, and then compare result with the expected “byte code”. But that reasoning can be recursive — how do you know the tool generating the queries in Ruby (for example) is doing it right?
That skirts the issue presented in this Jira: you need some tests that make sure that results are coerced into native types properly (e.g. driver tests). I haven’t done a lot with JRuby myself — wrote one native module for Kerberos support in the DataStax Enterprise Ruby driver, but that’s about it. In my context, the logic flow was really controlled by Ruby code, and there were callouts to JRuby for specific things. The way this ticket talks about ProcessXXXSuite, it sounds like ProcessXXXSuite is the master…like running a Java program and embedding Ruby in it, rather than the other way around. Sounds reasonable.
I'm not saying end-to-end tests aren't useful, but I feel like this ticket is focused on variant testing...so it should purely test variants.
I see this is merged now. I wanted to follow up on some comments I had on one commit pre-merge:
https://github.com/apache/tinkerpop/commit/6ed7edc0b4ad2abf933e917812d49ad92230c8d1#commitcomment-18473940
Is there still a chance to make this generate more idiomatic Python as suggested in the GLV docs? Was this discussed anywhere else?
stamhankar999 thanks for your thoughts on the testing aspects of things. i think we have some work to do before we can call the testing model completely settled for all GLVs. I suspect some changes will occur. I think we need a good mix of native tests as well as those that can be executed via the process suites. We will try to come to a full understanding of that model for 3.3.x.
aholmber i remember your commit comment. i think that happened when okram was away on vacation. since he wrote the code generation stuff for that, perhaps he'd be best to have a look and address your concerns.
Adam Holmberg - Regarding naming, while methods typically have lowercase names separated with underscores, PEP 8 does suggest that "mixedCase is allowed only in contexts where that's already the prevailing style". While this typically refers to backwards compatibility, you see mixed case in libraries that provide bridges to other languages...the example that comes to mind is PyObjC. I wonder if this is an example where using mixed case is acceptable in order to give gremlin-python the same "feel" as gremlin...thoughts?
Whether gremlin-python ultimately uses mixedCase or snake_case methods, I would like to echo aholmber's suggestion of using trailing, rather than leading underscores. It's a pretty strong python convention that objects with a leading underscore are considered internal and subject to change, so python library users will tend to avoid them. Also, python tools such as IDEs will often ignore them for autocompletion, or suggest them last.
I think that if the leading underscore thing is so anti-python that IDEs may ignore them then it makes sense to make this change. Created this TINKERPOP-1425 and assigned it to davebshow
I see it's already revised, but just to acknowledge the response: I'm +1 on this direction. Thanks for the discussion.
Regarding the testing framework....
I assume that we are generating both types of GLVs described in the tutorial? so a Jython version and a version that generates gremlin-groovy strings to be submit with a driver? I guess your approach for the languages that have a "J" version (Jython, JRuby, JPHP (which does seem to exist - https://github.com/jphp-compiler/jphp) should work because we're just operating on the JVM and the approach will look similar to how Gremlin Groovy test things. That pattern seems pretty well established, though I acknowledge the headache of trying to maintain that manually..
For the testing the groovy strings I don't think we want to include REST or Gremlin Server. Since the language bindings generate gremlin-groovy strings, i think we can just build some sort of test harness that lets us pass that string through the GremlinGroovyScriptEngine to get the JVM Traversal instance that all the process tests want to have, then the standard asserts of the test suite just work (as above). I think that would reduce some complexity.