Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
3.0.1-incubating
None
Description
After working on BulkLoaderVertexProgram for Titan, it is clear that it would be trivial to add this generally to TinkerPop: the equivalent of BlueprintsOutputFormat (or whatever the Blueprints-specific bulk loader was called). Moreover, because Titan and TinkerPop now share the same data model, there is no longer a data-model alignment issue, so Titan does not need its own BulkLoaderVertexProgram. The difference would be that instead of:
graph.compute().program(BulkLoaderVertexProgram.build().titan(propertiesFile).create()).submit()
It would simply be:
graph.compute().program(BulkLoaderVertexProgram.build().factory(propertiesFile).create()).submit()
...and BulkLoaderVertexProgram would use GraphFactory.open() to open the connection to the target graph. Moreover (and spmallette will need to clear my head here), if the factory opened a Gremlin Server connection, we would get parallel writes to embedded graph databases like Neo4j, since every worker would send its writes to the single server hosting the embedded database instead of each trying to open that database itself.
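For context, GraphFactory.open() picks the Graph implementation from the `gremlin.graph` key of the supplied configuration and calls that class's static `open(Configuration)` method. The stdlib-only Java sketch below imitates that lookup under stated assumptions: `java.util.Properties` stands in for Apache Commons Configuration, and `FactorySketch`, `ToyGraph`, `openGraph`, and the `toy.location` key are hypothetical, not TinkerPop API.

```java
import java.lang.reflect.Method;
import java.util.Properties;

// Hypothetical sketch of a GraphFactory-style lookup; not TinkerPop API.
public class FactorySketch {

    // Hypothetical stand-in for a Graph implementation such as TinkerGraph or Titan.
    public static class ToyGraph {
        public final String location;

        private ToyGraph(String location) {
            this.location = location;
        }

        // Follows the convention that a graph class exposes a static open(config) method.
        public static ToyGraph open(Properties config) {
            return new ToyGraph(config.getProperty("toy.location", "memory"));
        }
    }

    // Hypothetical analogue of GraphFactory.open(): read the implementation class
    // name from "gremlin.graph" and reflectively invoke its static open(Properties).
    public static Object openGraph(Properties config) throws Exception {
        String className = config.getProperty("gremlin.graph");
        if (className == null) {
            throw new IllegalArgumentException("configuration must set gremlin.graph");
        }
        Method open = Class.forName(className).getMethod("open", Properties.class);
        return open.invoke(null, config); // null receiver: open is static
    }

    public static void main(String[] args) throws Exception {
        Properties config = new Properties();
        // A real properties file would name a class such as
        // org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph.
        config.setProperty("gremlin.graph", ToyGraph.class.getName());
        config.setProperty("toy.location", "/tmp/target");
        ToyGraph graph = (ToyGraph) openGraph(config);
        System.out.println(graph.location); // prints /tmp/target
    }
}
```

Because the class name lives in the properties file, the same builder call could target Titan, Neo4j, TinkerGraph, or any other TP3 graph without code changes, which is the point of replacing `titan(propertiesFile)` with `factory(propertiesFile)`.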
In short, BulkLoaderVertexProgram is simply a vertex program that uses a GraphComputer to load a graph, in parallel, into any other graph that can be opened via GraphFactory (which is every TP3 graph).
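The sentence above leaves open how a parallel load keeps edges consistent with vertices that are being created concurrently. A common approach, assumed here rather than taken from this issue, is two phases: first write every vertex and remember the target-graph id it received, then, once all vertices exist, write the edges using those ids. The stdlib-only Java sketch below uses parallel streams in place of GraphComputer workers; `ParallelLoadSketch`, `TargetGraph`, and `load` are hypothetical names, and the in-memory maps stand in for the source and target graphs.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical two-phase parallel loader; not the TinkerPop implementation.
public class ParallelLoadSketch {

    // Hypothetical thread-safe target graph: assigns its own ids, stores edges as id pairs.
    public static class TargetGraph {
        private final AtomicLong nextId = new AtomicLong(100);
        public final Map<Long, String> vertices = new ConcurrentHashMap<>();
        public final ConcurrentLinkedQueue<long[]> edges = new ConcurrentLinkedQueue<>();

        public long addVertex(String label) {
            long id = nextId.getAndIncrement();
            vertices.put(id, label);
            return id;
        }

        public void addEdge(long outId, long inId) {
            edges.add(new long[]{outId, inId});
        }
    }

    // source: source vertex id -> out-neighbour ids; labels: source vertex id -> label.
    // Returns the mapping from source ids to the ids assigned by the target graph.
    public static Map<Long, Long> load(Map<Long, List<Long>> source,
                                       Map<Long, String> labels,
                                       TargetGraph target) {
        Map<Long, Long> idMap = new ConcurrentHashMap<>();
        // Phase 1: every source vertex is written independently, in parallel.
        source.keySet().parallelStream()
              .forEach(v -> idMap.put(v, target.addVertex(labels.get(v))));
        // Phase 2 starts only after phase 1's terminal forEach returns, so both
        // endpoints of every edge already have a target id.
        source.entrySet().parallelStream().forEach(e ->
            e.getValue().forEach(w -> target.addEdge(idMap.get(e.getKey()), idMap.get(w))));
        return idMap;
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> source = Map.of(1L, List.of(2L, 3L), 2L, List.of(3L), 3L, List.of());
        Map<Long, String> labels = Map.of(1L, "person", 2L, "person", 3L, "software");
        TargetGraph target = new TargetGraph();
        load(source, labels, target);
        // prints "3 vertices, 3 edges"
        System.out.println(target.vertices.size() + " vertices, " + target.edges.size() + " edges");
    }
}
```

Finishing phase 1 before phase 2 begins plays the role of a superstep barrier in a vertex program; without it, an edge could be written before one of its endpoints has a target id.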
EXTENDED NOTES:
- SchemaInference would be a MapReduce job executed prior to BulkLoaderVertexProgram.
- Titan and Neo4j can each have their own SchemaInference implementations.
- Incremental loading: I forget how this worked.
- Bulk mutations: this is possible at the TP3 level with hidden properties and smart add/remove/etc.
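Reading SchemaInference as a job that records, for each property key, the set of value types observed across all vertices (an assumption; the notes do not define it further), its map and reduce phases could look like the following stdlib-only Java sketch. Parallel streams stand in for a distributed MapReduce job, and `SchemaInferenceSketch`, `emit`, and `infer` are hypothetical names.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical schema-inference sketch; not TinkerPop API.
public class SchemaInferenceSketch {

    // "Map" phase: emit one (propertyKey, valueType) pair per property of a vertex.
    static List<SimpleEntry<String, String>> emit(Map<String, Object> vertexProperties) {
        return vertexProperties.entrySet().stream()
            .map(p -> new SimpleEntry<String, String>(
                p.getKey(), p.getValue().getClass().getSimpleName()))
            .collect(Collectors.toList());
    }

    // "Reduce" phase: group pairs by key and collect every type seen for that key.
    public static Map<String, Set<String>> infer(List<Map<String, Object>> vertices) {
        return vertices.parallelStream()
            .flatMap(v -> emit(v).stream())
            .collect(Collectors.groupingBy(SimpleEntry::getKey,
                Collectors.mapping(SimpleEntry::getValue, Collectors.toSet())));
    }

    public static void main(String[] args) {
        Map<String, Object> marko = Map.of("name", "marko", "age", 29);
        Map<String, Object> lop = Map.of("name", "lop", "age", "unknown");
        Map<String, Set<String>> schema = infer(List.of(marko, lop));
        // Only "age" is reported, since it holds both Integer and String values.
        schema.forEach((key, types) -> {
            if (types.size() > 1) {
                System.out.println("conflicting types for " + key + ": " + types);
            }
        });
    }
}
```

A Titan- or Neo4j-specific implementation would then turn this key-to-types summary into its own schema definitions, deciding how to handle keys such as `age` that report more than one type.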
Note that completing this issue will essentially make the BatchGraph implementation obsolete.