Details
Improvement
Status: Closed
Minor
Resolution: Fixed
3.2.0-incubating
None
Description
On line 43, the call to Constants.getSearchGraphLocation returns Optional.empty() when gremlin.hadoop.inputLocation=none, the setting advised for Titan's CassandraInputFormat and HBaseInputFormat. Changing the readGraphRDD method to call .isPresent() first, and to set the storage location in the config only when a value is present, allows SparkGraphComputer from the 3.2.0-SNAPSHOT branch to work with Titan via CassandraInputFormat in a traversal source:
// Imports
import java.util.Optional;

@Override
public JavaPairRDD<Object, VertexWritable> readGraphRDD(final Configuration configuration, final JavaSparkContext sparkContext) {
    final org.apache.hadoop.conf.Configuration hadoopConfiguration = ConfUtil.makeHadoopConfiguration(configuration);
    // This part was used directly in hadoopConfiguration.set(...)
    final Optional<String> searchGraph = Constants.getSearchGraphLocation(
            configuration.getString(Constants.GREMLIN_HADOOP_INPUT_LOCATION),
            FileSystemStorage.open(hadoopConfiguration));
    // Only set the storage location when one was actually found
    if (searchGraph.isPresent()) {
        hadoopConfiguration.set(configuration.getString(Constants.GREMLIN_HADOOP_INPUT_LOCATION), searchGraph.get());
    }
    return sparkContext.newAPIHadoopRDD(hadoopConfiguration,
            (Class<InputFormat<NullWritable, VertexWritable>>) hadoopConfiguration.getClass(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT, InputFormat.class),
            NullWritable.class,
            VertexWritable.class)
            .mapToPair(tuple -> new Tuple2<>(tuple._2().get().id(), new VertexWritable(tuple._2().get())));
}
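To show the guard in isolation, here is a minimal, dependency-free sketch. It is not TinkerPop code: InputLocationGuard, resolveLocation, setIfPresent and the "input.location" key are hypothetical names, a plain Map stands in for the Hadoop configuration, and the assumption that "none" resolves to an empty Optional is a simplification of what Constants.getSearchGraphLocation does. The one thing it relies on for certain is that Optional.get() throws NoSuchElementException on an empty Optional, which is why the value is only unwrapped after the isPresent() check.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the guard in readGraphRDD; none of these names are TinkerPop API.
final class InputLocationGuard {

    // Assumed behaviour: a missing or "none" location resolves to Optional.empty().
    static Optional<String> resolveLocation(final String inputLocation) {
        if (inputLocation == null || inputLocation.equals("none")) {
            return Optional.empty();
        }
        return Optional.of(inputLocation);
    }

    // Writes the key only when a location was resolved; returns true if the config was changed.
    static boolean setIfPresent(final Map<String, String> config, final String key, final String inputLocation) {
        final Optional<String> location = resolveLocation(inputLocation);
        if (location.isPresent()) {
            config.put(key, location.get());
            return true;
        }
        // An unguarded location.get() here would throw NoSuchElementException.
        return false;
    }

    public static void main(final String[] args) {
        final Map<String, String> config = new HashMap<>();
        System.out.println(setIfPresent(config, "input.location", "none"));
        System.out.println(setIfPresent(config, "input.location", "data/graph.kryo"));
        System.out.println(config);
    }
}
```

With "none" the map is left untouched, so an input format that does not read from a file location can carry on without one.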
I don't really understand the intended behaviour, so the change above is probably not the right fix. Would adding a configuration variable such as "gremlin.hadoop.inputLocationRequired", which defaults to true and can be set to false for these other input formats, work instead?
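A rough sketch of how such a flag might behave, purely as an illustration of the question: the key "gremlin.hadoop.inputLocationRequired" is only the name proposed here and not an existing setting, InputLocationPolicy and its methods are hypothetical, and a plain Map again stands in for the real configuration object.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical policy check; the flag name is a proposal, not an existing TinkerPop setting.
final class InputLocationPolicy {

    static final String INPUT_LOCATION_REQUIRED = "gremlin.hadoop.inputLocationRequired";

    // The flag defaults to true when it is absent from the configuration.
    static boolean isRequired(final Map<String, String> config) {
        return Boolean.parseBoolean(config.getOrDefault(INPUT_LOCATION_REQUIRED, "true"));
    }

    // Fails fast when a location is required but missing; otherwise passes the Optional through.
    static Optional<String> checkLocation(final Map<String, String> config, final Optional<String> location) {
        if (!location.isPresent() && isRequired(config)) {
            throw new IllegalStateException(
                    "No input location found and " + INPUT_LOCATION_REQUIRED + " is true");
        }
        return location;
    }

    public static void main(final String[] args) {
        final Map<String, String> config = new HashMap<>();
        config.put(INPUT_LOCATION_REQUIRED, "false");
        // With the flag off, an empty location is tolerated and simply passed through.
        System.out.println(checkLocation(config, Optional.empty()).isPresent());
    }
}
```

This would keep the strict behaviour as the default for file-based input formats, while letting the Titan input formats opt out explicitly instead of relying on a silent skip.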