[TINKERPOP-1117] InputFormatRDD.readGraphRDD requires a valid gremlin.hadoop.inputLocation, breaking InputFormats (Cassandra, HBase) that don't need one - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 3.2.0-incubating
Fix Version/s: 3.1.1-incubating
Component/s: hadoop
Labels: None

Description

On line 43 of InputFormatRDD, the call to Constants.getSearchGraphLocation returns Optional.empty() when gremlin.hadoop.inputLocation=none, which is the setting advised for Titan's CassandraInputFormat and HBaseInputFormat. Changing the readGraphRDD method to call .isPresent() and to set the storage location in the config only when a value is present allows SparkGraphComputer from the 3.2.0-SNAPSHOT branch to work with Titan via CassandraInputFormat in a traversal source:

// Imports
import java.util.Optional;

@Override
public JavaPairRDD<Object, VertexWritable> readGraphRDD(final Configuration configuration, final JavaSparkContext sparkContext) {
    final org.apache.hadoop.conf.Configuration hadoopConfiguration = ConfUtil.makeHadoopConfiguration(configuration);
    // Previously this lookup was passed straight into hadoopConfiguration.set(...); keeping the Optional lets us check it first
    final Optional<String> searchGraph = Constants.getSearchGraphLocation(configuration.getString(Constants.GREMLIN_HADOOP_INPUT_LOCATION), FileSystemStorage.open(hadoopConfiguration));
    if (searchGraph.isPresent()) {
        hadoopConfiguration.set(configuration.getString(Constants.GREMLIN_HADOOP_INPUT_LOCATION), searchGraph.get());
    }
    return sparkContext.newAPIHadoopRDD(hadoopConfiguration, (Class<InputFormat<NullWritable, VertexWritable>>) hadoopConfiguration.getClass(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT, InputFormat.class),
        NullWritable.class,
        VertexWritable.class)
        .mapToPair(tuple -> new Tuple2<>(tuple._2().get().id(), new VertexWritable(tuple._2().get())));
}
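
The guard can also be looked at in isolation. The following is only a minimal sketch, assuming the existing code calls get() directly on the lookup result: findLocation is a hypothetical stand-in for Constants.getSearchGraphLocation, a plain Map stands in for the Hadoop configuration, and the "input.location" key is made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

public class OptionalGuardSketch {

    // Hypothetical stand-in for Constants.getSearchGraphLocation: yields an
    // empty Optional when no usable location is configured.
    static Optional<String> findLocation(final String inputLocation) {
        if (inputLocation == null || inputLocation.equals("none")) {
            return Optional.empty();
        }
        return Optional.of(inputLocation);
    }

    // Unguarded: get() on an empty Optional throws NoSuchElementException.
    static void setUnguarded(final Map<String, String> conf, final String inputLocation) {
        conf.put("input.location", findLocation(inputLocation).get());
    }

    // Guarded: the entry is written only when a location was found.
    static void setGuarded(final Map<String, String> conf, final String inputLocation) {
        findLocation(inputLocation).ifPresent(location -> conf.put("input.location", location));
    }

    public static void main(final String[] args) {
        final Map<String, String> conf = new HashMap<>();
        setGuarded(conf, "none");
        System.out.println("entry written by guarded call: " + conf.containsKey("input.location"));
        try {
            setUnguarded(conf, "none");
        } catch (final NoSuchElementException e) {
            System.out.println("unguarded get() failed on an empty Optional");
        }
    }
}
```

With the guarded variant, an input format that reads from a database rather than a file path simply leaves the location entry unset.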

I don't really understand the intended behaviour, so this is probably not the right thing to do. Would adding a configuration variable such as "gremlin.hadoop.inputLocationRequired", which defaults to true and can be set to false for these other input formats, work?
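
As a rough illustration of that proposal, and not something that exists in TinkerPop, the flag could be read along these lines. The key gremlin.hadoop.inputLocationRequired is the hypothetical name suggested above, resolveInputLocation is an invented helper, and a plain Map again stands in for the real configuration object.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class InputLocationRequiredSketch {

    static final String INPUT_LOCATION = "gremlin.hadoop.inputLocation";
    // Hypothetical key from the proposal; not an existing TinkerPop setting.
    static final String INPUT_LOCATION_REQUIRED = "gremlin.hadoop.inputLocationRequired";

    // Returns the configured location, an empty Optional when the location is
    // absent and not required, and fails fast when it is absent but required.
    static Optional<String> resolveInputLocation(final Map<String, String> conf) {
        final boolean required =
                Boolean.parseBoolean(conf.getOrDefault(INPUT_LOCATION_REQUIRED, "true"));
        final String location = conf.get(INPUT_LOCATION);
        final boolean missing = location == null || location.equals("none");
        if (missing && required) {
            throw new IllegalStateException(INPUT_LOCATION + " must be set to a valid location");
        }
        return missing ? Optional.empty() : Optional.of(location);
    }

    public static void main(final String[] args) {
        final Map<String, String> conf = new HashMap<>();
        conf.put(INPUT_LOCATION, "none");
        conf.put(INPUT_LOCATION_REQUIRED, "false");
        System.out.println("location present: " + resolveInputLocation(conf).isPresent());
    }
}
```

Defaulting the flag to true would keep the current behaviour for file-based input formats, while database-backed formats would opt out explicitly.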

People

Assignee: Marko A. Rodriguez

Reporter: Dylan Bethune-Waddell

Votes: 0

Watchers: 2

Dates

Created: 03/Feb/16 08:50

Updated: 04/Feb/16 01:26

Resolved: 03/Feb/16 21:02