Description
There are two types of input: vertex-centric and edge-centric. Some algorithms need vertex-centric data, such as pairs of vertex ids and initial page ranks. Other algorithms only look at edges. For example, connected components can be run without any vertex values.
In some cases we have only edge-centric data and no vertex data. Converting edge-centric data to vertex-centric data is workable, but it is not ideal.
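To illustrate what that conversion involves, here is a minimal sketch in plain Java. It does not use any Giraph API. The class `EdgeToVertexSketch`, the `Edge` type, and the method `deriveVertices` are hypothetical names made up for this example, and the default vertex value is an assumption. It only shows the idea of deriving a vertex for every id that appears as an edge endpoint.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EdgeToVertexSketch {
    /** Hypothetical edge record holding only a source id and a target id. */
    static final class Edge {
        final long source;
        final long target;

        Edge(long source, long target) {
            this.source = source;
            this.target = target;
        }
    }

    /**
     * Builds a vertex id -> value map from an edge list. Every id seen as a
     * source or a target gets the default value once; repeated ids are ignored.
     */
    static Map<Long, Double> deriveVertices(List<Edge> edges, double defaultValue) {
        Map<Long, Double> vertices = new LinkedHashMap<>();
        for (Edge e : edges) {
            vertices.putIfAbsent(e.source, defaultValue);
            vertices.putIfAbsent(e.target, defaultValue);
        }
        return vertices;
    }

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(
            new Edge(1L, 2L), new Edge(2L, 3L), new Edge(1L, 3L));
        // Three edges mention three distinct vertex ids.
        System.out.println(deriveVertices(edges, 0.0).keySet());
    }
}
```

Doing this as a separate preprocessing pass means reading the edge data twice, which is the cost I would like to avoid by creating the vertices while the edges are loaded.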
In my job, I modified EdgeInputSplitsCallable.java so that vertices are created automatically while edges are loaded.
My main modification is:
graphState.getWorkerClientRequestProcessor().sendEdgeRequest(
    sourceId, readerEdge);
context.progress(); // do this before potential data transfer
++inputSplitEdgesLoaded;

// modify begin
Vertex<I, V, E, M> vertex = this.configuration.createVertex();
if (vertex.getValue() == null) {
  vertex.setValue(configuration.createVertexValue());
}
vertex.setConf(configuration);
vertex.setGraphState(graphState);
vertex.initialize(edgeReader.getCurrentEdge().getTargetVertexId(),
    vertex.getValue());
PartitionOwner partitionOwner =
    bspServiceWorker.getVertexPartitionOwner(vertex.getId());
graphState.getWorkerClientRequestProcessor().sendVertexRequest(
    partitionOwner, vertex);
// modify end

// Update status every VERTICES_UPDATE_PERIOD edges
if (inputSplitEdgesLoaded % VERTICES_UPDATE_PERIOD == 0) {
  totalEdgesMeter.mark(VERTICES_UPDATE_PERIOD);
After this change, my Giraph job can run any of the examples with only edge data as input.
I hope a similar feature can be added in a future version.