Details
Type: New Feature
Status: Closed
Priority: Major
Resolution: Duplicate
Description
It is highly desirable for the current map/reduce framework to be able to call functions written in C++ (or other languages).
I am proposing a generic extension to the current framework to achieve this goal.
The extension is an application-level solution, similar in spirit to HadoopStreaming, and thus has no impact on the Hadoop core.
I will maintain the native map/reduce execution model.
The basic idea is to use socket/rpc to go through the language barrier.
In particular, we can implement a generic mapper/reducer class in Java as a proxy for calling functions in another language.
The configure function of the class will create a process that opens a user-specified shared library and acts as an RPC server.
The map function of the class will simply issue an RPC call with the key/value pair.
Such an RPC call is expected to return a list of key/value pairs. The map function then can emit the outputs.
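The request/response exchange described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tab-separated, blank-line-terminated wire format, the class name RpcMapSketch, and the in-process stub server (standing in for the native process) are all hypothetical and not part of the proposal.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RpcMapSketch {
    // Assumed wire format (hypothetical): the request is one "key<TAB>value" line;
    // the response is zero or more "key<TAB>value" lines followed by an empty line.
    // Keys and values containing tabs or newlines are not handled in this sketch.
    static List<String[]> invokeRemoteMap(String host, int port, String key, String value)
            throws IOException {
        try (Socket s = new Socket(host, port);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             Writer out = new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(key + "\t" + value + "\n");
            out.flush();
            List<String[]> pairs = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                pairs.add(line.split("\t", 2));
            }
            return pairs;
        }
    }

    // Stand-in for the native server: answers a single request with a
    // word-count style map that emits (word, "1") for every token in the value.
    static ServerSocket startStubServer() throws IOException {
        ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
        Thread t = new Thread(() -> {
            try (Socket s = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                 Writer out = new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8)) {
                String[] kv = in.readLine().split("\t", 2);
                for (String word : kv[1].trim().split("\\s+")) {
                    out.write(word + "\t1\n");
                }
                out.write("\n"); // empty line marks the end of the response
                out.flush();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        t.setDaemon(true);
        t.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = startStubServer()) {
            List<String[]> pairs = invokeRemoteMap(
                    server.getInetAddress().getHostAddress(), server.getLocalPort(),
                    "0", "to be or not");
            for (String[] p : pairs) {
                System.out.println(p[0] + "\t" + p[1]);
            }
        }
    }
}
```

In a real adapter, the map function would forward each returned pair to the output collector instead of printing it, and a single connection would presumably be reused across calls rather than opened per record.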
Below is a sketch of the generic class:
public class MapRedCPPAdapter implements Mapper, Reducer {
    String sharedLibraryName;
    RPCProxy theServer;
    ...
    public void configure(JobConf job) {
        sharedLibraryName = job.get("shared.lib.name");
        theServer = createServer(sharedLibraryName);
    }
    public void close() { theServer.stop(); }
    public void map(key, value, output, reporter) {
        ArrayList pairs = invokeRemoteMap(theServer, key, value);
        emit(pairs);
    }
    public void reduce(key, values, output, reporter) {
        ArrayList pairs = invokeRemoteReduce(theServer, key, values);
        emit(pairs);
    }
}
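The process creation done in configure could look something like the sketch below. It assumes a hypothetical host executable (here called mapred-native-host) that loads the named shared library and serves RPC on a given port; the executable name, its --lib and --port flags, and the class name NativeServerLauncher are illustrative assumptions, not part of the proposal.

```java
import java.util.List;

public class NativeServerLauncher {
    // Builds, without starting, the command line for a hypothetical host
    // executable that loads the shared library and listens on the given port.
    static ProcessBuilder buildLauncher(String hostExecutable, String sharedLibraryName, int port) {
        ProcessBuilder pb = new ProcessBuilder(
                hostExecutable, "--lib", sharedLibraryName, "--port", Integer.toString(port));
        pb.redirectErrorStream(true);                        // merge stderr into stdout
        pb.redirectOutput(ProcessBuilder.Redirect.INHERIT);  // surface server logs in the task log
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = buildLauncher("mapred-native-host", "libwordcount.so", 9000);
        List<String> command = pb.command();
        System.out.println(String.join(" ", command));
        // In the adapter, configure() would call pb.start() and keep the returned
        // Process, and close() would call destroy() on it to stop the server.
    }
}
```

Starting one such process per task is exactly where the per-task overhead of this design comes from.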
The cons of this approach are the overhead associated with RPC calls and the cost of creating an additional process per mapper/reducer task.
The pros are that the extension is clean, generic, and simple. It is applicable to other foreign languages as well.