Details
Type: Improvement
Status: In Progress
Priority: Minor
Resolution: Unresolved
Description
I wrote a generic InputFormat that wraps any other InputFormat and creates CompositeInputSplits, reducing the number of map tasks in a controllable manner while preserving data locality. A corresponding CompositeRecordReader iterates through the RecordReaders that the underlying InputFormat creates for each raw split.
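To illustrate the idea, here is a minimal sketch in plain Java with no Hadoop dependency. It is not the code attached to this issue. RawSplit, CompositeSplit, group, chain and the maxPerGroup knob are hypothetical names made up for the example, and grouping by a single preferred host is an assumed way of preserving locality. The real classes would extend Hadoop's InputSplit and RecordReader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

class CompositeSplitSketch {

    /** Hypothetical stand-in for a raw split: an id plus its preferred host. */
    static final class RawSplit {
        final String id;
        final String host;
        RawSplit(String id, String host) { this.id = id; this.host = host; }
    }

    /** Hypothetical stand-in for a composite split: raw splits that share one host. */
    static final class CompositeSplit {
        final String host;
        final List<RawSplit> parts;
        CompositeSplit(String host, List<RawSplit> parts) { this.host = host; this.parts = parts; }
    }

    /** Groups raw splits by host, then cuts each host's list into chunks of at most maxPerGroup. */
    static List<CompositeSplit> group(List<RawSplit> raw, int maxPerGroup) {
        if (maxPerGroup < 1) {
            throw new IllegalArgumentException("maxPerGroup must be >= 1");
        }
        // LinkedHashMap keeps hosts in first-seen order so the output is deterministic.
        Map<String, List<RawSplit>> byHost = new LinkedHashMap<>();
        for (RawSplit s : raw) {
            byHost.computeIfAbsent(s.host, h -> new ArrayList<>()).add(s);
        }
        List<CompositeSplit> out = new ArrayList<>();
        for (Map.Entry<String, List<RawSplit>> e : byHost.entrySet()) {
            List<RawSplit> splits = e.getValue();
            for (int i = 0; i < splits.size(); i += maxPerGroup) {
                int end = Math.min(i + maxPerGroup, splits.size());
                out.add(new CompositeSplit(e.getKey(), new ArrayList<>(splits.subList(i, end))));
            }
        }
        return out;
    }

    /** Reads the underlying readers one after another, skipping any that are exhausted. */
    static <T> Iterator<T> chain(final List<Iterator<T>> readers) {
        return new Iterator<T>() {
            private int current = 0;

            @Override
            public boolean hasNext() {
                while (current < readers.size() && !readers.get(current).hasNext()) {
                    current++;
                }
                return current < readers.size();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return readers.get(current).next();
            }
        };
    }

    public static void main(String[] args) {
        List<RawSplit> raw = Arrays.asList(
                new RawSplit("a", "h1"), new RawSplit("b", "h2"),
                new RawSplit("c", "h1"), new RawSplit("d", "h1"));
        for (CompositeSplit c : group(raw, 2)) {
            System.out.println(c.host + " -> " + c.parts.size() + " raw split(s)");
        }
    }
}
```

In this sketch each CompositeSplit would become one map task, so maxPerGroup is the control over how far the task count drops, and every raw split in a composite still shares a host.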
One application is grouping TableSplits when the raw splits come from multiple regions and are filtered by key range. We use this to shard and distribute time-based incremental access to an HBase table.
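The key-range filtering step could look roughly like the sketch below. It is not the implementation from this issue. RegionSplit, overlaps and filter are hypothetical names, and keys are modelled as Java Strings for brevity, whereas HBase row keys are byte arrays compared bytewise. The sketch follows the HBase convention that an empty start key means the beginning of the table and an empty end or stop key means the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KeyRangeFilterSketch {

    /** Hypothetical stand-in for a per-region split covering [startKey, endKey). */
    static final class RegionSplit {
        final String startKey; // inclusive; "" means start of table
        final String endKey;   // exclusive; "" means end of table
        final String host;
        RegionSplit(String startKey, String endKey, String host) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.host = host;
        }
    }

    /** True if the region's key range intersects the scan range [scanStart, scanStop). */
    static boolean overlaps(RegionSplit r, String scanStart, String scanStop) {
        boolean startsBeforeScanEnds = scanStop.isEmpty() || r.startKey.compareTo(scanStop) < 0;
        boolean endsAfterScanStarts = r.endKey.isEmpty() || r.endKey.compareTo(scanStart) > 0;
        return startsBeforeScanEnds && endsAfterScanStarts;
    }

    /** Keeps only the region splits that the scan range actually touches. */
    static List<RegionSplit> filter(List<RegionSplit> regions, String scanStart, String scanStop) {
        List<RegionSplit> kept = new ArrayList<>();
        for (RegionSplit r : regions) {
            if (overlaps(r, scanStart, scanStop)) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<RegionSplit> regions = Arrays.asList(
                new RegionSplit("", "g", "h1"),
                new RegionSplit("g", "n", "h2"),
                new RegionSplit("n", "t", "h1"),
                new RegionSplit("t", "", "h2"));
        for (RegionSplit r : filter(regions, "h", "p")) {
            System.out.println("[" + r.startKey + ", " + r.endKey + ") on " + r.host);
        }
    }
}
```

The surviving splits would then be handed to the grouping step, so only regions that hold keys in the requested range contribute to a composite split.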