Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
0.12.3
None
None
All
Description
Currently, the JobClient sorts the InputSplits returned by the InputFormat in descending order of size, so that the map tasks for larger input splits are scheduled for execution before those for smaller ones. However, this causes problems in applications that run with -reducer NONE and produce data sets partitioned in the same way as the input.
With -reducer NONE, map task i produces part-i. However, in the typical applications that use -reducer NONE, each map should produce an output partition with the same index as its input partition.
(Of course, this requires that each partition should be fed in its entirety to a map, rather than splitting it into blocks, but that is a separate issue.)
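The mismatch described above can be illustrated with a small, self-contained sketch. It does not use the Hadoop API: `PartSplit` and `SplitOrderMismatch` are hypothetical stand-ins, and the split sizes are made up for illustration. It assumes one split per input partition and only shows which input partition ends up behind each map task index after a size-descending sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an input split; not a Hadoop class.
final class PartSplit {
    final int inputIndex; // index of the input partition (part-<inputIndex>)
    final long length;    // size of the partition in bytes

    PartSplit(int inputIndex, long length) {
        this.inputIndex = inputIndex;
        this.length = length;
    }
}

class SplitOrderMismatch {
    // For each map task index i, returns the index of the input partition
    // that task would read once the splits are sorted by size, largest first.
    static int[] taskToInputIndex(List<PartSplit> splits) {
        List<PartSplit> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingLong((PartSplit s) -> s.length).reversed());
        int[] result = new int[sorted.size()];
        for (int i = 0; i < sorted.size(); i++) {
            result[i] = sorted.get(i).inputIndex;
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sizes: input part-1 is the largest, part-0 the smallest.
        List<PartSplit> splits = Arrays.asList(
                new PartSplit(0, 10L),
                new PartSplit(1, 30L),
                new PartSplit(2, 20L));
        int[] mapping = taskToInputIndex(splits);
        for (int i = 0; i < mapping.length; i++) {
            System.out.println("map task " + i + " writes part-" + i
                    + " but reads input part-" + mapping[i]);
        }
    }
}
```

With these sizes, map task 0 reads input part-1, so output part-0 no longer corresponds to input part-0.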
Thus, the sorting of input splits should either be controllable via a configuration variable, or FileInputFormat should sort the splits itself and JobClient should honor the order in which they are returned.
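A minimal sketch of the first option, assuming a boolean switch that decides whether the client reorders the splits. The `sortBySize` flag, `SizedSplit`, and `ConfigurableSplitOrder` are hypothetical names used only for illustration; they are not Hadoop configuration properties or classes, and the actual fix may look different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an input split; not a Hadoop class.
final class SizedSplit {
    final String name;
    final long length;

    SizedSplit(String name, long length) {
        this.name = name;
        this.length = length;
    }
}

class ConfigurableSplitOrder {
    // Returns the splits in the order they would be assigned to map tasks.
    // sortBySize == true : largest split first (the behavior described above).
    // sortBySize == false: keep the order returned by the input format.
    static List<SizedSplit> orderForScheduling(List<SizedSplit> splits, boolean sortBySize) {
        List<SizedSplit> ordered = new ArrayList<>(splits);
        if (sortBySize) {
            ordered.sort(Comparator.comparingLong((SizedSplit s) -> s.length).reversed());
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<SizedSplit> splits = Arrays.asList(
                new SizedSplit("part-0", 10L),
                new SizedSplit("part-1", 30L),
                new SizedSplit("part-2", 20L));
        for (boolean sortBySize : new boolean[] {true, false}) {
            StringBuilder line = new StringBuilder("sortBySize=" + sortBySize + ":");
            for (SizedSplit s : orderForScheduling(splits, sortBySize)) {
                line.append(' ').append(s.name);
            }
            System.out.println(line);
        }
    }
}
```

With the switch off, map task i keeps input partition i, which is what the -reducer NONE use case needs; the cost is that large splits are no longer scheduled first.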
Attachments
Issue Links
- relates to HADOOP-1320 Rewrite 'random-writer' to use '-reducer NONE' (Closed)