[HADOOP-2560] Processing multiple input splits per mapper task - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

Currently, an input split contains a contiguous chunk of an input file, which by default corresponds to a DFS block.
If the input data is large, this can lead to a large number of mapper tasks, which causes the following problems:

1. Shuffling cost: since the framework has to move M * R map output segments (M mappers, R reducers) to the nodes running reducers,
a larger M means a higher shuffling cost.

2. High JVM initialization overhead: each map task pays the cost of starting a JVM.

3. Disk fragmentation: a larger number of map output files means lower read throughput when accessing them.
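
To make the shuffle arithmetic in item 1 concrete, the sketch below counts map output segments when every DFS block becomes its own split. The class and method names are hypothetical, and the input size, block size, and reducer count are illustrative numbers, not measurements from this issue.

```java
// Hypothetical helper illustrating item 1: with one split per DFS block,
// M grows with the input size and the shuffle moves M * R segments.
class ShuffleSegments {
    // Number of map tasks when every DFS block becomes its own split
    // (a trailing partial block still needs its own mapper).
    static long mappersForOneSplitPerBlock(long inputBytes, long blockBytes) {
        return (inputBytes + blockBytes - 1) / blockBytes;
    }

    // Each of the M map outputs is partitioned into R segments, one per reducer.
    static long mapOutputSegments(long mappers, long reducers) {
        return mappers * reducers;
    }

    public static void main(String[] args) {
        long inputBytes = 1L << 40;   // illustrative: 1 TB (2^40 bytes) of input
        long blockBytes = 64L << 20;  // illustrative: 64 MB DFS blocks
        long reducers = 100;          // illustrative reducer count
        long m = mappersForOneSplitPerBlock(inputBytes, blockBytes);
        System.out.println("mappers M = " + m);
        System.out.println("segments M * R = " + mapOutputSegments(m, reducers));
    }
}
```

With these illustrative numbers the job runs 16384 mappers and the shuffle has to move 1,638,400 segments, and the segment count grows linearly with the input size for a fixed R.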

Ideally, you want to keep the number of mappers at no more than 16 times the number of nodes in the cluster.
To achieve that, we can increase the input split size. However, if a split spans more than one DFS block,
you lose the benefit of data-locality scheduling.
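
The 16-times-nodes guideline above fixes how many blocks each split must cover on average. The sketch below computes that blocks-per-split value; the class and method names and the cluster numbers are hypothetical, chosen only for illustration. Any result above 1 means a plain size-based split spans several DFS blocks, which is where block-level locality is lost.

```java
// Hypothetical helper: how many DFS blocks each split must cover so that
// the number of mappers stays within 16 times the number of nodes.
class SplitSizing {
    static final int MAX_MAPPERS_PER_NODE = 16; // guideline from the description

    static long maxMappers(int nodes) {
        return (long) MAX_MAPPERS_PER_NODE * nodes;
    }

    // Ceiling division: the smallest blocks-per-split that meets the cap.
    static long blocksPerSplit(long totalBlocks, int nodes) {
        long cap = maxMappers(nodes);
        return Math.max(1, (totalBlocks + cap - 1) / cap);
    }

    public static void main(String[] args) {
        long totalBlocks = 16384; // illustrative: 1 TB stored as 64 MB blocks
        int nodes = 100;          // illustrative cluster size
        long b = blocksPerSplit(totalBlocks, nodes);
        System.out.println("blocks per split = " + b);
        System.out.println("split size (MB) = " + b * 64);
    }
}
```

Here the cap is 1600 mappers, so each split must cover 11 blocks (704 MB). Those 11 blocks are generally not all stored on the same node.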

One way to address this problem is to combine multiple input blocks located on the same rack into one split.
If on average we combine B blocks into one split, we reduce the number of mappers by a factor of B.
Since all the blocks for one mapper share a rack, we can still benefit from rack-aware scheduling.
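
A minimal sketch of the rack-based combining idea follows. It assumes each block has already been pinned to a single rack (for example, the rack of one of its replicas), whereas real HDFS blocks usually have replicas on more than one rack. The Block record and the combineByRack method are hypothetical; they are not the API of the attached patch or of Hadoop.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of combining same-rack blocks into multi-block splits.
class RackCombiner {
    // Assumption: each block is pinned to one rack (e.g. one replica's rack).
    record Block(String file, long offset, long length, String rack) {}

    // Groups blocks by rack, then cuts each rack's list into splits of at
    // most maxBlocksPerSplit blocks, so every split lives on exactly one rack.
    static List<List<Block>> combineByRack(List<Block> blocks, int maxBlocksPerSplit) {
        if (maxBlocksPerSplit < 1) {
            throw new IllegalArgumentException("maxBlocksPerSplit must be >= 1");
        }
        Map<String, List<Block>> byRack = new LinkedHashMap<>();
        for (Block b : blocks) {
            byRack.computeIfAbsent(b.rack(), r -> new ArrayList<>()).add(b);
        }
        List<List<Block>> splits = new ArrayList<>();
        for (List<Block> rackBlocks : byRack.values()) {
            for (int i = 0; i < rackBlocks.size(); i += maxBlocksPerSplit) {
                int end = Math.min(i + maxBlocksPerSplit, rackBlocks.size());
                splits.add(new ArrayList<>(rackBlocks.subList(i, end)));
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb64 = 64L << 20;
        List<Block> blocks = new ArrayList<>();
        // Illustrative input: 10 blocks of one file alternating between two racks.
        for (int i = 0; i < 10; i++) {
            blocks.add(new Block("/data/part-0", i * mb64, mb64, i % 2 == 0 ? "/rack1" : "/rack2"));
        }
        // 5 blocks per rack, at most 3 per split: 2 splits per rack, 4 in total.
        List<List<Block>> splits = combineByRack(blocks, 3);
        System.out.println("splits = " + splits.size());
        for (List<Block> s : splits) {
            System.out.println(s.get(0).rack() + " -> " + s.size() + " blocks");
        }
    }
}
```

Because each rack's leftover blocks form a smaller split, the mapper count is the sum over racks of ceil(blocks on that rack / B). This is why the reduction is a factor of B only on average.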

Thoughts?

Attachments

multipleSplitsPerMapper.patch (12 kB), Dhruba Borthakur, 30/Oct/08 17:32

Issue Links

is blocked by

HADOOP-249 Improving Map -> Reduce performance and Task JVM reuse (Closed)

is duplicated by

HADOOP-4565 MultiFileInputSplit can use data locality information to create splits (Closed)

is related to

MAPREDUCE-93 Job Tracker should prefer input-splits from overloaded racks (Open)

HADOOP-3293 When an input split spans cross block boundary, the split location should be the host having most of bytes on it. (Closed)

HADOOP-249 Improving Map -> Reduce performance and Task JVM reuse (Closed)

People

Assignee: Dhruba Borthakur

Reporter: Runping Qi

Votes: 0

Watchers: 25

Dates

Created: 09/Jan/08 16:34

Updated: 17/Jul/14 19:31

Resolved: 17/Jul/14 19:31