[HADOOP-200] The map task names are sent to the reduces - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.2.0
Fix Version/s: 0.3.0
Component/s: None
Labels: None

Description

As each reduce is created, it is given the entire set of potential map task names. For my large sort jobs with 64k maps, this means that each reduce task is given a two-dimensional array of 5 tasks/map * 64k maps = 320k strings. Since the reduce task is passed from the job tracker to the task tracker and then down to the task runner, passing the entire list is very expensive. I suspect that this is the cause of the slowdowns that I see in the task trackers' heartbeats when the reduce tasks are being launched.
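The arithmetic above can be checked with a small sketch. It assumes "64k" means 64 * 1024 maps; the average name length is a made-up figure for illustration only and is not taken from the report.

```java
/** Rough estimate of the per-reduce payload described in the report. */
final class MapNamePayloadEstimate {

    /** Number of task-name strings shipped with one reduce task. */
    static long stringCount(long maps, long tasksPerMap) {
        return maps * tasksPerMap;
    }

    /** Approximate size in bytes, given an assumed average encoded name length. */
    static long approxBytes(long strings, long avgBytesPerName) {
        return strings * avgBytesPerName;
    }

    public static void main(String[] args) {
        long maps = 64L * 1024;    // "64k maps" from the report (binary k assumed)
        long tasksPerMap = 5;      // "5 tasks/map" from the report
        long avgBytesPerName = 20; // hypothetical value, for illustration only

        long strings = stringCount(maps, tasksPerMap);
        System.out.println(strings + " strings per reduce task");
        System.out.println("~" + approxBytes(strings, avgBytesPerName) + " bytes per reduce task");
    }
}
```

Whatever the real name length is, the payload grows linearly with the number of maps and is repeated for every reduce task launched.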

I propose that ReduceTask be changed to take only the count of maps, with map ids running from 0 to maps - 1:
public ReduceTask(String jobFile, String taskId, int maps, int partition);
The protocol for finding map outputs then needs to change to:
MapOutputLocation[] locateMapOutputs(String jobId, int[] mapIds, int partition);
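A minimal sketch of how the two proposed signatures might fit together follows. It is not the code in the attached patch: the fields of MapOutputLocation, the MapOutputLocator interface, and the class name ReduceTaskSketch are hypothetical and exist only to make the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in: the issue names this type but does not list its fields. */
final class MapOutputLocation {
    final int mapId;
    final String host;

    MapOutputLocation(int mapId, String host) {
        this.mapId = mapId;
        this.host = host;
    }
}

/** Hypothetical interface holding the lookup call proposed in the issue. */
interface MapOutputLocator {
    MapOutputLocation[] locateMapOutputs(String jobId, int[] mapIds, int partition);
}

/** Sketch of a reduce task that carries only the map count, not the map names. */
final class ReduceTaskSketch {
    private final String jobFile;
    private final String taskId;
    private final int maps;
    private final int partition;

    ReduceTaskSketch(String jobFile, String taskId, int maps, int partition) {
        if (maps < 0) {
            throw new IllegalArgumentException("maps must be non-negative");
        }
        this.jobFile = jobFile;
        this.taskId = taskId;
        this.maps = maps;
        this.partition = partition;
    }

    String getJobFile() { return jobFile; }

    String getTaskId() { return taskId; }

    /** Map ids are implicit: 0 .. maps - 1, so no names need to be shipped. */
    int[] allMapIds() {
        int[] ids = new int[maps];
        for (int i = 0; i < maps; i++) {
            ids[i] = i;
        }
        return ids;
    }

    /** Asks the locator for the outputs of every map for this reduce's partition. */
    MapOutputLocation[] findOutputs(String jobId, MapOutputLocator locator) {
        return locator.locateMapOutputs(jobId, allMapIds(), partition);
    }
}

final class ReduceTaskSketchDemo {
    public static void main(String[] args) {
        // Fake locator: pretends that only even-numbered maps have finished.
        MapOutputLocator locator = (jobId, mapIds, partition) -> {
            List<MapOutputLocation> done = new ArrayList<>();
            for (int id : mapIds) {
                if (id % 2 == 0) {
                    done.add(new MapOutputLocation(id, "host-" + id));
                }
            }
            return done.toArray(new MapOutputLocation[0]);
        };

        ReduceTaskSketch reduce = new ReduceTaskSketch("job.xml", "task_r_0", 4, 0);
        for (MapOutputLocation loc : reduce.findOutputs("job_0001", locator)) {
            System.out.println("map " + loc.mapId + " -> " + loc.host);
        }
    }
}
```

With this shape, the object passed from the job tracker down to the task runner holds four small fields regardless of how many maps the job has; the per-map detail is fetched on demand through the lookup call.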

Attachments

map-id.patch
09/May/06 05:20
21 kB
Owen O'Malley

People

Assignee: Owen O'Malley

Reporter: Owen O'Malley

Votes: 0

Watchers: 0

Dates

Created: 07/May/06 10:54

Updated: 08/Jul/09 16:51

Resolved: 16/May/06 02:18