[MAPREDUCE-7074] Shuffle get stuck in fetch failures loop, when a few mapoutput were lost or corrupted and task timeout was set to 0 - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.8.0, 3.0.0
Fix Version/s: 2.8.0
Component/s: mrv2, task
Labels:
None
Environment:

cdh 5.10.0 , apache hadoop 2.8.0

Target Version/s:

2.8.0

Description

When an MR job meets all of these conditions:

- the job has many map tasks, such as 10000 or more
- a few map outputs were lost or corrupted after the map tasks completed successfully and before the shuffle started
- mapreduce.task.timeout was set to 0 and mapreduce.task.progress-report.interval was not set

the shuffle of the reduce task gets stuck in a fetch-failure loop for a long time: several hours or even dozens of hours.

This was caused by MAPREDUCE-6740, which ties mapreduce.task.progress-report.interval to mapreduce.task.timeout through MRJobConfUtil.getTaskProgressReportInterval():

  public static long getTaskProgressReportInterval(final Configuration conf) {
    long taskHeartbeatTimeOut = conf.getLong(
        MRJobConfig.TASK_TIMEOUT, MRJobConfig.DEFAULT_TASK_TIMEOUT_MILLIS);
    return conf.getLong(MRJobConfig.TASK_PROGRESS_REPORT_INTERVAL,
        (long) (TASK_REPORT_INTERVAL_TO_TIMEOUT_RATIO * taskHeartbeatTimeOut));
  }
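
The fallback in the method above can be checked in isolation. The following is a minimal sketch, not Hadoop code: a plain Map stands in for Configuration, the class name ReportIntervalSketch and the getLong helper are hypothetical, and the constant values (a 5-minute default timeout and a 0.01 ratio) are assumptions that should be verified against the Hadoop version in use.

```java
import java.util.HashMap;
import java.util.Map;

public class ReportIntervalSketch {
    // Assumed stand-ins for the MRJobConfig / MRJobConfUtil constants.
    static final String TASK_TIMEOUT = "mapreduce.task.timeout";
    static final String TASK_PROGRESS_REPORT_INTERVAL =
        "mapreduce.task.progress-report.interval";
    static final long DEFAULT_TASK_TIMEOUT_MILLIS = 5 * 60 * 1000L; // assumption
    static final float TASK_REPORT_INTERVAL_TO_TIMEOUT_RATIO = 0.01f; // assumption

    // Hypothetical helper mimicking Configuration.getLong(key, default).
    static long getLong(Map<String, String> conf, String key, long defaultValue) {
        String value = conf.get(key);
        return value == null ? defaultValue : Long.parseLong(value);
    }

    // Same shape as the method quoted in this issue.
    static long getTaskProgressReportInterval(Map<String, String> conf) {
        long taskHeartbeatTimeOut =
            getLong(conf, TASK_TIMEOUT, DEFAULT_TASK_TIMEOUT_MILLIS);
        return getLong(conf, TASK_PROGRESS_REPORT_INTERVAL,
            (long) (TASK_REPORT_INTERVAL_TO_TIMEOUT_RATIO * taskHeartbeatTimeOut));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Timeout disabled, interval not set: ratio * 0 falls through as the default.
        conf.put(TASK_TIMEOUT, "0");
        System.out.println(getTaskProgressReportInterval(conf)); // prints 0

        // An explicitly configured interval takes precedence over the derived one.
        conf.put(TASK_PROGRESS_REPORT_INTERVAL, "3000");
        System.out.println(getTaskProgressReportInterval(conf)); // prints 3000
    }
}
```

The second call suggests a possible workaround under these assumptions: setting mapreduce.task.progress-report.interval explicitly avoids the derived value of 0.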

When mapreduce.task.timeout is set to 0 and mapreduce.task.progress-report.interval is not set, getTaskProgressReportInterval() returns 0L.
The TaskReporter class, which reports task progress and status to the AM, sets taskProgressInterval = MRJobConfUtil.getTaskProgressReportInterval(conf) and calls lock.wait(taskProgressInterval) before every progress report:

 public void run() {
      ...skip...
      long taskProgressInterval = MRJobConfUtil.
          getTaskProgressReportInterval(conf);
      while (!taskDone.get()) {
        ...skip...
        try {
          // sleep for a bit
          synchronized(lock) {
            if (taskDone.get()) {
              break;
            }
            lock.wait(taskProgressInterval);
          }
          if (taskDone.get()) {
            break;
          }
          if (sendProgress) {
            // we need to send progress update
            updateCounters();
            taskStatus.statusUpdate(taskProgress.get(),
                                    taskProgress.toString(), 
                                    counters);
            taskFound = umbilical.statusUpdate(taskId, taskStatus);
            taskStatus.clearStatus();
          }
          ...skip...
        } 
        ...skip...
      }
   }

When mapreduce.task.timeout is set to 0, lock.wait(taskProgressInterval) becomes lock.wait(0), which waits with no timeout until the lock is notified. Because nothing notifies the lock, the reporter waits indefinitely and does not report anything to the AM.
So when fetch failures happen during shuffle, TaskReporter does not report them to the AM, even though the reducer log shows the message "Reporting fetch failure...", and the fetch-failure loop does not stop until the reduce task fails for exceeding MAX_FAILED_UNIQUE_FETCHES.
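
The effect of lock.wait(0) can be reproduced outside Hadoop. The sketch below is a simplified stand-in for the reporter loop, not the real TaskReporter: the class WaitZeroSketch and the method countReports are hypothetical names, and a counter increment takes the place of umbilical.statusUpdate(). It assumes no spurious wakeups occur during the short run.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class WaitZeroSketch {
    /** Runs a reporter-style loop for runMillis and returns how many "reports" it sent. */
    static int countReports(final long intervalMillis, long runMillis) {
        final Object lock = new Object();
        final AtomicBoolean taskDone = new AtomicBoolean(false);
        final AtomicInteger reports = new AtomicInteger();

        Thread reporter = new Thread(() -> {
            while (!taskDone.get()) {
                try {
                    synchronized (lock) {
                        if (taskDone.get()) {
                            break;
                        }
                        // With intervalMillis == 0 this blocks until notified.
                        lock.wait(intervalMillis);
                    }
                } catch (InterruptedException e) {
                    return;
                }
                if (taskDone.get()) {
                    break;
                }
                reports.incrementAndGet(); // stands in for the status update
            }
        });
        reporter.setDaemon(true);
        reporter.start();

        try {
            Thread.sleep(runMillis);
            int sent = reports.get();
            // Shut the loop down; this is the only notification it ever receives.
            taskDone.set(true);
            synchronized (lock) {
                lock.notifyAll();
            }
            reporter.join(1000);
            return sent;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("interval 0 ms:  " + countReports(0L, 300L) + " reports");
        System.out.println("interval 20 ms: " + countReports(20L, 300L) + " reports");
    }
}
```

With a positive interval the loop wakes up on its own and keeps reporting; with an interval of 0 it sends nothing until something calls notify on the lock.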

So it is necessary to define a TASK_PROGRESS_REPORT_INTERVAL_MAX value (such as 30s): when the taskProgressInterval returned by MRJobConfUtil.getTaskProgressReportInterval() equals 0 or exceeds the maximum, set taskProgressInterval = TASK_PROGRESS_REPORT_INTERVAL_MAX.
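
The proposal above might look roughly like the following. This is only a sketch of the idea, not the contents of the attached patch: the class ReportIntervalClampSketch and the method clampReportInterval are hypothetical, the 30-second cap is the example value from the description, and treating negative values the same as 0 is an added assumption.

```java
public class ReportIntervalClampSketch {
    // Example cap from the description (30s); not an existing Hadoop constant.
    static final long TASK_PROGRESS_REPORT_INTERVAL_MAX = 30_000L;

    /**
     * Returns an interval that is safe to pass to lock.wait():
     * never 0 (which would wait without timeout) and never above the cap.
     */
    static long clampReportInterval(long configuredMillis) {
        // Assumption: negative values are handled like 0.
        if (configuredMillis <= 0
                || configuredMillis > TASK_PROGRESS_REPORT_INTERVAL_MAX) {
            return TASK_PROGRESS_REPORT_INTERVAL_MAX;
        }
        return configuredMillis;
    }

    public static void main(String[] args) {
        System.out.println(clampReportInterval(0L));      // prints 30000
        System.out.println(clampReportInterval(3_000L));  // prints 3000
        System.out.println(clampReportInterval(60_000L)); // prints 30000
    }
}
```

In TaskReporter.run(), the clamped value would replace the raw result of getTaskProgressReportInterval(conf) before the loop starts, so the reporter keeps waking up even when the task timeout is disabled.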

Attachments

- MAPREDUCE-7074.patch, 2 kB, 09/Apr/18 15:45, Chengwei Wang
- ExceptionMsg.txt, 37 kB, 09/Apr/18 15:45, Chengwei Wang

People

Assignee: Unassigned

Reporter: Chengwei Wang

Votes: 0

Watchers: 2

Dates

Created: 09/Apr/18 15:39

Updated: 09/Apr/18 15:45