Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.6.0
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels: None

    Description

      Problem
      Reducer gets stuck in the copy phase and doesn't make progress for a very long time. After killing this task manually a couple of times, it gets completed.

      Observations

      • Verified GC logs. Found no memory-related issues. Attached the logs.
      • Verified thread dumps. Found no thread-related problems.
      • On verification of the logs, the fetcher threads are not copying the map outputs; they are just waiting for the merge to happen.
      • The merge thread is alive and in a wait state.

      Analysis
      On careful observation of the logs, thread dumps and code, this looks to me like a classic multi-threading issue: the thread goes into the wait state after it has already been notified.

      Here is the suspect code flow.
      Thread #1
      Fetcher thread - notification comes first
      org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)

            synchronized(pendingToBeMerged) {
              pendingToBeMerged.addLast(toMergeInputs);
              pendingToBeMerged.notifyAll();
            }
      

      Thread #2
      Merge Thread - goes to wait state (Notification goes unconsumed)
      org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()

              synchronized (pendingToBeMerged) {
                while(pendingToBeMerged.size() <= 0) {
                  pendingToBeMerged.wait();
                }
                // Pickup the inputs to merge.
                inputs = pendingToBeMerged.removeFirst();
              }
      
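
      For reference, here is a minimal, self-contained sketch of the handoff pattern quoted above. This is not the actual Hadoop code; the class name and the String placeholder for merge inputs are illustrative only.

      import java.util.LinkedList;

      // Illustrative sketch only; not the Hadoop MergeThread implementation.
      // Models the fetcher/merge-thread handoff described above: the producer
      // enqueues a batch of merge inputs and notifies, while the consumer waits
      // in a loop guarded by a size check and then dequeues a batch to merge.
      public class MergeHandoffSketch {

        private final LinkedList<String> pendingToBeMerged = new LinkedList<>();

        // Corresponds to MergeThread.startMerge(): called by a fetcher thread.
        public void startMerge(String mergeInputs) {
          synchronized (pendingToBeMerged) {
            pendingToBeMerged.addLast(mergeInputs);
            pendingToBeMerged.notifyAll();
          }
        }

        // Corresponds to the body of MergeThread.run(): the merge thread loop.
        public void mergeLoop() throws InterruptedException {
          while (true) {
            String inputs;
            synchronized (pendingToBeMerged) {
              // The wait is guarded by a size check inside a loop, so the thread
              // re-checks the queue after every wakeup (including spurious ones).
              while (pendingToBeMerged.size() <= 0) {
                pendingToBeMerged.wait();
              }
              inputs = pendingToBeMerged.removeFirst();
            }
            merge(inputs);
          }
        }

        private void merge(String inputs) {
          // Placeholder for the actual merge work.
          System.out.println("Merging: " + inputs);
        }

        public static void main(String[] args) throws Exception {
          MergeHandoffSketch sketch = new MergeHandoffSketch();
          Thread merger = new Thread(() -> {
            try {
              sketch.mergeLoop();
            } catch (InterruptedException ignored) {
            }
          });
          merger.setDaemon(true);
          merger.start();
          sketch.startMerge("segment-1");
          Thread.sleep(100); // give the merge thread a moment to run
        }
      }

      In this pattern the consumer re-checks the queue size every time it wakes up, which is relevant to the discussion in the comments below.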

      Attachments

        1. jstat-gc.log
          0.7 kB
          Laxman
        2. reducer-container-partial.log.zip
          1.15 MB
          Laxman
        3. thread-dumps.out
          246 kB
          Laxman

        Issue Links

          Activity

            lakshman Laxman added a comment -

            Thanks a lot, Jason, for the details. We are hitting exactly the same scenario (bad disk) as explained in MAPREDUCE-6334.
            We will try the patch and update the details in this JIRA.


            jlowe Jason Darrell Lowe added a comment -

            I suspect this is a duplicate of MAPREDUCE-6334. I see a lot of these types of messages in the reducer log:

            2015-05-01 19:59:37,632 WARN [fetcher#13] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Shuffle output from glgs1190.grid.uh1.inmobi.com:13562 failed, retry it.
            

            I think it is leaking memory allocations from the shuffle errors, and the shuffle buffer runs out of available memory (hence fetchers are told to WAIT), but there isn't enough data in the shuffle buffer to trigger a merge. All of the memory that was leaked corresponds to fetches that will never complete, so it will never count toward kicking off the merge and unblocking the other threads.
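
            Purely as an illustration of that failure mode (this is not the Hadoop shuffle/merge code; every name below is hypothetical), a bounded shuffle-memory budget where a failed fetch never returns its reservation will eventually make every new fetcher wait, while the committed data never reaches the merge threshold, so nothing ever frees memory:

            // Hypothetical sketch of the leak scenario described above; not Hadoop code.
            // A fixed memory budget is reserved per fetch. Successful fetches are
            // committed and eventually merged; failed fetches here "forget" to release
            // their reservation, so usedMemory climbs while committedMemory does not,
            // and new fetchers are told to wait even though no merge will ever start.
            public class ShuffleMemorySketch {

              private final long memoryLimit;      // total shuffle buffer budget
              private final long mergeThreshold;   // committed bytes needed to start a merge
              private long usedMemory = 0;         // reserved (committed or leaked) bytes
              private long committedMemory = 0;    // bytes from successfully fetched outputs

              public ShuffleMemorySketch(long memoryLimit, long mergeThreshold) {
                this.memoryLimit = memoryLimit;
                this.mergeThreshold = mergeThreshold;
              }

              // Returns true if the fetcher may copy now, false if it must WAIT.
              public synchronized boolean reserve(long size) {
                if (usedMemory + size > memoryLimit) {
                  return false; // fetcher is told to WAIT
                }
                usedMemory += size;
                return true;
              }

              public synchronized void fetchSucceeded(long size) {
                committedMemory += size;
                if (committedMemory >= mergeThreshold) {
                  // a merge would run here and eventually release memory
                }
              }

              public synchronized void fetchFailed(long size) {
                // The bug being illustrated: the reservation is not released on
                // failure, so the budget leaks. The fix is to give the bytes back,
                // e.g. usedMemory -= size;
              }

              public static void main(String[] args) {
                ShuffleMemorySketch mem = new ShuffleMemorySketch(100, 80);
                // Repeated failed fetches of 30 bytes each leak the whole budget:
                for (int i = 0; i < 3; i++) {
                  if (mem.reserve(30)) {
                    mem.fetchFailed(30); // reservation never returned
                  }
                }
                // No fetcher can reserve more memory, and committedMemory never
                // reached the merge threshold, so nothing will ever free memory.
                System.out.println("can reserve 30 more bytes? " + mem.reserve(30));
              }
            }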

            lakshman Laxman added a comment -

            "Threads analysis" mentioned in description above found to be incorrect when I retrace the code flow. Pre-notification is not a problem as merger wait is guarded by size check.

            However, problem exists, fetchers are not proceeding and waiting for merger to free some memory and merge doing nothing.

            lakshman Laxman added a comment - "Threads analysis" mentioned in description above found to be incorrect when I retrace the code flow. Pre-notification is not a problem as merger wait is guarded by size check. However, problem exists, fetchers are not proceeding and waiting for merger to free some memory and merge doing nothing.
            lakshman Laxman added a comment -

            Attached the logs (container log, thread dumps, jstat output) for reference.

            Please note that my thoughts on the threading issue may be premature and incorrect. Irrespective of this analysis, the problem exists.


            People

              Assignee: lakshman Laxman
              Reporter: lakshman Laxman
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved: