Hadoop Common / HADOOP-924

Map task is not getting rescheduled although the corresponding TT got lost


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • None
    • 0.12.1
    • None
    • None

    Description

      I encountered this "job hung" situation during one of the sort runs. Two tasks assigned to a TaskTracker (TT) were never rescheduled even though the TT was lost, and as a result the job got stuck forever. This TT had been assigned many tasks, and all of them were rescheduled except these two. Below are the relevant log messages for one of the two tasks; the JobTracker (JT) log has been split into two parts to bring out the sequence of events.

      JT log:
      ---------
      2007-01-24 10:53:09,564 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_m_020699
      2007-01-24 10:53:09,564 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_m_020699_0' to tip tip_0001_m_020699, for tracker 'foo.com:7020'

      TT log:
      ---------
      2007-01-24 10:53:09,564 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_0001_m_020699_0
      2007-01-24 10:53:12,180 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_020699_0 0.0% hdfs://foo:50000/user/ddas/somedir/part002444:134217728+134217728

      JT log:
      ---------
      2007-01-24 11:05:32,409 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker 'foo.com:7020'
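
      To find every tip affected in this way, not just the one traced above, the JT log can be scanned for attempts whose tracker was later reported lost and that never received a newer attempt. This is a hypothetical diagnostic sketch, not part of Hadoop: the class and method names are made up, and it relies only on the two JT message formats quoted above ("Adding task ... for tracker" and "Lost tracker"). It ignores completion and failure messages, so its output is a list of candidates to check by hand, not proof.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical diagnostic (not Hadoop code): finds tips stranded on a lost tracker. */
public class LostTrackerScan {
    // Matches: Adding task 'task_...' to tip tip_..., for tracker 'host:port'
    private static final Pattern ADD =
            Pattern.compile("Adding task '([^']+)' to tip ([^,]+), for tracker '([^']+)'");
    // Matches: Lost tracker 'host:port'
    private static final Pattern LOST = Pattern.compile("Lost tracker '([^']+)'");

    /**
     * Returns tips whose most recent attempt went to a tracker that was later
     * reported lost, with no newer attempt scheduled for that tip afterwards.
     */
    public static Set<String> findStrandedTips(List<String> jtLogLines) {
        Map<String, String> lastTrackerForTip = new HashMap<>();
        Set<String> stranded = new LinkedHashSet<>();
        for (String line : jtLogLines) {
            Matcher add = ADD.matcher(line);
            if (add.find()) {
                String tip = add.group(2);
                lastTrackerForTip.put(tip, add.group(3));
                stranded.remove(tip); // a newer attempt was scheduled for this tip
                continue;
            }
            Matcher lost = LOST.matcher(line);
            if (lost.find()) {
                String tracker = lost.group(1);
                for (Map.Entry<String, String> e : lastTrackerForTip.entrySet()) {
                    if (e.getValue().equals(tracker)) {
                        stranded.add(e.getKey()); // latest attempt sat on the lost tracker
                    }
                }
            }
        }
        return stranded;
    }

    public static void main(String[] args) {
        // The two JT log lines quoted in this report.
        List<String> log = List.of(
            "2007-01-24 10:53:09,564 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_m_020699_0' to tip tip_0001_m_020699, for tracker 'foo.com:7020'",
            "2007-01-24 11:05:32,409 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker 'foo.com:7020'");
        System.out.println(findStrandedTips(log)); // prints [tip_0001_m_020699]
    }
}
```

      A tip that does get a new attempt after the loss is dropped from the result, so on a healthy run the output should be empty.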

      This looks like a race condition. Since only two of the many tasks were never rescheduled, the JT may somehow have been unaware of the state of these two tasks after it assigned them to the (soon-to-be-lost) TT. Did they get added to the relevant tables properly?
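
      One way to read the question above: if an attempt can reach a TT before the JT records it in its per-tracker bookkeeping, a "Lost tracker" event handled in between would miss it, and nothing would ever reschedule it. The sketch below illustrates the invariant that rules this out, under stated assumptions: `TaskAssignmentTable`, `assign`, `trackerLost` and `trackerFor` are hypothetical names, not the JobTracker's real data structures, and the actual fix may look different. Both maps are updated under the same lock that the lost-tracker path takes, and `assign` is meant to be called before the task is handed to the TT.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-tracker bookkeeping; not Hadoop's JobTracker implementation. */
public class TaskAssignmentTable {
    private final Map<String, Set<String>> tasksByTracker = new HashMap<>();
    private final Map<String, String> trackerByTask = new HashMap<>();

    /**
     * Records an assignment in both maps atomically. Call this BEFORE the task
     * is sent to the TT, so a concurrent trackerLost() can never miss it.
     */
    public synchronized void assign(String taskId, String tracker) {
        tasksByTracker.computeIfAbsent(tracker, t -> new LinkedHashSet<>()).add(taskId);
        trackerByTask.put(taskId, tracker);
    }

    /** Forgets a lost tracker and returns every task on it that must be rescheduled. */
    public synchronized List<String> trackerLost(String tracker) {
        Set<String> tasks = tasksByTracker.remove(tracker);
        if (tasks == null) {
            return new ArrayList<>(); // unknown tracker, or already handled
        }
        for (String taskId : tasks) {
            trackerByTask.remove(taskId);
        }
        return new ArrayList<>(tasks);
    }

    /** Returns the tracker currently holding the task, or null if none. */
    public synchronized String trackerFor(String taskId) {
        return trackerByTask.get(taskId);
    }
}
```

      Because `assign` and `trackerLost` share one monitor, every task is either recorded before the loss is processed (and so returned for rescheduling) or assigned afterwards to a tracker the table no longer considers lost; there is no window in which it is on the TT but invisible to the lost-tracker path.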

    People

      Assignee: Unassigned
      Reporter: Devaraj Das (ddas)