Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Description
I encountered this "job hung" situation during one of the sort runs. Two tasks assigned to a TaskTracker (TT) were never rescheduled after the TT was lost, and this left the job stuck forever. The TT had been assigned many tasks, and all of them were rescheduled except these two. Here are the relevant log messages for one of the tasks (below, the JT log has been split into two parts to bring out the sequence of events).
JT log:
---------
2007-01-24 10:53:09,564 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_m_020699
2007-01-24 10:53:09,564 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_m_020699_0' to tip tip_0001_m_020699, for tracker 'foo.com:7020'
TT log:
---------
2007-01-24 10:53:09,564 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_0001_m_020699_0
2007-01-24 10:53:12,180 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_020699_0 0.0% hdfs://foo:50000/user/ddas/somedir/part002444:134217728+134217728
JT log:
---------
2007-01-24 11:05:32,409 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker 'foo.com:7020'
Looks like there is some race condition. Since only two out of the many tasks were never rescheduled, it could mean that the JT was somehow unaware of the state of these two tasks after it assigned them to the (soon-to-be-lost) TT (did they get added to the relevant tables properly?). A minimal sketch of the kind of window being suspected is given below.
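The following is a deliberately simplified, hypothetical Java sketch of the suspected ordering problem, not the actual JobTracker code: the class, method, and field names (LostTrackerRaceSketch, assignTask, lostTracker, trackerToTasks) are illustrative only. It models a task assignment and a lost-tracker pass touching the tracker-to-tasks table without a common lock, so an assignment that lands after the reschedule pass leaves the task orphaned.

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the suspected race: if the lost-tracker pass runs
// before the assignment is recorded in the table, the task is never seen
// by the reschedule logic and the job hangs waiting for it.
public class LostTrackerRaceSketch {

    // trackerName -> task ids the JT believes are running there
    // (intentionally unsynchronized to expose the ordering window)
    private final Map<String, Set<String>> trackerToTasks = new HashMap<>();
    private final Set<String> rescheduled = new HashSet<>();

    // Called when a heartbeat hands a new task to a tracker.
    void assignTask(String tracker, String taskId) {
        trackerToTasks.computeIfAbsent(tracker, t -> new HashSet<>()).add(taskId);
    }

    // Called when the tracker is declared lost: reschedule everything it ran.
    void lostTracker(String tracker) {
        Set<String> tasks = trackerToTasks.remove(tracker);
        if (tasks != null) {
            rescheduled.addAll(tasks);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LostTrackerRaceSketch jt = new LostTrackerRaceSketch();

        // Thread 1: assigns task_0001_m_020699_0 to foo.com:7020.
        Thread assign = new Thread(() ->
                jt.assignTask("foo.com:7020", "task_0001_m_020699_0"));

        // Thread 2: expires the same tracker at (almost) the same moment.
        Thread expire = new Thread(() -> jt.lostTracker("foo.com:7020"));

        expire.start();   // lost-tracker pass may run first ...
        assign.start();   // ... and the assignment then lands in the table
        expire.join();
        assign.join();

        // If the assignment slipped in after the reschedule pass, the task
        // sits in trackerToTasks but is never rescheduled.
        System.out.println("rescheduled = " + jt.rescheduled);
        System.out.println("orphaned    = " + jt.trackerToTasks);
    }
}
{code}

If the real code has a similar window between recording the assignment and expiring the tracker, that would explain why only the last couple of tasks handed to the TT were left behind while everything assigned earlier was rescheduled cleanly.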