Hadoop Map/Reduce / MAPREDUCE-5512

TaskTracker hung after failed reconnect to the JobTracker

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1-win, 1.3.0
    • Component/s: tasktracker
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The TaskTracker hangs after it fails to reconnect to the JobTracker.

      This is the problematic piece of code:

          this.distributedCacheManager = new TrackerDistributedCacheManager(
              this.fConf, taskController);
          // Starts the (non-daemon) distributed cache cleanup thread.
          this.distributedCacheManager.startCleanupThread();

          // If waitForProxy() throws here, nothing stops the cleanup
          // thread started above.
          this.jobClient = (InterTrackerProtocol)
              UserGroupInformation.getLoginUser().doAs(
                  new PrivilegedExceptionAction<Object>() {
                    public Object run() throws IOException {
                      return RPC.waitForProxy(InterTrackerProtocol.class,
                          InterTrackerProtocol.versionID,
                          jobTrackAddr, fConf);
                    }
                  });


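      If RPC.waitForProxy() throws, the TrackerDistributedCacheManager cleanup thread started just before it is never stopped. Because it is a non-daemon thread, and the JVM does not exit while any non-daemon thread is still running, it keeps the TaskTracker process up indefinitely.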
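      The failure mode and one way to avoid it can be illustrated in plain Java. This is a minimal sketch, not the attached branch-1 patch: CleanupOnFailedInit, CleanupThread, connectToJobTracker() and initialize() are hypothetical stand-ins for the TrackerDistributedCacheManager cleanup thread, RPC.waitForProxy() and TaskTracker initialization. The idea shown is to stop the already-started non-daemon thread whenever the later connection step fails, so it cannot outlive the failed initialization.

```java
import java.io.IOException;

// Hypothetical sketch, not the MAPREDUCE-5512 patch: stop a non-daemon
// worker thread if a later initialization step fails.
public class CleanupOnFailedInit {

  // Hypothetical stand-in for the TrackerDistributedCacheManager cleanup thread.
  public static class CleanupThread extends Thread {
    private volatile boolean running = true;

    public CleanupThread() {
      super("hypothetical-cleanup-thread");
      setDaemon(false); // non-daemon, like the thread in this issue: it blocks JVM exit
    }

    @Override
    public void run() {
      while (running) {
        try {
          Thread.sleep(50); // periodic cleanup work would go here
        } catch (InterruptedException e) {
          return; // interrupted by stopCleanup(): leave the loop
        }
      }
    }

    // Signals the thread to finish and waits until it has terminated.
    public void stopCleanup() {
      running = false;
      interrupt();
      try {
        join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the caller's interrupt status
      }
    }
  }

  // Hypothetical stand-in for RPC.waitForProxy() giving up on the JobTracker.
  static Object connectToJobTracker(boolean reachable) throws IOException {
    if (!reachable) {
      throw new IOException("JobTracker unreachable");
    }
    return new Object();
  }

  // Starts the cleanup thread, then connects. If connecting throws anything,
  // the thread is stopped before the exception propagates.
  public static Object initialize(CleanupThread cleanup, boolean reachable)
      throws IOException {
    cleanup.start();
    boolean connected = false;
    try {
      Object proxy = connectToJobTracker(reachable);
      connected = true;
      return proxy;
    } finally {
      if (!connected) {
        cleanup.stopCleanup(); // without this, the thread outlives the failed init
      }
    }
  }

  public static void main(String[] args) {
    CleanupThread cleanup = new CleanupThread();
    try {
      initialize(cleanup, false);
    } catch (IOException e) {
      System.out.println("init failed: " + e.getMessage());
    }
    // Prints false: no non-daemon thread is left, so the JVM can exit.
    System.out.println("cleanup thread alive: " + cleanup.isAlive());
  }
}
```

      Using try/finally with a success flag, rather than catching only IOException, also stops the thread when the connection step fails with an unchecked exception.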

      Attachments

        1. MAPREDUCE-5512.branch-1.patch (6 kB, Ivan Mitic)
        2. tt_Hung.txt (17 kB, Ivan Mitic)
        3. hadoop-tasktracker-RD00155DD09100.log (10 kB, Ivan Mitic)

          People

            Assignee: Ivan Mitic (ivanmi)
            Reporter: Ivan Mitic (ivanmi)
            Votes: 0
            Watchers: 2
