Hadoop Map/Reduce
MAPREDUCE-5949

TaskTracker's Java threads hanging


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.2.1
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: FreeBSD-10/stable
      openjdk version "1.7.0_60"
      OpenJDK Runtime Environment (build 1.7.0_60-b19)
      OpenJDK 64-Bit Server VM (build 24.60-b09, mixed mode)

    Description

      I set up hadoop-1.2.1 (from ports) on FreeBSD-10/stable with openjdk version 1.7.0_60.

      At first glance it works well, except for one annoying thing: after executing some tasks, the tasktracker process starts to eat CPU while idle.
      Sometimes it is 10-20% (numbers from top(1) output), sometimes 100-150%.

      In the tasktracker's log I see numerous records like these:

      2014-06-09 13:08:29,858 INFO org.mortbay.log: org.mortbay.io.nio.SelectorManager$SelectSet@abdcc1c JVM BUG(s) - injecting delay59 times
      2014-06-09 13:08:29,859 INFO org.mortbay.log: org.mortbay.io.nio.SelectorManager$SelectSet@abdcc1c JVM BUG(s) - recreating selector 59 times, canceled keys 944 times
      2014-06-09 13:09:29,862 INFO org.mortbay.log: org.mortbay.io.nio.SelectorManager$SelectSet@abdcc1c JVM BUG(s) - injecting delay58 times
      2014-06-09 13:09:29,862 INFO org.mortbay.log: org.mortbay.io.nio.SelectorManager$SelectSet@abdcc1c JVM BUG(s) - recreating selector 58 times, canceled keys 928 times
      2014-06-09 13:10:29,901 INFO org.mortbay.log: org.mortbay.io.nio.SelectorManager$SelectSet@abdcc1c JVM BUG(s) - injecting delay58 times
      2014-06-09 13:10:29,901 INFO org.mortbay.log: org.mortbay.io.nio.SelectorManager$SelectSet@abdcc1c JVM BUG(s) - recreating selector 58 times, canceled keys 928 times
      <...>
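Those "JVM BUG(s)" lines come from Jetty's built-in workaround for a JDK NIO defect: when Selector.select() keeps waking up immediately with no ready keys (a busy spin that burns CPU), Jetty injects delays and, past a threshold, throws the selector away and re-registers everything on a fresh one. A minimal sketch of that rebuild pattern is below; the class name, threshold, and method are illustrative, not Jetty's actual code:

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class SelectorRebuild {
    // Illustrative threshold: how many consecutive zero-key wakeups
    // we tolerate before assuming the selector is spinning.
    static final int SPIN_THRESHOLD = 512;

    /**
     * If the selector has returned zero ready keys too many times in a row,
     * open a fresh selector, move every valid registration onto it, and
     * close the broken one. Otherwise return the selector unchanged.
     */
    public static Selector rebuildIfSpinning(Selector selector, int zeroReturns)
            throws IOException {
        if (zeroReturns < SPIN_THRESHOLD) {
            return selector; // still healthy, keep using it
        }
        Selector fresh = Selector.open();
        for (SelectionKey key : selector.keys()) {
            if (key.isValid()) {
                // A channel may be registered with several selectors,
                // so re-registering before the old selector closes is safe.
                key.channel().register(fresh, key.interestOps(), key.attachment());
            }
        }
        selector.close(); // cancels all keys on the broken selector
        return fresh;
    }

    public static void main(String[] args) throws IOException {
        Selector s = Selector.open();
        Selector rebuilt = rebuildIfSpinning(s, SPIN_THRESHOLD);
        System.out.println(rebuilt != s && rebuilt.isOpen()); // true: selector replaced
        rebuilt.close();
    }
}
```

Each "recreating selector N times" log line corresponds to one such rebuild, which is why an idle tasktracker that hits this bug keeps spinning instead of blocking in select().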

      The more jobs I run, the more Java threads keep consuming CPU after all tasks have finished. After several job executions, top(1) output looks like this (split by thread; all rows with PID 79045 are threads of the same process):

      PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
      79045 hadoop 47 0 1948M 867M uwait 2 20:49 37.50% java
      79045 hadoop 31 0 1948M 867M uwait 31 1:45 19.29% java
      79045 hadoop 33 0 1948M 867M uwait 21 2:51 19.19% java
      79045 hadoop 30 0 1948M 867M uwait 17 2:51 18.65% java
      79045 hadoop 30 0 1948M 867M uwait 11 1:52 18.36% java
      79045 hadoop 30 0 1948M 867M uwait 22 1:45 18.36% java
      79045 hadoop 31 0 1948M 867M uwait 29 2:50 18.26% java
      79045 hadoop 31 0 1948M 867M uwait 6 1:57 18.16% java
      79045 hadoop 31 0 1948M 867M uwait 13 4:55 17.97% java
      79045 hadoop 31 0 1948M 867M uwait 26 3:39 17.77% java
      79045 hadoop 33 0 1948M 867M uwait 8 1:21 17.48% java
      79045 hadoop 30 0 1948M 867M uwait 1 3:32 16.70% java
      79045 hadoop 32 0 1948M 867M uwait 24 3:12 16.70% java
      79045 hadoop 26 0 1948M 867M uwait 4 1:27 10.35% java
      72417 root 20 0 19828K 3252K CPU21 21 0:00 0.29% top
      836 root 20 0 36104K 1952K select 14 6:51 0.00% snmpd
      79045 hadoop 20 0 1948M 867M uwait 20 6:51 0.00% java
      79045 hadoop 20 0 1948M 867M uwait 27 3:45 0.00% java
      79045 hadoop 20 0 1948M 867M uwait 30 2:37 0.00% java
      79045 hadoop 20 0 1948M 867M uwait 15 0:54 0.00% java
      79045 hadoop 20 0 1948M 867M uwait 2 0:48 0.00% java
      79045 hadoop 20 0 1948M 867M uwait 14 0:48 0.00% java
      79045 hadoop 20 0 1948M 867M uwait 2 0:48 0.00% java
      <...>

      This is on an absolutely idle cluster; not a single task is running.
      I am attaching truss(1) output for that java process.

      Attachments

        1. task1.txt (110 kB), Dmitry Sivachenko


      People

        Assignee: Unassigned
        Reporter: Dmitry Sivachenko (trtrmitya)
        Votes: 0
        Watchers: 2
