  Apache Tez / TEZ-628

GridMix job failed with finalStatus='Killed' due to NullPointerException when one of the NMs went bad


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2.0
    • Component/s: None
    • Labels: None

    Description

      The GridMix job GRIDMIX000162 failed with final status "Killed".

      Note: the RM reuse feature was disabled. While the job was running, one of the NodeManagers (NMs) went bad.

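      The AM log shows a NullPointerException thrown from org.apache.tez.dag.app.rm.node.AMNodeMap.handle (AMNodeMap.java:145) and reported by org.apache.hadoop.yarn.event.AsyncDispatcher as "Error in dispatcher thread". The dispatcher then exits and the DAGAppMaster starts shutting down: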
      -------------------------
      724 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.rm.container.AMContainerImpl: AMContainer container_1384820885084_0165_01_000004 transitioned from STOPPING to COMPLETED via event C_COMPLETED
      2013-11-20 00:10:59,724 INFO [TaskSchedulerEventHandlerThread] org.apache.tez.dag.app.rm.TaskSchedulerEventHandler: Processing the event EventType: S_CONTAINER_COMPLETED
      2013-11-20 00:11:26,976 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
      java.lang.NullPointerException
      at org.apache.tez.dag.app.rm.node.AMNodeMap.handle(AMNodeMap.java:145)
      at org.apache.tez.dag.app.rm.node.AMNodeMap.handle(AMNodeMap.java:39)
      at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
      at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
      at java.lang.Thread.run(Thread.java:662)
      2013-11-20 00:11:26,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
      2013-11-20 00:11:26,978 INFO [Thread-2] org.apache.tez.dag.app.DAGAppMaster: DAGAppMaster received a signal. Signaling TaskScheduler
      2013-11-20 00:11:26,978 INFO [Thread-2] org.apache.tez.dag.app.rm.TaskSchedulerEventHandler: TaskScheduler notified that iSignalled was : true
      2013-11-20 00:11:26,979 INFO [Thread-2] org.apache.tez.dag.history.HistoryEventHandler: Stopping HistoryEventHandler
      ------------------------------
      Earlier in the run (at 00:00:54, about ten minutes before the AM failure at 00:11:26), one of the NMs died with "java.io.IOException: No space left on device":
      ------------------------------
      2013-11-20 00:00:54,353 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:createAppLogDirs(548)) - Unable to create the app-log directory : /tmp/yarn/log/application_1384820885084_0097
      java.io.IOException: mkdir of /tmp/yarn/log/application_1384820885084_0097 failed
      at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1061)
      at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:150)
      at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:187)
      at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
      at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:716)
      at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
      at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:716)
      at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:425)
      at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppLogDirs(DefaultContainerExecutor.java:546)
      at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:95)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:977)
      2013-11-20 00:00:54,522 FATAL yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread Thread[LocalizerRunner for container_1384820885084_0097_01_000003,5,main] threw an Error. Shutting down now...
      org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
      at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:238)
      at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
      at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
      at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
      at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
      at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
      at org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.close(ChecksumFs.java:364)
      at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
      at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
      at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
      at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:254)
      at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
      at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
      at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2168)
      at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2109)
      at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:977)
      Caused by: java.io.IOException: No space left on device
      at java.io.FileOutputStream.writeBytes(Native Method)
      at java.io.FileOutputStream.write(FileOutputStream.java:282)
      at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:236)
      ... 16 more
      2013-11-20 00:00:54,524 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status -1
      --------------------------
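The report does not say which reference was null at AMNodeMap.java:145. Because the NPE surfaced while a node was going bad, one plausible pattern (an assumption, not confirmed by the stack trace or by the attached TEZ-628.txt) is an event that names a node the AM's node map has no entry for. The sketch below uses hypothetical classes (NodeMapSketch, NodeEvent, EventType, NodeState) rather than Tez's real API, and shows a guarded lookup that logs and skips such an event. This matters because, as the AM log shows, an exception escaping the handler is reported by AsyncDispatcher as "Error in dispatcher thread", after which the dispatcher exits and the DAGAppMaster shuts down.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT Tez's AMNodeMap: a per-node registry whose event
// handler tolerates events that refer to a node it has never registered.
public class NodeMapSketch {

    // Hypothetical event types; Tez's real node events differ.
    public enum EventType { NODE_SEEN, NODE_UNHEALTHY, CONTAINER_COMPLETED }

    // Hypothetical event: which node it concerns and what happened.
    public static final class NodeEvent {
        final String nodeId;
        final EventType type;

        public NodeEvent(String nodeId, EventType type) {
            this.nodeId = nodeId;
            this.type = type;
        }
    }

    // Hypothetical per-node state: a health flag and a completed-container count.
    static final class NodeState {
        boolean healthy = true;
        int completedContainers = 0;
    }

    private final Map<String, NodeState> nodes = new HashMap<>();

    /** Applies the event; returns false if it was skipped because the node is unknown. */
    public boolean handle(NodeEvent event) {
        if (event.type == EventType.NODE_SEEN) {
            nodes.putIfAbsent(event.nodeId, new NodeState());
            return true;
        }
        NodeState node = nodes.get(event.nodeId);
        if (node == null) {
            // Dereferencing node here would throw NullPointerException on the
            // dispatcher thread; log and drop the event instead.
            System.err.println("Ignoring " + event.type + " for unknown node " + event.nodeId);
            return false;
        }
        switch (event.type) {
            case NODE_UNHEALTHY:
                node.healthy = false;
                break;
            case CONTAINER_COMPLETED:
                node.completedContainers++;
                break;
            default:
                break;
        }
        return true;
    }

    /** A node counts as healthy only if it is registered and not marked unhealthy. */
    public boolean isHealthy(String nodeId) {
        NodeState node = nodes.get(nodeId);
        return node != null && node.healthy;
    }

    public static void main(String[] args) {
        NodeMapSketch map = new NodeMapSketch();
        map.handle(new NodeEvent("nm-1:45454", EventType.NODE_SEEN));
        map.handle(new NodeEvent("nm-1:45454", EventType.NODE_UNHEALTHY));
        boolean applied = map.handle(new NodeEvent("nm-2:45454", EventType.NODE_UNHEALTHY));
        System.out.println("nm-1 healthy: " + map.isHealthy("nm-1:45454"));
        System.out.println("event for nm-2 applied: " + applied);
    }
}
```

Running main prints `nm-1 healthy: false` and `event for nm-2 applied: false`, plus an "Ignoring ..." line on stderr. Returning a flag instead of throwing keeps one stray event from escaping into the dispatcher loop; whether silently dropping such events is the right policy for Tez is a separate question this report does not settle.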

      Attachments

        1. TEZ-628.txt (4 kB, Siddharth Seth)


          People

            Assignee: Siddharth Seth (sseth)
            Reporter: Yesha Vora (yeshavora)
            Votes: 0
            Watchers: 2
