Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
GRIDMIX000162 failed with final status "Killed".
Note: The RM reuse feature was disabled. While the job was running, one of the NMs went bad.
The AM log shows a NullPointerException in AMNodeMap.handle, reported by org.apache.hadoop.yarn.event.AsyncDispatcher as "Error in dispatcher thread":
-------------------------
724 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.rm.container.AMContainerImpl: AMContainer container_1384820885084_0165_01_000004 transitioned from STOPPING to COMPLETED via event C_COMPLETED
2013-11-20 00:10:59,724 INFO [TaskSchedulerEventHandlerThread] org.apache.tez.dag.app.rm.TaskSchedulerEventHandler: Processing the event EventType: S_CONTAINER_COMPLETED
2013-11-20 00:11:26,976 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.NullPointerException
at org.apache.tez.dag.app.rm.node.AMNodeMap.handle(AMNodeMap.java:145)
at org.apache.tez.dag.app.rm.node.AMNodeMap.handle(AMNodeMap.java:39)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
at java.lang.Thread.run(Thread.java:662)
2013-11-20 00:11:26,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
2013-11-20 00:11:26,978 INFO [Thread-2] org.apache.tez.dag.app.DAGAppMaster: DAGAppMaster received a signal. Signaling TaskScheduler
2013-11-20 00:11:26,978 INFO [Thread-2] org.apache.tez.dag.app.rm.TaskSchedulerEventHandler: TaskScheduler notified that iSignalled was : true
2013-11-20 00:11:26,979 INFO [Thread-2] org.apache.tez.dag.history.HistoryEventHandler: Stopping HistoryEventHandler
------------------------------
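One plausible reading of the trace above is that AMNodeMap.handle received an event for a node that was not (or was no longer) in its map, and dereferenced the null lookup result. That is an assumption; the source at AMNodeMap.java:145 is not shown in this report. The sketch below illustrates the general guard under that assumption. NodeTracker, NodeEvent, Node, and nodeSeen are hypothetical stand-ins, not Tez classes or methods.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a node-tracking event handler; NOT the Tez AMNodeMap source.
class NodeTracker {
    enum NodeEventType { CONTAINER_ALLOCATED, TURNED_UNHEALTHY }

    static final class NodeEvent {
        final String nodeId;
        final NodeEventType type;

        NodeEvent(String nodeId, NodeEventType type) {
            this.nodeId = nodeId;
            this.type = type;
        }
    }

    static final class Node {
        int eventsSeen = 0;

        void handle(NodeEvent event) {
            eventsSeen++;
        }
    }

    private final Map<String, Node> nodes = new ConcurrentHashMap<>();
    private int droppedEvents = 0;

    // Registers a node the first time it is seen.
    void nodeSeen(String nodeId) {
        nodes.putIfAbsent(nodeId, new Node());
    }

    // Delivers the event if the node is known; otherwise logs and drops it
    // instead of dereferencing a null lookup result.
    boolean handle(NodeEvent event) {
        Node node = nodes.get(event.nodeId);
        if (node == null) {
            droppedEvents++;
            System.err.println("Ignoring " + event.type + " for unknown node " + event.nodeId);
            return false;
        }
        node.handle(event);
        return true;
    }

    int droppedEvents() {
        return droppedEvents;
    }
}
```

With a guard like this, an event for an unregistered node is counted and logged rather than propagating an exception into the dispatcher thread, which in the log above caused the dispatcher to exit and the AM to shut down.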
Around the same time (the NM log below is timestamped roughly ten minutes before the AM failure), one of the NMs died due to "java.io.IOException: No space left on device":
------------------------------
2013-11-20 00:00:54,353 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:createAppLogDirs(548)) - Unable to create the app-log directory : /tmp/yarn/log/application_1384820885084_0097
java.io.IOException: mkdir of /tmp/yarn/log/application_1384820885084_0097 failed
at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1061)
at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:150)
at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:187)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:716)
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:716)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:425)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppLogDirs(DefaultContainerExecutor.java:546)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:95)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:977)
2013-11-20 00:00:54,522 FATAL yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread Thread[LocalizerRunner for container_1384820885084_0097_01_000003,5,main] threw an Error. Shutting down now...
org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:238)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
at org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.close(ChecksumFs.java:364)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:254)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2168)
at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2109)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:977)
Caused by: java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:282)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:236)
... 16 more
2013-11-20 00:00:54,524 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status -1
--------------------------
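The NM log above shows the IOException wrapped in org.apache.hadoop.fs.FSError, which extends java.lang.Error rather than Exception. A handler that catches only IOException therefore does not see it, and it reaches the thread's uncaught-exception handler, matching the "threw an Error. Shutting down now..." message. The sketch below reproduces that behavior in isolation. DiskError and LocalizerSketch are hypothetical stand-ins, not Hadoop classes, and whether an NM should survive a full disk is a separate design question this does not settle.

```java
import java.io.IOException;

// Hypothetical stand-in for org.apache.hadoop.fs.FSError (an Error, not an Exception).
class DiskError extends Error {
    DiskError(Throwable cause) {
        super(cause);
    }
}

class LocalizerSketch {
    // Simulates a local copy whose low-level write hits a full disk.
    static void copyToLocalDisk() throws IOException {
        throw new DiskError(new IOException("No space left on device"));
    }

    // Catches only IOException, so the Error escapes to the caller.
    static String narrowCatch() {
        try {
            copyToLocalDisk();
            return "ok";
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    // Also catches the Error type, so the failure stays local to this call.
    static String wideCatch() {
        try {
            copyToLocalDisk();
            return "ok";
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        } catch (DiskError e) {
            return "failed: " + e.getCause().getMessage();
        }
    }
}
```

Calling narrowCatch() lets the DiskError propagate out of the method, while wideCatch() turns it into an ordinary failure result for the one task that hit the full disk.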