Details
-
Bug
-
Status: Open
-
Critical
-
Resolution: Unresolved
-
3.0.0
-
None
Description
Restarting a NM, I found the active RM crash. The Exception stack is as follows.
2019-12-16 18:12:20,286 FATAL org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type NODE_UPDATE to the Event Dispatcher java.util.NoSuchElementException at java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036) at java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:373) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:941) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1374) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:345) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:204) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:958) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:130) at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66) at java.lang.Thread.run(Thread.java:748)
This issue looks a bit same as YARN-9552,YARN-7382. But the root cause is different.