Details
-
Bug
-
Status: Patch Available
-
Major
-
Resolution: Unresolved
-
3.4.0
-
None
Description
In AbstractYarnScheduler::completeOustandingUpdatesWhichAreReserved(), after getReservedContainer(), there is a certain possibility that the reservedContainer is removed by calling
SchedulerNode::setReservedContainer() asynchronously. It will throw NEP and resourcemanager would crash like below trace log.
// code placeholder 2023-05-07 02:04:38,201 FATAL [SchedulerEventDispatcher:Event Processor] org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type CONTAINER_EXPIRED to the Event Dispatcher java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completeOustandingUpdatesWhichAreReserved(AbstractYarnScheduler.java:725) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:686) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1927) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:172) at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:74) at java.lang.Thread.run(Thread.java:748) 2023-05-07 02:04:38,201 INFO [SchedulerEventDispatcher:Event Processor] org.apache.hadoop.yarn.event.EventDispatcher: Exiting, bbye..
Attachments
Issue Links
- links to