Description
When retroactive failures occur and the YARN queue is too full to allow new container assignments, the scheduler can miscompute the blocked vertex set. It flips bits only up to the length of the bitset (BitSet.length() returns the index of the highest set bit plus one), which may not reflect the total number of vertices. Vertices with an index above the highest allowed vertex are therefore not marked as blocked unless they are descendants. As a result, no preemption takes place and the DAG hangs.
@GuardedBy("DagAwareYarnTaskScheduler.this")
BitSet createVertexBlockedSet() {
  BitSet blocked = new BitSet();
  Entry<Priority, RequestPriorityStats> entry = priorityStats.lastEntry();
  if (entry != null) {
    RequestPriorityStats stats = entry.getValue();
    blocked.or(stats.allowedVertices);
    blocked.flip(0, blocked.length());
    blocked.or(stats.descendants);
  }
  return blocked;
}
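The effect of flipping up to blocked.length() can be reproduced outside the scheduler with plain java.util.BitSet. The following is a minimal sketch, not the scheduler code: the class name BlockedSetSketch, both method names, and the numVertices parameter are hypothetical and chosen for illustration only. The second method shows one possible way to flip across the full vertex range; it is an assumption, not necessarily how the issue was resolved.

```java
import java.util.BitSet;

public class BlockedSetSketch {
    // Mirrors the logic in the description: the flip range ends at
    // length(), i.e. the index of the highest set bit plus one.
    public static BitSet blockedUsingLength(BitSet allowed, BitSet descendants) {
        BitSet blocked = new BitSet();
        blocked.or(allowed);
        blocked.flip(0, blocked.length());
        blocked.or(descendants);
        return blocked;
    }

    // Hypothetical variant: the flip range covers every vertex index,
    // using a caller-supplied total vertex count (an assumption).
    public static BitSet blockedUsingVertexCount(BitSet allowed, BitSet descendants,
                                                 int numVertices) {
        BitSet blocked = new BitSet(numVertices);
        blocked.or(allowed);
        blocked.flip(0, numVertices);
        blocked.or(descendants);
        return blocked;
    }

    public static void main(String[] args) {
        int numVertices = 5;               // DAG with vertices 0..4
        BitSet allowed = new BitSet();
        allowed.set(0, 2);                 // only vertices 0 and 1 are allowed
        BitSet descendants = new BitSet(); // no descendants in this example

        // length() is 2, so vertices 2..4 are never flipped to blocked.
        System.out.println(blockedUsingLength(allowed, descendants));
        // prints {}

        // Flipping over all 5 indices marks vertices 2..4 as blocked.
        System.out.println(blockedUsingVertexCount(allowed, descendants, numVertices));
        // prints {2, 3, 4}
    }
}
```

With vertices 0 and 1 allowed in a five-vertex DAG, the length-based flip yields an empty blocked set, which matches the reported symptom of no preemption being triggered.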