Apache Tez / TEZ-4027

DagAwareYarnTaskScheduler can miscompute blocked vertices and cause a hang


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.1, 0.10.0
    • Fix Version/s: 0.9.2, 0.10.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      When there are retroactive failures and the YARN queue is full, so no new containers can be assigned, the scheduler can miscompute the set of blocked vertices. It flips the bits only up to the BitSet's length(), which is the index of the highest set bit plus one and can be smaller than the total number of vertices in the DAG. Vertices with indices above the highest allowed vertex are therefore left unmarked by the flip. As a result, no preemption is triggered and the DAG hangs.
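      The root cause is a property of java.util.BitSet: length() returns the index of the highest set bit plus one, not a logical size. The standalone sketch below isolates the flip step using an illustrative 4-vertex DAG in which only vertices 0 and 1 are allowed; the class name, method name, and vertex numbering are hypothetical and chosen only for illustration.

```java
import java.util.BitSet;

public class BlockedSetLengthDemo {
  // Mirrors the flip step of createVertexBlockedSet(): complement `allowed`
  // only over [0, allowed.length()).
  static BitSet complementUpToLength(BitSet allowed) {
    BitSet blocked = new BitSet();
    blocked.or(allowed);
    blocked.flip(0, blocked.length()); // length() == highest set bit + 1
    return blocked;
  }

  public static void main(String[] args) {
    // Hypothetical DAG with 4 vertices (indices 0..3); only 0 and 1 are allowed.
    BitSet allowed = new BitSet();
    allowed.set(0, 2); // sets bits 0 and 1 (toIndex is exclusive)

    // Prints 2, not 4: the flip range stops at the highest allowed vertex.
    System.out.println(allowed.length());

    // Prints {}: vertices 2 and 3 are never marked as blocked.
    System.out.println(complementUpToLength(allowed));
  }
}
```

      Because every bit at or above length() is already clear, the flip can never set it, so the size of the DAG never enters the computation.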

      @GuardedBy("DagAwareYarnTaskScheduler.this")
      BitSet createVertexBlockedSet() {
        BitSet blocked = new BitSet();
        Entry<Priority, RequestPriorityStats> entry = priorityStats.lastEntry();
        if (entry != null) {
          RequestPriorityStats stats = entry.getValue();
          blocked.or(stats.allowedVertices);
          // Bug: blocked.length() is the highest set bit + 1, not the vertex count,
          // so vertices above the highest allowed index are never flipped to blocked.
          blocked.flip(0, blocked.length());
          blocked.or(stats.descendants);
        }
        return blocked;
      }
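      One way to avoid the miscomputation is to complement over the full vertex range rather than over blocked.length(). The sketch below is a simplified, self-contained rewrite of createVertexBlockedSet(), not the committed patch. It assumes the total vertex count is known and passes it in as a hypothetical numVertices parameter, replaces the priorityStats.lastEntry() lookup with a directly supplied stats object, omits @GuardedBy, and uses a hypothetical stand-in for RequestPriorityStats.

```java
import java.util.BitSet;

public class BlockedSetFixSketch {
  // Hypothetical, simplified stand-in for the scheduler's RequestPriorityStats.
  static final class RequestPriorityStats {
    final BitSet allowedVertices = new BitSet();
    final BitSet descendants = new BitSet();
  }

  // Sketch only: numVertices is an assumed parameter holding the DAG's total
  // vertex count. A null stats mirrors the original's empty-map case.
  static BitSet createVertexBlockedSet(RequestPriorityStats stats, int numVertices) {
    BitSet blocked = new BitSet(numVertices);
    if (stats != null) {
      blocked.or(stats.allowedVertices);
      // Complement over the whole vertex range, so every vertex that is not
      // explicitly allowed ends up blocked, regardless of blocked.length().
      blocked.flip(0, numVertices);
      // As in the original, the vertices recorded in `descendants` are blocked too.
      blocked.or(stats.descendants);
    }
    return blocked;
  }

  public static void main(String[] args) {
    // Illustrative values: 4-vertex DAG, vertices 0 and 1 allowed.
    RequestPriorityStats stats = new RequestPriorityStats();
    stats.allowedVertices.set(0, 2);

    System.out.println(createVertexBlockedSet(stats, 4)); // {2, 3}
  }
}
```

      With the same inputs that produced an empty set before, vertices 2 and 3 are now reported as blocked instead of being silently omitted.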
      

      Attachments

        1. TEZ-4027.001.patch (7 kB, Kuhu Shukla)
        2. TEZ-4027.002.patch (7 kB, Kuhu Shukla)


          People

            Assignee: Kuhu Shukla (kshukla)
            Reporter: Kuhu Shukla (kshukla)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: