  1. Apache Tez
  2. TEZ-4139

Tez should consider node information for computing failure fraction - downstream (reducer) problems

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      When many downstream task attempts fail to fetch output from a source task, the source task is marked as failed and retried. Currently, the failure fraction is computed from the number of unique downstream task attempts that reported a failure. It should also take node information into account when computing "failureFraction", so that failures concentrated on a few downstream nodes can be distinguished from failures spread across many nodes.

      https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskAttemptImpl.java#L1845-L1849
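
      To make the proposal concrete, here is a minimal, self-contained sketch of the two ways of counting. It is not the code in TaskAttemptImpl and not the attached patch: the class FailureFractionSketch, the FailureReport type, and both method names are hypothetical, and dividing unique reporting nodes by the number of consumer nodes is only one possible reading of "taking node information into account".

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Tez.
class FailureFractionSketch {

    /** One fetch-failure report sent by a downstream attempt (assumed shape). */
    static final class FailureReport {
        final String attemptId; // downstream task attempt that failed to fetch
        final String nodeId;    // node on which that attempt was running

        FailureReport(String attemptId, String nodeId) {
            this.attemptId = attemptId;
            this.nodeId = nodeId;
        }
    }

    /** Attempt-based view: unique reporting attempts / number of consumer tasks. */
    static float attemptFraction(List<FailureReport> reports, int consumerTaskCount) {
        if (consumerTaskCount <= 0) {
            throw new IllegalArgumentException("consumerTaskCount must be positive");
        }
        Set<String> attempts = new HashSet<>();
        for (FailureReport r : reports) {
            attempts.add(r.attemptId);
        }
        return (float) attempts.size() / consumerTaskCount;
    }

    /** Node-based view (assumption): unique reporting nodes / number of consumer nodes. */
    static float nodeFraction(List<FailureReport> reports, int consumerNodeCount) {
        if (consumerNodeCount <= 0) {
            throw new IllegalArgumentException("consumerNodeCount must be positive");
        }
        Set<String> nodes = new HashSet<>();
        for (FailureReport r : reports) {
            nodes.add(r.nodeId);
        }
        return (float) nodes.size() / consumerNodeCount;
    }

    public static void main(String[] args) {
        // Four distinct downstream attempts fail, all running on the same node.
        List<FailureReport> reports = List.of(
            new FailureReport("attempt_1", "node_1"),
            new FailureReport("attempt_2", "node_1"),
            new FailureReport("attempt_3", "node_1"),
            new FailureReport("attempt_4", "node_1"));

        // 10 consumer tasks spread over 5 nodes (made-up numbers).
        System.out.println("attempt-based fraction: " + attemptFraction(reports, 10));
        System.out.println("node-based fraction:    " + nodeFraction(reports, 5));
    }
}
```

      In this made-up scenario the attempt-based fraction is 4/10 while the node-based fraction is 1/5. With all failures coming from a single node, the node-based number points at a problem on the downstream (reducer) side rather than at the source task.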

      Attachments

        1. TEZ-4139.02.WIP.patch
          19 kB
          László Bodor
        2. TEZ-4139.01.WIP.patch
          5 kB
          László Bodor

            People

              Assignee: László Bodor (abstractdog)
              Reporter: Rajesh Balamohan (rajesh.balamohan)
              Votes: 0
              Watchers: 5