  1. Apache Tez
  2. TEZ-1893

Verify invalid -1 parallelism in DAG.verify()

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.4
    • Component/s: None
    • Labels: None

    Description

                throw new TezUncheckedException(vertex.getLogIdentifier() +
                " has -1 tasks but does not have input initializers, " +
                "1-1 uninited sources or custom vertex manager to set it at runtime");
      

      IMO, this kind of verification could be done on the client side (in DAG.verify).
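
      A minimal sketch of what such a client-side check might look like, under a simplified vertex model. The names `ParallelismCheck`, `VertexSpec`, `canResolveAtRuntime` and `verify` are hypothetical and exist only for illustration; this is not the Tez API and not the attached patch. It assumes that a "1-1 uninited source" is a source vertex connected by a one-to-one edge whose own parallelism is still -1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParallelismCheck {

    /** Hypothetical stand-in for a DAG vertex; NOT org.apache.tez.dag.api.Vertex. */
    static final class VertexSpec {
        final String name;
        final int parallelism;                 // -1 means "to be decided at runtime"
        final boolean hasInputInitializer;     // assumption: some input can set parallelism
        final boolean hasCustomVertexManager;  // assumption: a custom manager can set parallelism
        final List<VertexSpec> oneToOneSources = new ArrayList<>();

        VertexSpec(String name, int parallelism,
                   boolean hasInputInitializer, boolean hasCustomVertexManager) {
            this.name = name;
            this.parallelism = parallelism;
            this.hasInputInitializer = hasInputInitializer;
            this.hasCustomVertexManager = hasCustomVertexManager;
        }
    }

    /** True if something could still set the vertex's parallelism at runtime. */
    static boolean canResolveAtRuntime(VertexSpec v) {
        if (v.hasInputInitializer || v.hasCustomVertexManager) {
            return true;
        }
        // A one-to-one source whose parallelism is also undecided may later
        // propagate its value to this vertex.
        for (VertexSpec src : v.oneToOneSources) {
            if (src.parallelism == -1) {
                return true;
            }
        }
        return false;
    }

    /** Fails fast on the client instead of letting the AM fail during vertex init. */
    static void verify(List<VertexSpec> vertices) {
        for (VertexSpec v : vertices) {
            if (v.parallelism == -1 && !canResolveAtRuntime(v)) {
                throw new IllegalStateException(v.name
                    + " has -1 tasks but does not have input initializers, "
                    + "1-1 uninited sources or custom vertex manager to set it at runtime");
            }
        }
    }

    public static void main(String[] args) {
        VertexSpec tokenizer = new VertexSpec("tokenizer", -1, false, false);
        try {
            verify(Arrays.asList(tokenizer));
        } catch (IllegalStateException e) {
            // The error is reported before any DAG would be submitted.
            System.out.println("Rejected on client: " + e.getMessage());
        }
    }
}
```

      With a check of this shape, the misconfiguration would surface as an exception in the submitting process rather than as an AM failure that the client can only observe as connection retries.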

      The following messages appear on the client side. The client cannot get the real status of the DAG because the Tez AM is killed due to this vertex init error.

      19:25:33,716 - Thread( main) - (RMProxy.java:98) - Connecting to ResourceManager at /0.0.0.0:8032
      19:25:33,717 - Thread( main) - (AHSProxy.java:42) - Connecting to Application History server at /0.0.0.0:10200
      19:25:34,724 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      19:25:35,725 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      19:25:36,726 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      19:25:36,846 - Thread( main) - (DAGClientImpl.java:463) - DAG initialized: CurrentState=Running
      19:25:38,351 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      19:25:39,352 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      19:25:40,354 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      19:25:41,356 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      19:25:42,357 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      19:25:43,358 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      19:25:44,359 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      19:25:45,360 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      19:25:46,361 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      19:25:47,362 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      19:25:47,369 - Thread( main) - (DAGClientImpl.java:463) - DAG completed. FinalState=FAILED
      19:25:47,369 - Thread( main) - (TezWordCount.java:203) - status=FAILED, progress=null, diagnostics=Session stats:submittedDAGs=0, successfulDAGs=0, failedDAGs=0, killedDAGs=0
      , counters=null
      19:25:47,372 - Thread( main) - (TezClient.java:470) - Shutting down Tez Session, sessionName=commonName, applicationId=application_1420335690331_0007
      19:25:47,374 - Thread( main) - (TezClientUtils.java:838) - Application not running, applicationId=application_1420335690331_0007, yarnApplicationState=FINISHED, finalApplicationStatus=FAILED, trackingUrl=http://localhost:8088/proxy/application_1420335690331_0007/A, diagnostics=Session stats:submittedDAGs=0, successfulDAGs=0, failedDAGs=0, killedDAGs=0
      
      19:25:47,375 - Thread( main) - (TezClient.java:484) - Failed to shutdown Tez Session via proxy
      org.apache.tez.dag.api.SessionNotRunning: Application not running, applicationId=application_1420335690331_0007, yarnApplicationState=FINISHED, finalApplicationStatus=FAILED, trackingUrl=http://localhost:8088/proxy/application_1420335690331_0007/A, diagnostics=Session stats:submittedDAGs=0, successfulDAGs=0, failedDAGs=0, killedDAGs=0
      
      	at org.apache.tez.client.TezClientUtils.getSessionAMProxy(TezClientUtils.java:839)
      	at org.apache.tez.client.TezClient.getSessionAMProxy(TezClient.java:669)
      	at org.apache.tez.client.TezClient.stop(TezClient.java:476)
      	at com.zjffdu.tez.tutorial.TezWordCount.main(TezWordCount.java:204)
      19:25:47,377 - Thread( main) - (TezClient.java:489) - Could not connect to AM, killing session via YARN, sessionName=commonName, applicationId=application_1420335690331_0007
      19:25:47,381 - Thread( main) - (YarnClientImpl.java:364) - Killed application application_1420335690331_0007
      

      Attachments

        1. TEZ-1893-1.patch
          7 kB
          Jeff Zhang
        2. TEZ-1893-2.patch
          9 kB
          Jeff Zhang

          People

            Assignee: Jeff Zhang (zjffdu)
            Reporter: Jeff Zhang (zjffdu)