Spark / SPARK-23660

Yarn throws exception in cluster mode when the application is small


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.1, 2.4.0
    • Component/s: Spark Core, YARN
    • Labels: None

    Description

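      In YARN cluster mode, the ApplicationMaster fails with the following exception when the application is very small, i.e. when the user code finishes almost immediately (for example, an application that stops its SparkContext right after creating it, like the example application below):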

      18/03/07 23:34:22 WARN netty.NettyRpcEnv: Ignored failure: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@7c974942 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@1eea9d2d[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
      18/03/07 23:34:22 ERROR yarn.ApplicationMaster: Uncaught exception: 
      org.apache.spark.SparkException: Exception thrown in awaitResult: 
      	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
      	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
      	at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
      	at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
      	at org.apache.spark.deploy.yarn.YarnAllocator.<init>(YarnAllocator.scala:102)
      	at org.apache.spark.deploy.yarn.YarnRMClient.register(YarnRMClient.scala:77)
      	at org.apache.spark.deploy.yarn.ApplicationMaster.registerAM(ApplicationMaster.scala:450)
      	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:493)
      	at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
      	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
      	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
      	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
      	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:810)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
      	at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:809)
      	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
      	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:834)
      	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
      Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already stopped.
      	at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158)
      	at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135)
      	at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229)
      	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523)
      	at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91)
      	... 17 more
      18/03/07 23:34:22 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: org.apache.spark.SparkException: Exception thrown in awaitResult: )
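      The first WARN line reports a RejectedExecutionException from a ScheduledThreadPoolExecutor that is already in the Terminated state. The sketch below reproduces only that JDK behaviour with plain java.util.concurrent, not Spark code (the object name TerminatedSchedulerDemo is made up for illustration): once a scheduled pool has been shut down, any further schedule call is rejected by the default abort policy.

```scala
import java.util.concurrent.{Executors, RejectedExecutionException, TimeUnit}

// Illustrative demo only (not Spark code): a ScheduledExecutorService that
// has been shut down rejects new tasks, which matches the
// "[Terminated, pool size = 0, ...]" state printed in the WARN line.
object TerminatedSchedulerDemo {
  /** Returns true if scheduling on an already-terminated pool is rejected. */
  def scheduleAfterShutdown(): Boolean = {
    val scheduler = Executors.newScheduledThreadPool(1)
    scheduler.shutdown()                            // stop accepting new tasks
    scheduler.awaitTermination(1, TimeUnit.SECONDS) // pool is now Terminated
    try {
      scheduler.schedule(new Runnable { def run(): Unit = () }, 1, TimeUnit.SECONDS)
      false // not reached with the default rejection policy
    } catch {
      case _: RejectedExecutionException => true // default AbortPolicy throws
    }
  }

  def main(args: Array[String]): Unit =
    println(s"rejected after shutdown: ${scheduleAfterShutdown()}")
}
```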
      

      Example application:

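      import org.apache.spark.{SparkConf, SparkContext}

      object ExampleApp {
        def main(args: Array[String]): Unit = {
          val conf = new SparkConf().setAppName("ExampleApp")
          val sc = new SparkContext(conf)
          try {
            // Intentionally empty: the application ends right after the SparkContext starts
          } finally {
            sc.stop()
          }
        }
      }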
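      Reading the stack trace from the bottom up, ApplicationMaster.runDriver calls registerAM, which goes through YarnRMClient.register to construct a YarnAllocator that performs a synchronous askSync against the driver. With an application as small as the one above, the user code has apparently already run sc.stop() and shut down the RpcEnv by then, so the ask fails with RpcEnvStoppedException. The following is a hypothetical toy model of that ordering problem (ToyRpcEnv and RegistrationRace are illustrative names, not Spark classes), together with one possible mitigation: a CountDownLatch handshake that keeps the user thread from stopping until registration has finished. It is a sketch under these assumptions, not the change that resolved this issue.

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for an RPC environment: asks fail once it is stopped.
final class ToyRpcEnv {
  private val stopped = new AtomicBoolean(false)
  def stop(): Unit = stopped.set(true)
  def askSync(msg: String): String =
    if (stopped.get()) throw new IllegalStateException("RpcEnv already stopped.")
    else s"reply to $msg"
}

object RegistrationRace {
  /** Runs a "user application" thread that stops the env right after creating
    * it, then "registers" from the calling thread. With handshake = true the
    * user thread waits until registration is done before stopping. */
  def run(handshake: Boolean): Either[String, String] = {
    val env = new ToyRpcEnv
    val contextReady = new CountDownLatch(1) // "SparkContext" is up
    val registered = new CountDownLatch(1)   // "AM registration" is done
    val user = new Thread(new Runnable {
      def run(): Unit = {
        contextReady.countDown()
        if (handshake) registered.await() // block until registration finishes
        env.stop()                        // models sc.stop() in the finally block
      }
    })
    user.start()
    contextReady.await()
    // Without the handshake, force the losing interleaving deterministically.
    if (!handshake) user.join()
    val result: Either[String, String] =
      try Right(env.askSync("register"))
      catch { case e: IllegalStateException => Left(e.getMessage) }
    registered.countDown() // let the user thread proceed to stop()
    user.join()
    result
  }

  def main(args: Array[String]): Unit = {
    println(s"without handshake: ${run(handshake = false)}")
    println(s"with handshake:    ${run(handshake = true)}")
  }
}
```

The latch is only one way to express "do not stop before registration"; the point of the model is that any such ordering guarantee removes the window in which the ask can hit a stopped RpcEnv.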
      

People

    • Assignee: Gabor Somogyi (gsomogyi)
    • Reporter: Gabor Somogyi (gsomogyi)
    • Votes: 0
    • Watchers: 3
