Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version/s: 2.1.1
- Fix Version/s: None
Description
I am running Spark v2.1.1 in standalone mode (no YARN/Mesos) across EC2 instances. I have 1 master EC2 instance that acts as the driver (since spark-submit is called on this host); spark.master is set, and deploy-mode is client (so spark-submit only returns a return code to the PuTTY window once it finishes processing). I have 1 worker EC2 instance registered with the Spark master. When I run spark-submit on the master, I can see in the web UI that executors start on the worker, and I can verify successful completion.

However, if the worker EC2 instance is terminated while spark-submit is running, and a new worker EC2 instance comes up about 3 minutes later and registers with the master, the web UI shows 'cannot find address' in the executor status and the driver keeps waiting forever (I killed it 2 days later), or in some cases the driver allocates tasks to the new worker only 5 hours later and then completes. Is there some setting I am missing that would explain this behavior?
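For reference, a minimal sketch of the submission described above, run from the master/driver EC2 instance. The host, class, and jar names are placeholders, and the two --conf values shown are just the Spark defaults, called out because they are the settings that govern how the driver reacts to a lost worker.

```
# Sketch of the standalone, client-mode submission described above.
# <master-ec2-host>, com.example.MyJob, and the jar path are hypothetical.
spark-submit \
  --master spark://<master-ec2-host>:7077 \
  --deploy-mode client \
  --conf spark.network.timeout=120s \
  --conf spark.task.maxFailures=4 \
  --class com.example.MyJob \
  /path/to/my-job.jar
```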
Issue Links
- is related to: SPARK-32197 'Spark driver' stays running even though 'spark application' has FAILED (Open)