Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Description
Currently in Hive on Spark (HoS), after SparkClientImpl starts a remote process, it waits for that process to connect back. However, the process may fail and exit with an error code, in which case no connection is ever attempted. In that situation the HS2 process still waits for the connection and eventually times out on its own. Worse, the user may have to sit through two timeout periods: one for SparkSetReducerParallelism and another for the actual Spark job.
We should cancel the timeout task and mark the promise as failed as soon as we detect that the process has failed.
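The proposed fix could be sketched as follows. This is a minimal, hypothetical illustration, not Hive's actual code: the method name `awaitConnectOrFail`, the 10-second timeout, and the `sh -c "exit 1"` stand-in for the remote driver are all assumptions made for the example. The key idea matches the description: a watcher observes the child process, and on abnormal exit it cancels the pending timeout task and fails the promise immediately instead of waiting for the timeout to fire.

```java
import java.util.concurrent.*;

public class ProcessWatcher {

    // Hypothetical sketch: fail fast when the child process dies before
    // connecting back, rather than waiting out the full connect timeout.
    static String awaitConnectOrFail() throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CompletableFuture<Void> connectPromise = new CompletableFuture<>();

        // Timeout task: fail the promise if the client never connects in time.
        ScheduledFuture<?> timeout = scheduler.schedule(
                () -> connectPromise.completeExceptionally(
                        new TimeoutException("client did not connect")),
                10, TimeUnit.SECONDS);

        // Stand-in for the remote driver process; it exits right away with code 1.
        Process proc = new ProcessBuilder("sh", "-c", "exit 1").start();

        // Watcher: on non-zero exit, cancel the timeout task and mark the
        // promise failed now, instead of letting HS2 sit through the timeout.
        new Thread(() -> {
            try {
                int code = proc.waitFor();
                if (code != 0) {
                    timeout.cancel(false);
                    connectPromise.completeExceptionally(
                            new RuntimeException("child exited with code " + code));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();

        try {
            connectPromise.get();
            return "connected";
        } catch (ExecutionException e) {
            return "failed fast: " + e.getCause().getMessage();
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Returns within milliseconds because the watcher fails the promise,
        // well before the 10-second timeout task would have fired.
        System.out.println(awaitConnectOrFail());
    }
}
```

With this shape, the caller sees the real failure cause (the process exit code) almost immediately, and no second timeout period is incurred.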