Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: 3.4.0
- Fix Version/s: None
Description
SPARK-3900 fixed the IllegalStateException thrown by cleanupStagingDir in ApplicationMaster's shutdown hook. However, SPARK-21138 accidentally reverted that change while fixing the "Wrong FS" bug, and our users at LinkedIn are now hitting SPARK-3900 again. We need to bring back the fix for SPARK-3900.
The IllegalStateException when creating a new FileSystem object stems from a Hadoop limitation: a shutdown hook cannot be registered while shutdown is already in progress. When a Spark job fails during pre-launch, cleanupStagingDir is called as part of shutdown. If that call creates a new FileSystem object for the first time, HDFS tries to register a hook to shut down the KeyProviderCache while creating the ClientContext for the DFSClient, and we hit the IllegalStateException. We should therefore avoid creating a new FileSystem object in cleanupStagingDir() when it is invoked from a shutdown hook. That safeguard was introduced in SPARK-3900 and accidentally reverted by SPARK-21138; we need to restore it to avoid the IllegalStateException.
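A minimal sketch of the idea behind the fix (the object, method, and staging-path names here are hypothetical, and plain Runtime.addShutdownHook stands in for Spark's ShutdownHookManager): resolve the staging directory's FileSystem before shutdown begins and reuse that instance inside the hook, so no new FileSystem, and hence no new HDFS-internal shutdown hook, is created during shutdown.

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object StagingDirCleanup {
  // Hypothetical stand-ins for ApplicationMaster's YARN conf and staging dir.
  private val yarnConf = new Configuration()
  private val stagingDirPath =
    new Path("hdfs:///user/alice/.sparkStaging/application_123_0001")

  // Resolve the FileSystem eagerly, while the JVM is still running normally.
  // If the first FileSystem for this URI were created inside the shutdown
  // hook instead, HDFS would try to register its KeyProviderCache shutdown
  // hook while shutdown is already in progress and throw
  // IllegalStateException.
  private val stagingDirFs: FileSystem = stagingDirPath.getFileSystem(yarnConf)

  def installCleanupHook(): Unit = {
    Runtime.getRuntime.addShutdownHook(new Thread(() => {
      // Reuse the pre-created FileSystem; never call getFileSystem here.
      stagingDirFs.delete(stagingDirPath, true)
    }))
  }
}
{code}

Note that deriving the FileSystem from the staging-dir path itself (rather than from fs.defaultFS) should also preserve the SPARK-21138 behavior, where the staging dir may live on a different cluster than the default filesystem.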
Issue Links
- is related to:
  - SPARK-3900 ApplicationMaster's shutdown hook fails and IllegalStateException is thrown. (Resolved)
  - SPARK-21138 Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and "spark.hadoop.fs.defaultFS" are different (Resolved)