Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Fix Version/s: 5.0-alpha
Labels: None
Description
Problem: files uploaded under the path configured by spark.kubernetes.file.upload.path are never deleted automatically.
1: When Spark creates a driver pod, it first uploads the job's dependencies to the configured path. The build task runs in cluster mode and therefore needs to create a driver pod, so running the build task many times leaves a large number of files accumulating under that path.
2: The upload.path we configured (s3a://kylin/spark-on-k8s) is a fixed path; Spark creates a spark-upload-<uuid> subdirectory under it and stores the dependencies there.
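For reference, a minimal sketch of how this upload path is wired into a cluster-mode submission, assuming the job is launched through Spark's SparkLauncher API; the master URL, application jar, and main class below are placeholders, not values from this issue:
{code:java}
import org.apache.spark.launcher.SparkLauncher;

public class SubmitBuildJob {
    public static void main(String[] args) throws Exception {
        // In cluster mode, Spark uploads local dependencies to
        // spark.kubernetes.file.upload.path before the driver pod starts,
        // creating a fresh spark-upload-<uuid> subdirectory on every run.
        Process spark = new SparkLauncher()
                .setMaster("k8s://https://kubernetes.default.svc") // placeholder
                .setDeployMode("cluster")
                .setConf("spark.kubernetes.file.upload.path", "s3a://kylin/spark-on-k8s")
                .setAppResource("/path/to/build-job.jar")          // placeholder
                .setMainClass("org.example.BuildJob")              // placeholder
                .launch();
        spark.waitFor();
    }
}
{code}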
Design
Core idea: add a dynamic subdirectory under the original upload.path and delete that entire subdirectory when the task ends.
Build task: upload dependencies to upload.path + jobId (e.g. s3a://kylin/spark-on-k8s/uuid).
Delete the per-job dependency directory when the build task finishes, as sketched below.
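A minimal sketch of the cleanup step, assuming the per-job directory is upload.path plus the jobId and using the Hadoop FileSystem API; the class and method names are hypothetical, not Kylin's actual implementation:
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadDirCleaner {
    // Hypothetical helper: delete upload.path/<jobId> recursively once the
    // build job has finished.
    public static void deleteJobUploadDir(String uploadPath, String jobId, Configuration conf)
            throws IOException {
        Path jobDir = new Path(uploadPath, jobId);
        FileSystem fs = jobDir.getFileSystem(conf);
        if (fs.exists(jobDir)) {
            fs.delete(jobDir, true); // true = recursive, removes the whole subdirectory
        }
    }
}
{code}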
The automatic deletion only runs if the cleanup hook is actually invoked; a kill -9 means the hook never fires. A fallback garbage-collection policy is therefore needed as a safety net, e.g. automatically delete subdirectories that are older than three months (see the sketch below).
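A sketch of that fallback policy, again using the Hadoop FileSystem API with a hypothetical class name; the 90-day retention stands in for "three months", and on object stores such as S3A directory modification times can be unreliable, so a real implementation might inspect the timestamps of the files inside each subdirectory instead:
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadPathGc {
    private static final long RETENTION_MS = 90L * 24 * 60 * 60 * 1000; // ~3 months

    // Remove leftover per-job upload directories older than the retention window.
    public static void sweep(String uploadPath, Configuration conf) throws IOException {
        Path root = new Path(uploadPath);
        FileSystem fs = root.getFileSystem(conf);
        long cutoff = System.currentTimeMillis() - RETENTION_MS;
        for (FileStatus status : fs.listStatus(root)) {
            // Directories left behind by kill -9 never get the normal cleanup,
            // so delete any subdirectory whose modification time is past the cutoff.
            if (status.isDirectory() && status.getModificationTime() < cutoff) {
                fs.delete(status.getPath(), true);
            }
        }
    }
}
{code}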