Details
- Sub-task
- Status: Open
- Major
- Resolution: Unresolved
- 3.3.5
- None
- None
Description
We can improve the statistics collected in the s3a committer and saved to the JSON.
Key ones:
- number of task manifests read; duration of loads
- size of each manifest
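As a rough illustration of the statistics listed above, here is a minimal sketch of a thread-safe collector that tracks manifest count, load duration and manifest size, then renders them as JSON. The class `ManifestLoadStats`, its methods and the JSON key names are all hypothetical; none of them are part of the existing committer code or its statistics API.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical collector for manifest load statistics.
 * Names and JSON keys are assumptions made for illustration only.
 */
final class ManifestLoadStats {
    private final AtomicLong manifestsRead = new AtomicLong();
    private final AtomicLong totalLoadNanos = new AtomicLong();
    private final AtomicLong totalBytes = new AtomicLong();
    private final AtomicLong maxManifestBytes = new AtomicLong();

    /** Record one manifest load; safe to call from many loader threads. */
    void recordLoad(long sizeBytes, long loadNanos) {
        manifestsRead.incrementAndGet();
        totalLoadNanos.addAndGet(loadNanos);
        totalBytes.addAndGet(sizeBytes);
        // Keep only the largest size seen rather than every size,
        // so the collector itself uses constant memory.
        maxManifestBytes.accumulateAndGet(sizeBytes, Math::max);
    }

    long getManifestsRead() { return manifestsRead.get(); }
    long getTotalBytes() { return totalBytes.get(); }
    long getMaxManifestBytes() { return maxManifestBytes.get(); }

    /** Mean load time in milliseconds, or 0 if nothing was loaded. */
    double getMeanLoadMillis() {
        long n = manifestsRead.get();
        return n == 0 ? 0.0 : (totalLoadNanos.get() / (double) n) / 1_000_000.0;
    }

    /** Render the counters as a flat JSON object. */
    String toJson() {
        return "{\"manifests_read\":" + manifestsRead.get()
            + ",\"total_load_millis\":" + (totalLoadNanos.get() / 1_000_000L)
            + ",\"total_bytes\":" + totalBytes.get()
            + ",\"max_manifest_bytes\":" + maxManifestBytes.get() + "}";
    }
}
```

A loader thread would time each read with `System.nanoTime()` and call `recordLoad(size, elapsed)`. Recording aggregates rather than every individual size is a design choice of this sketch, made so the statistics stay small for jobs with very many manifests.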
I think we would also benefit if we could make the commit thread pools large but shared across all jobs (i.e. a demand-created thread pool in the s3a fs). That would allow for a pool size of, say, 500, while still supporting many jobs actively committing at the same time (a busy Spark driver).
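The shared pool idea could look something like the sketch below, which uses only `java.util.concurrent`. The class `SharedCommitPool` and its nested `JobSubmitter` are hypothetical names, not existing Hadoop classes. The per-job semaphore cap is an added assumption, included to show one way a single job might be kept from monopolising the shared threads.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical process-wide commit pool, created on first demand. */
final class SharedCommitPool {
    private static ThreadPoolExecutor shared;

    /** Return the shared pool; the first caller's size wins. */
    static synchronized ExecutorService get(int maxThreads) {
        if (shared == null) {
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
            // Threads start only as work arrives and exit when idle,
            // so a large maximum costs nothing while no job is committing.
            pool.allowCoreThreadTimeOut(true);
            shared = pool;
        }
        return shared;
    }

    /** Per-job view of the pool that caps that job's active tasks (assumption). */
    static final class JobSubmitter {
        private final ExecutorService pool;
        private final Semaphore permits;

        JobSubmitter(ExecutorService pool, int maxActive) {
            this.pool = pool;
            this.permits = new Semaphore(maxActive);
        }

        /** Block until this job has a free slot, then hand the task to the pool. */
        <T> Future<T> submit(Callable<T> task) throws InterruptedException {
            permits.acquire();
            try {
                return pool.submit(() -> {
                    try {
                        return task.call();
                    } finally {
                        permits.release();
                    }
                });
            } catch (RejectedExecutionException e) {
                permits.release(); // task never ran, so free the slot
                throw e;
            }
        }
    }
}
```

Each job would wrap the shared pool in its own `JobSubmitter`, for example `new SharedCommitPool.JobSubmitter(SharedCommitPool.get(500), 32)`, where 32 is an arbitrary illustrative cap.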
Finally: should the file commit pool size be greater than the size of the pool of manifest readers? I think it could be, but the ratio should be fairly low.
Attachments
Issue Links
- is related to
- MAPREDUCE-7435 ManifestCommitter OOM on azure job
- Resolved