Details
Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Duplicate
Affects Version/s: 2.7.0
Fix Version/s: None
Component/s: None
Description
Currently, users of S3AFastOutputStream can control memory usage with a few settings: fs.s3a.threads.core and fs.s3a.threads.max, which bound the number of active uploads (they are passed as the core and maximum pool sizes of a ThreadPoolExecutor), and fs.s3a.max.total.tasks, which bounds the size of the queue feeding that ThreadPoolExecutor.
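For illustration, here is a minimal sketch of how these settings plausibly map onto a ThreadPoolExecutor; the class and variable names are stand-ins for the configuration keys, not the actual S3AFastOutputStream code.

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    /** Illustrative sketch only; not the S3A source. */
    public class UploadPoolSketch {
        public static ThreadPoolExecutor newUploadPool() {
            int coreThreads   = 2;    // stands in for fs.s3a.threads.core
            int maxThreads    = 10;   // stands in for fs.s3a.threads.max
            int maxTotalTasks = 1000; // stands in for fs.s3a.max.total.tasks

            return new ThreadPoolExecutor(
                coreThreads, maxThreads,
                60L, TimeUnit.SECONDS,
                // Bounded feeding queue: once it is full and all
                // maxThreads workers are busy, the default AbortPolicy
                // makes execute() throw RejectedExecutionException.
                new LinkedBlockingQueue<>(maxTotalTasks));
        }
    }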
However, if the writing job produces data faster than the total upload throughput to S3, a crash is almost guaranteed, because calls to write never block and no backpressure is ever applied.
If fs.s3a.max.total.tasks is set high (the default is 1000), write calls keep adding data to the queue until the process eventually runs out of memory (OOM). But if the user sets it lower, writes fail once the queue is full: the ThreadPoolExecutor rejects the part with java.util.concurrent.RejectedExecutionException.
Ideally, calls to write should block rather than fail when the queue is full, so that backpressure is applied to whatever process is writing.
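One way to get that behavior, sketched here purely as an illustration and not as the actual fix adopted for HADOOP-11684, is a RejectedExecutionHandler that blocks the submitting thread until the queue has room:

    import java.util.concurrent.RejectedExecutionException;
    import java.util.concurrent.RejectedExecutionHandler;
    import java.util.concurrent.ThreadPoolExecutor;

    /** Illustrative only: converts rejection into backpressure. */
    public class BlockOnFullQueuePolicy implements RejectedExecutionHandler {
        @Override
        public void rejectedExecution(Runnable task, ThreadPoolExecutor pool) {
            try {
                // Block the writing thread until a queue slot frees up,
                // instead of throwing; this is what pushes backpressure
                // back onto write() callers.
                pool.getQueue().put(task);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RejectedExecutionException(
                    "Interrupted while waiting for queue space", e);
            }
        }
    }

Installed through the ThreadPoolExecutor constructor that accepts a RejectedExecutionHandler, this would make execute() wait instead of throwing once fs.s3a.max.total.tasks parts are already queued. The put()-on-rejection pattern has a known caveat: tasks enqueued after shutdown() may never run, which is one reason a semaphore-gated submission path is often preferred in practice.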
Issue Links
- duplicates HADOOP-11684 S3a to use thread pool that blocks clients (Resolved)