Details
Description
The S3A committer is bottlenecked on the driver when finalizing commits: it appears that only a single thread is used to finalize them. In our experiment, saving 36,000 files takes almost 2 hours to commit, with each file taking 0.1-0.5 seconds while all executors sit idle and the driver commits the files one by one. I have attached driver log snippets to support this theory, in comparison with Spark 3.4.0.
The most likely cause is the use of [ThreadPoolExecutor](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadPoolExecutor.html) in
https://github.com/apache/hadoop/blob/706d88266abcee09ed78fbaa0ad5f74d818ab0e9/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/impl/CommitContext.java#L239-L244
where a LinkedBlockingQueue is used with a corePoolSize of 0, which results in a single worker thread and no further threads ever being created.
From the documentation:

> Unbounded queues. Using an unbounded queue (for example a LinkedBlockingQueue without a predefined capacity) will cause new tasks to wait in the queue when all corePoolSize threads are busy. Thus, no more than corePoolSize threads will ever be created. (And the value of the maximumPoolSize therefore doesn't have any effect.) This may be appropriate when each task is completely independent of others, so tasks cannot affect each others execution; for example, in a web page server. While this style of queuing can be useful in smoothing out transient bursts of requests, it admits the possibility of unbounded work queue growth when commands continue to arrive on average faster than they can be processed.
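This is easy to demonstrate outside Hadoop. The minimal sketch below is not the Hadoop code itself; the class name, pool sizes, and task counts are chosen purely for illustration. It builds a pool the same way (corePoolSize 0, a larger maximumPoolSize, an unbounded LinkedBlockingQueue) and counts the distinct worker threads that actually run the tasks, then does the same for a pool where corePoolSize equals maximumPoolSize (with allowCoreThreadTimeOut so idle threads are still reclaimed), which does fan out.

```java
import java.util.Set;
import java.util.concurrent.*;

public class PoolSizeDemo {

  // Submits `tasks` short sleeps to the pool and returns how many
  // distinct worker threads actually executed them.
  static int distinctWorkers(ThreadPoolExecutor pool, int tasks) throws Exception {
    Set<String> names = ConcurrentHashMap.newKeySet();
    CountDownLatch done = new CountDownLatch(tasks);
    for (int i = 0; i < tasks; i++) {
      pool.execute(() -> {
        names.add(Thread.currentThread().getName());
        try { Thread.sleep(50); } catch (InterruptedException ignored) { }
        done.countDown();
      });
    }
    done.await();
    pool.shutdown();
    return names.size();
  }

  public static void main(String[] args) throws Exception {
    // Same shape as the CommitContext pool: core size 0 + unbounded queue.
    // The queue never fills, so the pool never grows past one worker.
    ThreadPoolExecutor single = new ThreadPoolExecutor(
        0, 32, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());

    // One possible alternative (illustrative, not the actual fix): core size
    // equal to max size, with core-thread timeout so idle threads are released.
    ThreadPoolExecutor parallel = new ThreadPoolExecutor(
        32, 32, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
    parallel.allowCoreThreadTimeOut(true);

    System.out.println("core=0  workers used: " + distinctWorkers(single, 100));   // prints 1
    System.out.println("core=32 workers used: " + distinctWorkers(parallel, 100)); // prints up to 32
  }
}
```

With corePoolSize 0 and an unbounded queue, the queue's offer() always succeeds, so execute() only ever starts a single worker; making corePoolSize equal to maximumPoolSize (or using a bounded queue) is what actually lets the pool scale out to the configured size.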
Magic Committer spark 3.5.0-SNAPSHOT
2023-05-26 15:35:04,852 DEBUG impl.CommitContext: creating thread pool of size 32
2023-05-26 15:35:04,922 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s) for resource profile id: 0.
2023-05-26 15:35:07,910 INFO commit.AbstractS3ACommitter: Starting: committing the output of 36000 task(s)
2023-05-26 15:35:07,914 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000000.pendingset
2023-05-26 15:35:07,941 DEBUG files.PersistentCommitData: Reading commit data from file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000000.pendingset
2023-05-26 15:35:08,036 DEBUG impl.CommitContext: creating thread pool of size 32
2023-05-26 15:35:08,037 DEBUG impl.CommitOperations: Committing single commit SinglePendingCommit {version=2, uri='s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/tasks/attempt_202305261454174742717183892533843_0031_m_000000_1920855/__base/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', destination='<PATH>/date=20230520/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', uploadId='SCp78J9aYOmnPwrtfd5a.Q6B9Zu6olQw3eZcpyt.W.BCu6.M6fz54nlTe2ZYUDicSskFgPocsjlVXAbfiPn3Xu26MzXNdWcg5j_kBGg9iWpeoWh4K21gt7bbKetML95MXAck15yP.VGFeOleoAspUg--', created=1685113555232, saved=1685113555232, size=110636173, date='Fri May 26 15:05:55 UTC 2023', jobId='b9167b1a-35a5-448e-8022-31e0f7457354', taskId='attempt_202305261454174742717183892533843_0031_m_000000_1920855', notes='', etags=[64b9d14a89890cf7f931ff05074f6b6a,7a8c22d95d85736806205ad0f84094a5,82f55a1b0595b38420c6b55cb2763a46,212a3105afcf4a9045730f885a622e0a,b563ec91932a66498e8e9d717b625961,0425b7ae17a8a2af80734f0adf4debe9,496e9fbc08a583657fd29779dcd848d6,72bff33d090699baacb6a6171760a675,b2c6b9b76256416d304392bfbabca382,1a8f63d960cf50589b83b407ddfc77c7,d79cf49f16cd61fb00faf67034268529,e3b7ea2bd0fff2e20a89eccd0ebbeb4a,8edc254a94902fee2884350b4a89f223,f0858853a4542b562c6973e28ee40c19]}
2023-05-26 15:35:08,037 INFO impl.CommitOperations: Starting: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110636173
2023-05-26 15:35:08,198 DEBUG impl.CommitOperations: Successful commit of file length 110636173
2023-05-26 15:35:08,199 INFO impl.CommitOperations: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110636173: duration 0:00.162s
2023-05-26 15:35:08,208 INFO commit.AbstractS3ACommitter: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000000.pendingset: duration 0:00.294s
2023-05-26 15:35:08,208 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000001.pendingset
2023-05-26 15:35:08,208 DEBUG files.PersistentCommitData: Reading commit data from file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000001.pendingset
2023-05-26 15:35:08,232 DEBUG impl.CommitOperations: Committing single commit SinglePendingCommit {version=2, uri='s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/tasks/attempt_202305261454174742717183892533843_0031_m_000001_1920856/__base/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', destination='<PATH>/date=20230520/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', uploadId='I.5Kns.n5JKLbupr5.AWbzApL9YDGhQyI438WQT3JXUjevkQS8u1aQwQhkVbwBoxnFXItCbKJCS8EwPdtoijtcUFrjU0eU3pNvt7lwcYRJ0iklLwXexHbYMpZq3pDj0udwsHCq0trnqyq2Ee.o.prw--', created=1685113548923, saved=1685113548923, size=110570244, date='Fri May 26 15:05:48 UTC 2023', jobId='b9167b1a-35a5-448e-8022-31e0f7457354', taskId='attempt_202305261454174742717183892533843_0031_m_000001_1920856', notes='', etags=[835dd18d4ce995c22c1a84de5cd26fd8,c1f2acd3360b2f94ce90ef0d80edd7e5,208787fce146d917dbc50b1cf4adff4c,a66b228379201a0a278061b096dc2faf,58134b7ae228f0d2a6879ec53a3f35c1,e847a7614639d3f1a43a1b3090b603e1,2d8eac2399238e6d356fe0bdfafd076e,e0c3a964ee2a3c270f67e7e4a7791440,84b05192e896e41e1d16aa363f0eb48b,c3cf764e4361da4573b8dfa6361e4174,78b145f682946c7e18750758da3578cd,d9cd438899d20522668b9cd7f61cc098,e8b3944929f17d927ed7aef31800bdbf,52d279a75e9382ead94fe4d9b81c3bf9]}
2023-05-26 15:35:08,232 INFO impl.CommitOperations: Starting: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110570244
2023-05-26 15:35:08,438 DEBUG impl.CommitOperations: Successful commit of file length 110570244
2023-05-26 15:35:08,438 INFO impl.CommitOperations: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110570244: duration 0:00.206s
2023-05-26 15:35:08,448 INFO commit.AbstractS3ACommitter: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000001.pendingset: duration 0:00.240s
2023-05-26 15:35:08,448 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000002.pendingset
2023-05-26 15:35:08,448 DEBUG files.PersistentCommitData: Reading commit data from file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000002.pendingset
2023-05-26 15:35:08,489 DEBUG impl.CommitOperations: Committing single commit SinglePendingCommit {version=2, uri='s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/tasks/attempt_202305261454174742717183892533843_0031_m_000002_1920857/__base/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', destination='<PATH>/date=20230520/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', uploadId='AB982jhG0EAfnSxXWphX7R.QfqrkS2ipP6APqJZYGlqDl335WX4X2xn81wy7l2i5NBQrvG.eVcQgYFudk51wi5EMDJRNVz__SEWyF0xTncpIHpaC5xn8La.AY8gzCHQrAfRJ83nErQhe4Idmhcs04w--', created=1685113553072, saved=1685113553072, size=110136058, date='Fri May 26 15:05:53 UTC 2023', jobId='b9167b1a-35a5-448e-8022-31e0f7457354', taskId='attempt_202305261454174742717183892533843_0031_m_000002_1920857', notes='', etags=[f043628528fc9bdd6db4107fd175eb1d,11e36b4bde35bdb1d2e160ffd5fcc3f6,5105cb3ebee1b27cfec66f7682679316,1a85e5c9477e855e38f7bd4e81bed28e,7c57360865e12c19b3125cbfb8c786f8,59d928629e7afff4c4c038a1dadd0ce7,bfcedee2f51b45c20a097fbbae85a48f,7e3ff2665714d8cdc6ec20134856eb4c,0117e5c47f94a9ff4d8d0eafd1f6b76e,3c4f27e69544e4cc5297c56413231639,a32cd6422aaae63bdcbaafa7400ab889,1d5d07f0a0b258773993a456bc7ec7ee,a220e786aa36f643edc330a184e23d88,e19a85ab3accaa4dc697e7dfbf5d5325]}
2023-05-26 15:35:08,489 INFO impl.CommitOperations: Starting: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110136058
2023-05-26 15:35:08,646 DEBUG impl.CommitOperations: Successful commit of file length 110136058
2023-05-26 15:35:08,646 INFO impl.CommitOperations: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110136058: duration 0:00.157s
Magic Committer spark 3.4.0
2023-05-24 09:47:19,906 INFO yarn.YarnAllocator: Driver requested a total number of 524 executor(s) for resource profile id: 0.
2023-05-24 09:47:23,064 INFO commit.AbstractS3ACommitter: Starting: committing the output of 36000 task(s)
2023-05-24 09:47:23,069 DEBUG commit.AbstractS3ACommitter: Task committer attempt_202305240933298436217226765687147_0000_m_000000_0: creating thread pool of size 32
2023-05-24 09:47:23,074 DEBUG commit.Tasks: Executing task
2023-05-24 09:47:23,074 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000000.pendingset
2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000001.pendingset
2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000002.pendingset
2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000003.pendingset
2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000004.pendingset
2023-05-24 09:47:23,076 DEBUG commit.Tasks: Executing task
2023-05-24 09:47:23,076 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000005.pendingset
2023-05-24 09:47:23,076 DEBUG commit.Tasks: Executing task
2023-05-24 09:47:23,076 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000006.pendingset
2023-05-24 09:47:23,076 DEBUG commit.Tasks: Executing task
2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000004.pendingset
2023-05-24 09:47:23,076 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000007.pendingset
2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000003.pendingset
2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000000.pendingset
2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000002.pendingset
2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000001.pendingset
2023-05-24 09:47:23,078 DEBUG commit.Tasks: Executing task
2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000005.pendingset
2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000007.pendingset
Attachments
Issue Links
- is caused by: HADOOP-17833 Improve Magic Committer Performance (Resolved)
- is related to: SPARK-44678 Downgrade Hadoop to 3.3.4 (Resolved)
- links to