Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
JobVertexScaler#scale has an optimization: it tries to adjust the parallelism so that it divides the number of key groups without a remainder, which spreads data evenly across subtasks.
This is only useful when the upstream shuffle involves a keyBy, because only keyed data is distributed by key group. We should skip this optimization when the upstream shuffle type doesn't have keyBy.
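
To make the proposal concrete, here is a minimal sketch of the idea, not the actual JobVertexScaler code. The class name, the `adjust` method, its parameters, and the `ShuffleType` enum are all hypothetical names chosen for illustration. It assumes the adjustment searches upward from the target parallelism for a divisor of the key group count, and skips that search when the upstream shuffle is not a keyBy (hash) exchange.

```java
class ParallelismAdjustSketch {

    // Hypothetical stand-in for the upstream shuffle type; not the autoscaler's real type.
    enum ShuffleType { HASH, REBALANCE, FORWARD }

    /**
     * Returns the parallelism to use for a vertex.
     *
     * @param targetParallelism parallelism computed from the scaling decision
     * @param numKeyGroups      number of key groups (the vertex's max parallelism)
     * @param upperBound        largest parallelism allowed
     * @param upstream          shuffle type of the input (assumed to be known here)
     */
    static int adjust(int targetParallelism, int numKeyGroups, int upperBound, ShuffleType upstream) {
        int capped = Math.min(targetParallelism, upperBound);

        // Without keyBy, records are not routed by key group, so rounding the
        // parallelism up to a divisor would only waste resources.
        if (upstream != ShuffleType.HASH) {
            return capped;
        }

        // With keyBy, look for the next parallelism that divides the key groups
        // evenly. Stop at numKeyGroups / 2 and at the upper bound.
        for (int p = capped; p <= numKeyGroups / 2 && p <= upperBound; p++) {
            if (numKeyGroups % p == 0) {
                return p;
            }
        }

        // No suitable divisor found: keep the capped target.
        return capped;
    }

    public static void main(String[] args) {
        System.out.println(adjust(50, 128, 200, ShuffleType.HASH));      // 64
        System.out.println(adjust(50, 128, 200, ShuffleType.REBALANCE)); // 50
    }
}
```

In this sketch, a keyed vertex with 128 key groups and a target of 50 is bumped to 64, while the same vertex behind a rebalance keeps the target of 50, which is the behaviour this issue asks for.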