Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Description
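When a less aggressive scale-down limit is configured, the key group optimization can prevent a vertex from scaling down at all. The optimization searches upwards from the target parallelism to maxParallelism/2 for a divisor of maxParallelism, and when no divisor lies between the target and the current value it finds currentParallelism again.
A simple test that tries to scale down from a parallelism of 60 with a scale-down.max-factor of 0.2 (a scale factor of 0.8, so a target of 48):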
assertEquals(48, JobVertexScaler.scale(60, inputShipStrategies, 360, .8, 8, 360));
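To make the failure concrete, here is a minimal sketch of the upward search as described above. It is not the actual JobVertexScaler code: the class and method names are hypothetical, the target is passed in directly as 48, and the fallback to the unaligned target when no divisor is found is an assumption.

```java
final class KeyGroupAlignmentSketch {

    // Hypothetical simplification of the search described in this ticket:
    // starting at the target, walk upwards and return the first value that
    // divides maxParallelism without a remainder.
    static int alignToKeyGroups(int target, int maxParallelism, int upperBound) {
        for (int p = target; p <= maxParallelism / 2 && p <= upperBound; p++) {
            if (maxParallelism % p == 0) {
                return p;
            }
        }
        // Assumed fallback: keep the unaligned target if no divisor is found.
        return target;
    }

    public static void main(String[] args) {
        int current = 60;
        int target = 48; // 60 scaled down by a factor of 0.8
        int aligned = alignToKeyGroups(target, 360, 360);
        // No value in 48..59 divides 360, so the search lands on 60 again.
        System.out.println("current=" + current + " target=" + target + " aligned=" + aligned);
    }
}
```

Under these assumptions the search returns 60 for a target of 48, which equals the current parallelism, so the vertex never scales down.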
It seems reasonable to make a good attempt to spread data evenly across subtasks, but not at the expense of total deadlock. The problem is that during scale down the optimization does not ensure that newParallelism ends up < currentParallelism. The only workaround is to set a scale-down factor large enough that the search finds the next lower divisor of maxParallelism.
A clunky fix, but something along these lines would ensure it can make at least some progress. Another test fails with this change, so the snippet is only meant to illustrate the point:
for (int p = newParallelism; p <= maxParallelism / 2 && p <= upperBound; p++) {
    if ((scaleFactor < 1 && p < currentParallelism) || (scaleFactor > 1 && p > currentParallelism)) {
        if (maxParallelism % p == 0) {
            return p;
        }
    }
}
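The effect of the direction guard can be checked with a small standalone sketch. The class and method names here are hypothetical, and the fallback to the unaligned target when the guarded search finds nothing is an assumption rather than confirmed JobVertexScaler behavior.

```java
final class GuardedKeyGroupAlignmentSketch {

    // Hypothetical standalone version of the guarded loop above: a candidate is
    // only accepted if it moves in the direction of the scale factor.
    static int alignToKeyGroups(
            int target, int currentParallelism, double scaleFactor,
            int maxParallelism, int upperBound) {
        for (int p = target; p <= maxParallelism / 2 && p <= upperBound; p++) {
            boolean scalesDown = scaleFactor < 1 && p < currentParallelism;
            boolean scalesUp = scaleFactor > 1 && p > currentParallelism;
            if ((scalesDown || scalesUp) && maxParallelism % p == 0) {
                return p;
            }
        }
        // Assumed fallback: use the unaligned target so that scaling still progresses.
        return target;
    }

    public static void main(String[] args) {
        // Scale down 60 -> target 48: no divisor of 360 in 48..59, so fall back to 48.
        System.out.println(alignToKeyGroups(48, 60, 0.8, 360, 360));
        // Scale down 60 -> target 41: 45 divides 360 and is still below 60.
        System.out.println(alignToKeyGroups(41, 60, 0.7, 360, 360));
        // Scale up 60 -> target 70: 72 divides 360 and is above 60.
        System.out.println(alignToKeyGroups(70, 60, 1.2, 360, 360));
    }
}
```

With the guard in place the scale-down case gives up on key group alignment and, under the assumed fallback, settles on 48 instead of returning to 60. The trade-off is that data is no longer evenly spread across subtasks in that case.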
Perhaps this is by design and not a bug, but total failure to scale down in order to keep optimized key groups does not seem ideal.
Key group optimization block: