FLINK-35285

Autoscaler key group optimization can interfere with scale-down.max-factor

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: Kubernetes Operator

    Description

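      When a less aggressive scale-down limit is set, the key group optimization can prevent a vertex from scaling down at all. The optimization searches upward from the target parallelism to maxParallelism / 2 for a value that divides maxParallelism evenly. If currentParallelism is itself a divisor and no smaller divisor lies at or above the target, the search lands on currentParallelism again.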

       

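      A simple test that tries to scale down from a parallelism of 60 with a scale-down.max-factor of 0.2, which gives a scale factor of 0.8 and a target of 48. None of the values from 48 to 59 divide 360, so the search stops at 60:

      assertEquals(48, JobVertexScaler.scale(60, inputShipStrategies, 360, .8, 8, 360));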
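      The behavior can be reproduced outside the operator with a minimal sketch of the upward divisor search as described in this issue. The class and method names are hypothetical, the target of 48 is hard-coded rather than derived from the scale factor, and the fallback to the target when no divisor is found is an assumption about the surrounding method:

```java
// Simplified model of the key group alignment search described in this issue.
// This is NOT the operator's code; names and the fallback are assumptions.
class KeyGroupSearchSketch {

    // Walk upward from the target and return the first value that divides
    // maxParallelism evenly; fall back to the target if none is found.
    static int alignToDivisor(int target, int maxParallelism, int upperBound) {
        for (int p = target; p <= maxParallelism / 2 && p <= upperBound; p++) {
            if (maxParallelism % p == 0) {
                return p;
            }
        }
        return target;
    }

    public static void main(String[] args) {
        // Current parallelism 60, target 48 (60 * 0.8), maxParallelism 360.
        // The search climbs past 48..59 and returns 60, the current value.
        System.out.println(alignToDivisor(48, 360, 360)); // prints 60
    }
}
```

      In this model nothing compares the result against the current parallelism, so a scale-down request can come back unchanged.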

       

      Making a good attempt to spread data evenly across subtasks is reasonable, but not at the expense of never scaling down. The problem is that during scale-down the search does not ensure that newParallelism < currentParallelism. The only workaround is to set a scale-down factor large enough that the target reaches the next lower divisor of maxParallelism.
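      To make the workaround concrete, the sketch below computes how large scale-down.max-factor would have to be for the example in this issue. It assumes the target is currentParallelism × (1 − scale-down.max-factor); the class and method names are hypothetical and not part of the operator:

```java
// Worked arithmetic for the workaround; an illustration, not operator code.
class ScaleDownWorkaroundSketch {

    // Largest divisor of maxParallelism strictly below currentParallelism.
    // Assumes currentParallelism >= 2, so that 1 is always a valid answer.
    static int largestDivisorBelow(int currentParallelism, int maxParallelism) {
        for (int p = currentParallelism - 1; p >= 1; p--) {
            if (maxParallelism % p == 0) {
                return p;
            }
        }
        throw new IllegalArgumentException("currentParallelism must be >= 2");
    }

    // Smallest scale-down.max-factor whose target reaches that divisor,
    // assuming target = currentParallelism * (1 - factor).
    static double minScaleDownMaxFactor(int currentParallelism, int maxParallelism) {
        int divisor = largestDivisorBelow(currentParallelism, maxParallelism);
        return 1.0 - (double) divisor / currentParallelism;
    }

    public static void main(String[] args) {
        System.out.println(largestDivisorBelow(60, 360));   // prints 45
        System.out.println(minScaleDownMaxFactor(60, 360)); // prints 0.25
    }
}
```

      Under that assumption, a vertex at parallelism 60 with maxParallelism 360 only scales down once the factor reaches 0.25, because 45 is the next divisor below 60.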

       

      A clunky change, but one that ensures the scaler can make at least some progress. Another existing test fails with it, so it only illustrates the point:

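      for (int p = newParallelism; p <= maxParallelism / 2 && p <= upperBound; p++) {
          // Only accept a candidate that moves in the requested direction:
          // below currentParallelism on scale-down, above it on scale-up.
          boolean makesProgress =
                  (scaleFactor < 1 && p < currentParallelism)
                          || (scaleFactor > 1 && p > currentParallelism);
          // Among those, take the first one that divides maxParallelism evenly.
          if (makesProgress && maxParallelism % p == 0) {
              return p;
          }
      }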
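      A self-contained version of this loop makes it easy to check against the example. The wrapper class, the method signature, and the fallback to newParallelism when no divisor qualifies are assumptions made for illustration, not the operator's actual API:

```java
// Standalone harness around the guarded loop proposed in this issue.
// Names, signature, and fallback behavior are hypothetical.
class GuardedKeyGroupSearchSketch {

    static int align(
            int currentParallelism,
            int newParallelism,
            double scaleFactor,
            int maxParallelism,
            int upperBound) {
        for (int p = newParallelism; p <= maxParallelism / 2 && p <= upperBound; p++) {
            // Candidate must move in the requested direction.
            boolean makesProgress =
                    (scaleFactor < 1 && p < currentParallelism)
                            || (scaleFactor > 1 && p > currentParallelism);
            if (makesProgress && maxParallelism % p == 0) {
                return p;
            }
        }
        // No qualifying divisor: keep the originally computed target.
        return newParallelism;
    }

    public static void main(String[] args) {
        // Scale-down from 60 with target 48: no divisor in 48..59, so 48 is kept.
        System.out.println(align(60, 48, 0.8, 360, 360)); // prints 48
        // Scale-up from 60 with target 70: the first divisor above is 72.
        System.out.println(align(60, 70, 1.2, 360, 360)); // prints 72
    }
}
```

      With the guard in place the scale-down case gives up an even key group split rather than staying at 60, which is the trade-off this issue argues for.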

       

      Perhaps this is by design and not a bug, but failing to scale down at all in order to keep key groups evenly distributed does not seem ideal.

       

      Key group optimization block:

      https://github.com/apache/flink-kubernetes-operator/blob/fe3d24e4500d6fcaed55250ccc816546886fd1cf/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobVertexScaler.java#L296C1-L303C10

          People

            Assignee: Unassigned
            Reporter: Trystan