Beam / BEAM-14332

Improve the workflow of cluster management for Flink on Dataproc

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.40.0
    • Component/s: runner-py-interactive
    • Labels: None

    Description

      Improve the workflow of cluster management.

      There is an option to configure a default cluster name. The existing user flows are:

      1. Use the default cluster name to create a new cluster if none is in use;
      2. Reuse a created cluster that has the default cluster name;
      3. If the default cluster name is configured to a new value, re-apply 1 and 2.
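      The name-keyed reuse flow above can be modeled with a short sketch. All names here (`ClusterManager`, `get_or_create`) are hypothetical illustrations, not the actual interactive runner API:

```python
# Hypothetical model of the existing flow: clusters are keyed by a
# user-configurable default name, so reuse depends entirely on that name.
class ClusterManager:
    def __init__(self):
        self._clusters = {}  # cluster name -> cluster metadata

    def get_or_create(self, default_name):
        # Flow 2: reuse a created cluster that has the default name.
        if default_name in self._clusters:
            return self._clusters[default_name]
        # Flow 1: create a new cluster under the default name.
        cluster = {'name': default_name, 'workers': 3}
        self._clusters[default_name] = cluster
        return cluster

mgr = ClusterManager()
a = mgr.get_or_create('interactive-beam-cluster')
b = mgr.get_or_create('interactive-beam-cluster')
assert a is b      # same name -> reused cluster
# Flow 3: a new default name re-applies flows 1 and 2.
c = mgr.get_or_create('another-cluster')
assert c is not a  # new name -> new cluster
```

      The fragility is visible in the sketch: two notebook runtimes sharing one default name will collide on the same dictionary key.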

      A better solution is to:

      1. Create a new cluster implicitly if there is none or explicitly if the user wants one with specific provisioning;
      2. Always default to using the last created cluster.
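      The proposed flow can be sketched as follows. Again, the class and method names (`UniqueNameClusterManager`, `create_cluster`, `default_cluster`) are hypothetical, chosen only to illustrate the two rules above:

```python
import uuid

# Hypothetical sketch of the proposed flow: every cluster gets a unique,
# runtime-generated name, and the runtime defaults to the last created
# cluster instead of resolving by name.
class UniqueNameClusterManager:
    def __init__(self):
        self._clusters = {}
        self._last_created = None

    def create_cluster(self, workers=3):
        # The name is an implementation detail, not user input, so two
        # notebook runtimes can never race on the same name.
        name = 'interactive-beam-' + uuid.uuid4().hex[:8]
        cluster = {'name': name, 'workers': workers}
        self._clusters[name] = cluster
        self._last_created = cluster
        return cluster

    def default_cluster(self):
        # Rule 1: create implicitly if none exists.
        # Rule 2: otherwise reuse the last created cluster.
        return self._last_created or self.create_cluster()

mgr = UniqueNameClusterManager()
first = mgr.default_cluster()           # implicit creation
assert mgr.default_cluster() is first   # defaults to last created
second = mgr.create_cluster(workers=5)  # explicit provisioning
assert mgr.default_cluster() is second
assert first['name'] != second['name']  # names are always distinct
```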

      The reasons are:

      • The cluster name is meaningless to the user when a cluster is just a medium to run OSS runners (as applications) such as Flink or Spark. The cluster could also run anywhere on GCP, such as Dataproc, k8s, or even Dataflow itself.
      • Clusters should be uniquely identified and thus always have distinct names. The notebook runtime manages clusters (creates/reuses/deletes them) behind the scenes when the user doesn't do so explicitly (the capability to explicitly manage clusters remains available). Reusing the same default cluster name is risky: one notebook runtime may delete a cluster while another runtime creates a new cluster with the same name.
      • The user should still have the capability to explicitly provision a cluster.

      The current implementation provisions each cluster at the location specified by GoogleCloudOptions with 3 worker nodes. There is no explicit API to configure the number or shape of the workers.

      We could use WorkerOptions to allow customers to explicitly provision a cluster, and expose an explicit API (with UX in the notebook extension) for customers to change the size of the cluster connected to their notebook (until we have an autoscaling solution for Flink on Dataproc).
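      A minimal sketch of deriving the cluster shape from WorkerOptions (num_workers and machine_type are real WorkerOptions flags in Beam Python; the fallback machine type and the `cluster_shape` helper are assumptions for illustration):

```python
# Hypothetical sketch: size the cluster from the pipeline's WorkerOptions,
# falling back to the current hard-coded defaults when nothing is set.
def cluster_shape(worker_options):
    return {
        # Current behavior: 3 workers when nothing is specified.
        'num_workers': worker_options.get('num_workers') or 3,
        # Fallback machine type is illustrative, not a documented default.
        'machine_type': worker_options.get('machine_type') or 'n1-standard-4',
    }

# No explicit sizing -> current default of 3 workers.
assert cluster_shape({})['num_workers'] == 3

# User explicitly provisions via WorkerOptions.
shape = cluster_shape({'num_workers': 8, 'machine_type': 'n1-highmem-8'})
assert shape['num_workers'] == 8
assert shape['machine_type'] == 'n1-highmem-8'
```

      An explicit resize API surfaced in the notebook extension could then update num_workers on the connected cluster, serving as a stopgap until autoscaling is available.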


              People

                Assignee: Ning (ningk)
                Reporter: Ning (ningk)


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m