Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-7403

BigQueryIO.Write does not autoscale correctly (idle workers)

Details

    • Bug
    • Status: Resolved
    • P2
    • Resolution: Won't Fix
    • None
    • 2.11.0
    • io-java-gcp
    • None

    Description

      Apache Beam version:
      2.10
      JAVA SDK
      Dataflow GCP Staged

      Details:
      We have a streaming dataflow which ingests data into BigQuery (Streaming Inserts).
      We deploy a job with max number of workers = 40 and
      there is a huge backlog already (high watermark).

      When the dataflow starts it scales 0 -> 3 (from 0 to 3 workers)
      and starts ingesting with 12000 messages/sec rate.
      After 2 mins it scales 3 -> 40 to keep up with a backlog.
      After scaling up, the rate never goes higher than it was with 3 nodes (12000 messages/sec).

      We have memory consumption metrics in Stackdriver; from them
      we see that the first 3 workers consume about 5GB of RAM and the rest 37 workers
      consume about 0.2GB RAM. It appears that these autoscaled Nodes are idle? Importantly, they don’t add to Streaming Inserts process for BigQuery.

      Autoscaling in the other streaming pipelines we have works fine.
      It appears that this is related to BigQuery streaming inserts.

       

      Attachments

        Activity

          People

            Unassigned Unassigned
            pashashiz Pavlo Pohrrebnyi
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: