Details
Type: Bug
Status: Resolved
Priority: P2
Resolution: Fixed
Labels: None
Description
BatchAndInsertElements accumulates all the input elements and flushes them in finishBundle.
However, if a bundle contains enough data, the BigQuery request payload limit can be exceeded, causing an exception like the one below. It seems that finishBundle should cap the number of rows and bytes per request and, if necessary, flush multiple times per destination.
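For illustration, a minimal sketch of what row/byte-limited flushing in finishBundle could look like. This is not the actual BigQueryIO code: the constants MAX_ROWS_PER_BATCH and MAX_BYTES_PER_BATCH, the accumulatedRows field, and the estimateSize/flushBatch helpers are hypothetical names standing in for the real internals.

  // Hypothetical sketch: flush the rows accumulated per destination in
  // size-limited batches instead of a single insertAll call per bundle.
  private static final int MAX_ROWS_PER_BATCH = 500;
  private static final long MAX_BYTES_PER_BATCH = 9L * 1024 * 1024; // stay under the 10485760-byte limit

  @FinishBundle
  public void finishBundle(FinishBundleContext context) {
    for (Map.Entry<DestinationT, List<TableRow>> entry : accumulatedRows.entrySet()) {
      List<TableRow> batch = new ArrayList<>();
      long batchBytes = 0;
      for (TableRow row : entry.getValue()) {
        long rowBytes = estimateSize(row); // hypothetical size estimator
        // Flush before the current batch would exceed either limit.
        if (!batch.isEmpty()
            && (batch.size() >= MAX_ROWS_PER_BATCH || batchBytes + rowBytes > MAX_BYTES_PER_BATCH)) {
          flushBatch(entry.getKey(), batch, context); // one insertAll call per batch
          batch = new ArrayList<>();
          batchBytes = 0;
        }
        batch.add(row);
        batchBytes += rowBytes;
      }
      if (!batch.isEmpty()) {
        flushBatch(entry.getKey(), batch, context);
      }
    }
    accumulatedRows.clear();
  }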
A workaround is to enable autosharding, which batches via state with built-in row and byte limits, or to increase the number of streaming keys to reduce the likelihood of hitting the limit.
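A sketch of the autosharding workaround, assuming a Beam version that supports BigQueryIO.Write.withAutoSharding() (the table name is a placeholder):

  BigQueryIO.writeTableRows()
      .to("my-project:my_dataset.my_table")
      .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
      .withAutoSharding(); // stateful batching with row/byte limits per request

For the second workaround, increasing the number of streaming keys (for example via the numStreamingKeys pipeline option, if available in the Beam version in use) spreads rows over more shards, making it less likely that any single bundle accumulates enough data to exceed the request limit.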
Error while processing a work item: UNKNOWN: org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
POST https://bigquery.googleapis.com/bigquery/v2/projects/google.com:clouddfe/datasets/nexmark_06090820455271/tables/nexmark_simple/insertAll?prettyPrint=false
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Request payload size exceeds the limit: 10485760 bytes.",
    "reason" : "badRequest"
  } ],
  "message" : "Request payload size exceeds the limit: 10485760 bytes.",
  "status" : "INVALID_ARGUMENT"
}
at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:39)
at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite$BatchAndInsertElements$DoFnInvoker.invokeFinishBundle(Unknown Source)
at org.apache.beam.fn.harness.FnApiDoFnRunner.finishBundle(FnApiDoFnRunner.java:1661)