Details
Type: Improvement
Status: Resolved
Priority: P3
Resolution: Fixed
2.1.0
Description
Currently, `Sample.any` converts the collection into an iterable view and takes the first n elements from it as a side input. This may require materializing the entire collection to disk, which is potentially inefficient.
https://github.com/apache/beam/blob/v2.1.0/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java#L74
This can be fixed by first applying a truncating `DoFn`, then combining into a `List<T>` whose size is capped at n, and finally flattening that list back into individual elements.
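The three-step approach can be sketched in plain Java, without a Beam dependency. This is a minimal simulation under stated assumptions: bundles are modeled as a `List<List<T>>`, and the names `truncate`, `CappedListCombine`, and `sampleAny` are hypothetical. `CappedListCombine` only mirrors the method shape of a Beam `CombineFn` (`createAccumulator`, `addInput`, `mergeAccumulators`, `extractOutput`); it is not the Beam class, and this is not the actual fix that was merged.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of the proposed Sample.any pipeline shape.
// All names here are hypothetical; they only mirror the structure of a
// Beam DoFn + CombineFn and are not Beam APIs.
public class SampleAnySketch {

    // Step 1: a "truncating DoFn" applied to one bundle: emit at most `limit` elements.
    static <T> List<T> truncate(List<T> bundle, int limit) {
        return new ArrayList<>(bundle.subList(0, Math.min(limit, bundle.size())));
    }

    // Step 2: a combine into List<T> whose accumulator never grows beyond `limit`.
    static final class CappedListCombine<T> {
        private final int limit;

        CappedListCombine(int limit) {
            this.limit = limit;
        }

        List<T> createAccumulator() {
            return new ArrayList<>();
        }

        // Add an element only while the accumulator is below the cap.
        List<T> addInput(List<T> acc, T input) {
            if (acc.size() < limit) {
                acc.add(input);
            }
            return acc;
        }

        // Merge partial accumulators, stopping as soon as the cap is reached.
        List<T> mergeAccumulators(List<List<T>> accs) {
            List<T> merged = createAccumulator();
            for (List<T> acc : accs) {
                for (T t : acc) {
                    if (merged.size() >= limit) {
                        return merged;
                    }
                    merged.add(t);
                }
            }
            return merged;
        }

        List<T> extractOutput(List<T> acc) {
            return acc;
        }
    }

    // Steps 1-3 end to end: truncate each bundle, combine per bundle, merge, flatten.
    static <T> List<T> sampleAny(List<List<T>> bundles, int limit) {
        CappedListCombine<T> combine = new CappedListCombine<>(limit);
        List<List<T>> partials = new ArrayList<>();
        for (List<T> bundle : bundles) {
            List<T> acc = combine.createAccumulator();
            for (T t : truncate(bundle, limit)) {
                acc = combine.addInput(acc, t);
            }
            partials.add(acc);
        }
        List<T> combined = combine.extractOutput(combine.mergeAccumulators(partials));
        // Step 3: flatten. In a pipeline each element would be emitted downstream;
        // here they are copied into a new list.
        List<T> out = new ArrayList<>();
        for (T t : combined) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> bundles =
                List.of(List.of(1, 2, 3, 4), List.of(5, 6), List.of(7, 8, 9));
        System.out.println(sampleAny(bundles, 3));
    }
}
```

With the bundles `[1, 2, 3, 4]`, `[5, 6]`, `[7, 8, 9]` and n = 3, `main` prints `[1, 2, 3]`. Because both the truncation and every accumulator are capped at n, no intermediate result exceeds n elements, in contrast to materializing the whole collection for a side input.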