Details
- Type: Bug
- Status: Open
- Priority: P3
- Resolution: Unresolved
Description
When given a concrete `FileSink` instance, `WriteToFiles` reuses that same sink object across windows:
- https://github.com/apache/beam/blob/e92d184abc79fe84c48de3dfd9dd168d9b38feac/sdks/python/apache_beam/io/fileio.py#L461
- https://github.com/apache/beam/blob/e92d184abc79fe84c48de3dfd9dd168d9b38feac/sdks/python/apache_beam/io/fileio.py#L625
This can lead to data for one window being written to the sink for another window.
See discussion: https://github.com/apache/beam/pull/14374#discussion_r604320333
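The following is a minimal, self-contained sketch of how sharing one sink across windows can misroute data. It does not use Beam's code: `SimpleTextSink`, `write_windowed`, and the window labels are hypothetical stand-ins, and the assumption that the sink keeps its open file handle as instance state is made for illustration only.

```python
import io


class SimpleTextSink:
    """Hypothetical stateful sink; it remembers the last file handle it opened."""

    def open(self, fh):
        self._fh = fh

    def write(self, record):
        self._fh.write(record + "\n")

    def flush(self):
        self._fh.flush()


def write_windowed(elements, sink_fn):
    """Simplified model of a per-window writer (not Beam's implementation).

    `elements` is an iterable of (window, record) pairs; `sink_fn` returns
    the sink to use for a given window.
    """
    files = {}
    sinks = {}
    for window, record in elements:
        if window not in files:
            # First record for this window: open a new "file" for it.
            files[window] = io.StringIO()
            sinks[window] = sink_fn(window)
            sinks[window].open(files[window])
        sinks[window].write(record)
    for sink in sinks.values():
        sink.flush()
    return {window: fh.getvalue() for window, fh in files.items()}


# Records from two windows arrive interleaved.
elements = [("w1", "a"), ("w2", "b"), ("w1", "c")]

# One concrete instance shared by every window: opening the file for "w2"
# overwrites the handle, so the later "w1" record lands in the "w2" file.
shared_sink = SimpleTextSink()
shared_result = write_windowed(elements, lambda window: shared_sink)

# A fresh sink per window keeps each window's records in its own file.
fresh_result = write_windowed(elements, lambda window: SimpleTextSink())

print(shared_result)
print(fresh_result)
```

In this model, constructing a new sink for each window avoids the cross-window write; whether that is the right fix for `WriteToFiles` itself is left to the linked discussion.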
Issue Links
- causes: BEAM-12071 DataFrame IO sinks do not correctly partition by window (Triage Needed)