Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Duplicate
Description
The existence of FileSystemDatasetWriteOptions::basename_template seems to imply that the dataset writer may write multiple files for a given partition. However, the current implementation always creates exactly one file per directory.
I'm not sure what the desired behavior is here but the two obvious choices are:
* Get rid of FileSystemDatasetWriteOptions::basename_template (or at least the {i} parameter)
* Add an option to limit how many rows/bytes are put in a single file
ARROW-12358 is probably worth mentioning, since whatever strategy is chosen here should probably be compatible with supporting append mode in the future.
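To make the second choice concrete, here is a minimal standalone sketch of how a row limit could interact with the {i} placeholder. It does not use Arrow at all: FormatBasename, PlanFiles, and the max_rows_per_file parameter are hypothetical names made up for illustration, and the template "part-{i}.parquet" is only an example value. It shows one possible behavior, not what the dataset writer actually does.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: substitute the first "{i}" in the template with a file index.
std::string FormatBasename(const std::string& basename_template, std::size_t index) {
  const std::string placeholder = "{i}";
  std::string result = basename_template;
  const auto pos = result.find(placeholder);
  if (pos != std::string::npos) {
    result.replace(pos, placeholder.size(), std::to_string(index));
  }
  return result;
}

// Hypothetical planner: decide which files one partition directory would get.
// Returns (file name, row count) pairs. max_rows_per_file <= 0 means "no limit",
// which reproduces the one-file-per-directory behavior described above.
std::vector<std::pair<std::string, std::int64_t>> PlanFiles(
    const std::string& basename_template, std::int64_t total_rows,
    std::int64_t max_rows_per_file) {
  std::vector<std::pair<std::string, std::int64_t>> plan;
  if (total_rows <= 0) {
    return plan;  // nothing to write
  }
  if (max_rows_per_file <= 0) {
    plan.emplace_back(FormatBasename(basename_template, 0), total_rows);
    return plan;
  }
  std::int64_t remaining = total_rows;
  while (remaining > 0) {
    const std::int64_t rows = std::min(remaining, max_rows_per_file);
    // plan.size() is the index of the next file in this directory.
    plan.emplace_back(FormatBasename(basename_template, plan.size()), rows);
    remaining -= rows;
  }
  return plan;
}
```

With a limit of 100 rows, 250 rows for one partition would be planned as part-0.parquet, part-1.parquet, and part-2.parquet, the last holding the remaining 50 rows. A byte-based limit would follow the same shape, but needs a size estimate per batch, and an append mode would also need to pick a starting index that does not collide with existing files.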
Issue Links
- duplicates ARROW-10439 [C++][Dataset] Add max file size as a dataset writing option (Open)