Details
Type: Epic
Status: In Progress
Priority: Major
Resolution: Unresolved
Improve data locality during ingestion
Description
Today the upsert partitioner handles file sizing and bin-packing for
inserts, and then routes some inserts to existing file groups to
maintain file size.
We could abstract all of this into strategies and a pipeline abstraction,
and have it also consider "affinity" to an existing file group, based on,
say, information stored in the metadata table.
See http://mail-archives.apache.org/mod_mbox/hudi-dev/202102.mbox/browser
for more details.
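
As a rough illustration of the idea above (not the actual Hudi partitioner code), the sketch below models the assignment step as an ordered pipeline of pluggable strategies. Every name in it (FileGroup, key_range, affinity_strategy, small_file_strategy, assign_inserts) is hypothetical, and it assumes the affinity hint is a per-file-group record key range that could, for example, be read from the metadata table.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class FileGroup:
    file_id: str
    size_bytes: int
    # Hypothetical affinity hint: (min_key, max_key) already stored in this group.
    key_range: Optional[Tuple[str, str]] = None


@dataclass
class Assignment:
    # file_id -> record keys routed to that existing file group
    to_existing: Dict[str, List[str]] = field(default_factory=dict)
    # record keys left over for new file groups
    to_new: List[str] = field(default_factory=list)


# A strategy picks a target among file groups that still have room, or returns None.
Strategy = Callable[[str, List[FileGroup]], Optional[FileGroup]]


def affinity_strategy(key: str, candidates: List[FileGroup]) -> Optional[FileGroup]:
    """Prefer a file group whose known key range already covers the record key."""
    for group in candidates:
        if group.key_range is not None:
            low, high = group.key_range
            if low <= key <= high:
                return group
    return None


def small_file_strategy(key: str, candidates: List[FileGroup]) -> Optional[FileGroup]:
    """Size-only fallback: top up the smallest file group that has room."""
    return min(candidates, key=lambda g: g.size_bytes, default=None)


def assign_inserts(
    keys: List[str],
    groups: List[FileGroup],
    strategies: List[Strategy],
    record_size: int,
    max_file_size: int,
) -> Assignment:
    """Run each key through the strategies in order; the first match wins."""
    result = Assignment()
    for key in keys:
        # Only file groups that can absorb one more record are eligible.
        candidates = [g for g in groups if g.size_bytes + record_size <= max_file_size]
        target = None
        for strategy in strategies:
            target = strategy(key, candidates)
            if target is not None:
                break
        if target is None:
            result.to_new.append(key)
        else:
            # Track the projected size so later keys see the updated state.
            target.size_bytes += record_size
            result.to_existing.setdefault(target.file_id, []).append(key)
    return result
```

Passing [affinity_strategy, small_file_strategy] tries locality first and falls back to size-based packing, while passing only affinity_strategy sends every key without a matching range to new file groups. Keeping the ordering outside the strategies is one way to let different ingestion pipelines reuse the same building blocks.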