Details
Type: New Feature
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Description
The current compaction strategies decide what to compact based on log file size, the number of log files, and similar criteria, so the event time of the data in the resulting RO (read-optimized) table cannot be controlled. Hudi also provides DayBasedCompactionStrategy, but it relies on day-based partition paths and its time granularity is coarse.
The proposed EventTimeBasedCompactionStrategy can produce event-time-friendly RO tables regardless of whether the table uses day-based partitioning. For example, the strategy can select for compaction all log files whose data has an event time before 3 am, so that the generated RO table contains the data up to 3 am. A query that only needs data from before 3 am can then read the RO table, which is much faster because it reads only the compacted base files.
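The selection step described above can be sketched as a simple filter on event time. This is only a minimal illustration under stated assumptions, not the proposed implementation: the `LogFile` type, its `min_event_time`/`max_event_time` fields, and the `select_for_compaction` function are hypothetical, and the sketch assumes per-log-file event-time bounds are available.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class LogFile:
    """Hypothetical log file descriptor.

    Assumption: the min/max event time of the records in each log file
    is tracked somewhere (this is not an existing Hudi API).
    """
    path: str
    min_event_time: datetime
    max_event_time: datetime


def select_for_compaction(log_files: List[LogFile], cutoff: datetime) -> List[LogFile]:
    """Pick only the log files whose records all have an event time before `cutoff`."""
    return [f for f in log_files if f.max_event_time < cutoff]


files = [
    LogFile("a.log", datetime(2021, 1, 1, 0, 10), datetime(2021, 1, 1, 1, 50)),
    LogFile("b.log", datetime(2021, 1, 1, 2, 0), datetime(2021, 1, 1, 2, 59)),
    # c.log straddles the 3 am cut-off (starts before, ends after).
    LogFile("c.log", datetime(2021, 1, 1, 2, 30), datetime(2021, 1, 1, 3, 40)),
]
selected = select_for_compaction(files, datetime(2021, 1, 1, 3, 0))
print([f.path for f in selected])  # ['a.log', 'b.log']
```

In this sketch, `c.log` straddles the cut-off and is skipped, so its pre-3 am records would not reach the RO table; the issue does not say how the proposed strategy handles such files.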
With this strategy, I think RO tables could serve a wider range of use cases.