Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
Clusters alternate between busier and quieter periods. By default Kudu uses the quieter periods to schedule compactions, because during the busier ones it is mostly flushing.
A further improvement would be to recognize when a tserver is mostly scheduling DRS (DiskRowSet) compactions and to start giving them progressively bigger budgets. Compacting more DRSes at a time lowers the overall write amplification, but runs the risk of compacting for too long and not being able to schedule important flushes. We could lower the risk by re-adding an emergency flush thread, and/or by making it possible to cancel tasks.
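To make the idea concrete, here is a rough sketch of one possible budget policy, written in Python for brevity. It is not Kudu code: the class name, the task kinds, and every number (starting budget, cap, growth factor, streak threshold) are assumptions chosen for illustration. The budget grows once the scheduler has picked several compactions in a row, and drops back to the base value as soon as a flush is scheduled.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveCompactionBudget:
    """Hypothetical policy sketch; none of these names are Kudu APIs."""

    base_mb: int = 128          # assumed starting budget per compaction
    max_mb: int = 1024          # assumed cap, bounds how long one compaction can run
    growth: int = 2             # assumed multiplier applied once the streak is long enough
    streak_threshold: int = 3   # assumed number of compactions in a row before growing
    current_mb: int = field(init=False)
    streak: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        self.current_mb = self.base_mb

    def on_task_scheduled(self, kind: str) -> int:
        """Record the kind of task just scheduled and return the budget to use."""
        if kind == "compaction":
            self.streak += 1
            # Only grow after the streak exceeds the threshold, and never past the cap.
            if self.streak > self.streak_threshold:
                self.current_mb = min(self.current_mb * self.growth, self.max_mb)
        else:
            # Any other task (e.g. a flush) means write load is back: reset.
            self.streak = 0
            self.current_mb = self.base_mb
        return self.current_mb
```

The cap only limits the damage from a long compaction; it does not remove the risk of delaying a flush, which is why the emergency flush thread or task cancellation would still be needed.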