Description
We observed frequent L0 -> L1 compaction during Kafka Streams state recovery. Some sample log:
2018/06/08-00:04:50.892331 7f8a6d7fa700 (Original Log Time 2018/06/08-00:04:50.892298) [db/compaction_picker_universal.cc:270] [default] Universal: sorted runs files(6): files[3 0 0 0 1 1 38] max score 1.00 2018/06/08-00:04:50.892336 7f8a6d7fa700 (Original Log Time 2018/06/08-00:04:50.892300) [db/compaction_picker_universal.cc:655] [default] Universal: First candidate file 134[0] to reduce size amp. 2018/06/08-00:04:50.892338 7f8a6d7fa700 (Original Log Time 2018/06/08-00:04:50.892302) [db/compaction_picker_universal.cc:686] [default] Universal: size amp not needed. newer-files-total-size 13023497 earliest-file-size 2541530372 2018/06/08-00:04:50.892339 7f8a6d7fa700 (Original Log Time 2018/06/08-00:04:50.892303) [db/compaction_picker_universal.cc:473] [default] Universal: Possible candidate file 134[0]. 2018/06/08-00:04:50.892341 7f8a6d7fa700 (Original Log Time 2018/06/08-00:04:50.892304) [db/compaction_picker_universal.cc:525] [default] Universal: Skipping file 134[0] with size 1007 (compensated size 1287) 2018/06/08-00:04:50.892343 7f8a6d7fa700 (Original Log Time 2018/06/08-00:04:50.892306) [db/compaction_picker_universal.cc:473] [default] Universal: Possible candidate file 133[1]. 2018/06/08-00:04:50.892344 7f8a6d7fa700 (Original Log Time 2018/06/08-00:04:50.892307) [db/compaction_picker_universal.cc:525] [default] Universal: Skipping file 133[1] with size 4644 (compensated size 16124) 2018/06/08-00:04:50.892346 7f8a6d7fa700 (Original Log Time 2018/06/08-00:04:50.892307) [db/compaction_picker_universal.cc:473] [default] Universal: Possible candidate file 126[2]. 2018/06/08-00:04:50.892348 7f8a6d7fa700 (Original Log Time 2018/06/08-00:04:50.892308) [db/compaction_picker_universal.cc:525] [default] Universal: Skipping file 126[2] with size 319764 (compensated size 319764) 2018/06/08-00:04:50.892349 7f8a6d7fa700 (Original Log Time 2018/06/08-00:04:50.892309) [db/compaction_picker_universal.cc:473] [default] Universal: Possible candidate level 4[3]. 2018/06/08-00:04:50.892351 7f8a6d7fa700 (Original Log Time 2018/06/08-00:04:50.892310) [db/compaction_picker_universal.cc:525] [default] Universal: Skipping level 4[3] with size 2815574 (compensated size 2815574) 2018/06/08-00:04:50.892352 7f8a6d7fa700 (Original Log Time 2018/06/08-00:04:50.892311) [db/compaction_picker_universal.cc:473] [default] Universal: Possible candidate level 5[4]. 2018/06/08-00:04:50.892357 7f8a6d7fa700 (Original Log Time 2018/06/08-00:04:50.892311) [db/compaction_picker_universal.cc:525] [default] Universal: Skipping level 5[4] with size 9870748 (compensated size 9870748) 2018/06/08-00:04:50.892358 7f8a6d7fa700 (Original Log Time 2018/06/08-00:04:50.892313) [db/compaction_picker_universal.cc:473] [default] Universal: Possible candidate level 6[5].
In customized RocksDBConfigSetter, we set
level0_file_num_compaction_trigger=6
During bulk loading, the following options are set: https://github.com/facebook/rocksdb/blob/master/options/options.cc
Options* Options::PrepareForBulkLoad() { // never slowdown ingest. level0_file_num_compaction_trigger = (1<<30); level0_slowdown_writes_trigger = (1<<30); level0_stop_writes_trigger = (1<<30); soft_pending_compaction_bytes_limit = 0; hard_pending_compaction_bytes_limit = 0; // no auto compactions please. The application should issue a // manual compaction after all data is loaded into L0. disable_auto_compactions = true; // A manual compaction run should pick all files in L0 in // a single compaction run. max_compaction_bytes = (static_cast<uint64_t>(1) << 60); // It is better to have only 2 levels, otherwise a manual // compaction would compact at every possible level, thereby // increasing the total time needed for compactions. num_levels = 2; // Need to allow more write buffers to allow more parallism // of flushes. max_write_buffer_number = 6; min_write_buffer_number_to_merge = 1; // When compaction is disabled, more parallel flush threads can // help with write throughput. max_background_flushes = 4; // Prevent a memtable flush to automatically promote files // to L1. This is helpful so that all files that are // input to the manual compaction are all at L0. max_background_compactions = 2; // The compaction would create large files in L1. target_file_size_base = 256 * 1024 * 1024; return this; }
Especially, those values are set to a very large number to avoid compactions and ensures files are all on L0.
level0_file_num_compaction_trigger = (1<<30); level0_slowdown_writes_trigger = (1<<30); level0_stop_writes_trigger = (1<<30);
However, in RockDBStore.java, openDB code, we first call:
options.prepareForBulkLoad() and then use the configs from the customized customized RocksDBConfigSetter. This may overwrite the configs set in prepareBulkLoad call. The fix is to move prepareBulkLoad call after applying configs customized RocksDBConfigSetter.