Details
- Type: Bug
- Status: Resolved
- Priority: Normal
- Resolution: Fixed
- Bug Category: Degradation - Performance Bug/Regression
- Severity: Normal
- Complexity: Normal
- Discovered By: Adhoc Test
- Platform: All
- Impacts: None
Description
The unified compaction strategy currently aims to create sstables of roughly equal size, defaulting to 1 GiB. Unfortunately, tests show that Cassandra starts to have performance problems when the number of sstables grows to the order of a thousand; in particular, even 1 TiB of data with the default configuration creates too many sstables for efficient processing. This matters even more for SAI, where the number of sstables in the system can have a proportional effect on the complexity of operations.
It is quite easy to add a configuration option that lets the sstable size absorb some part of the data growth, by adding a multiplier to the shard count calculation formula, i.e. replacing
2 ^ round(log2(d / (t * b))) * b
with
2 ^ round((1 - 𝜆) * log2(d / (t * b))) * b,
where d is the data size, t the target sstable size, b the base shard count, and 𝜆 a parameter whose value is between 0 and 1.
With this, a 𝜆 of 0.5 would mean that the shard count and the sstable size both grow with the square root of the data size growth; 0 would keep the sstable size fixed at the target (the current behaviour, with the shard count absorbing all growth), and 1 would always use the same number of shards, letting the sstable size grow in proportion to the data.
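For illustration, here is a minimal Java sketch of the adjusted calculation (the names ShardCountSketch, density, targetSize, baseShardCount and lambda are illustrative, not the actual UCS identifiers):

```java
// Minimal sketch of the proposed shard count calculation (illustrative, not the actual UCS code).
public final class ShardCountSketch
{
    /**
     * @param density        d, the data size covered by the compaction, in bytes
     * @param targetSize     t, the target sstable size in bytes (e.g. 1 GiB)
     * @param baseShardCount b, the base shard count
     * @param lambda         the sstable growth parameter, between 0 and 1
     */
    public static int shardCount(long density, long targetSize, int baseShardCount, double lambda)
    {
        double x = (double) density / ((double) targetSize * baseShardCount);
        if (x <= 1)
            return baseShardCount; // below the base size, keep the base shard count
        // Original: 2^round(log2(x)) * b; proposed: 2^round((1 - lambda) * log2(x)) * b
        long exponent = Math.round((1 - lambda) * (Math.log(x) / Math.log(2)));
        return baseShardCount << (int) exponent;
    }
}
```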
It may also be valuable to introduce a minimum size threshold for engaging the base shard count, to avoid splitting the lowest-level sstables into fragments that are too small.
Once both of these are in place, we can set defaults that better suit all node densities, including 10 TiB and beyond, for example:
- target size of 1 GiB
- 𝜆 of 1/3
- base shard count of 4
- minimum sstable size of 100 MiB
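As a rough illustration of these values (using the adjusted formula above; the exact numbers depend on rounding and on the final implementation): 1 TiB of data gives 2^round((2/3) * log2(256)) * 4 = 2^5 * 4 = 128 shards and sstables of roughly 8 GiB, while 10 TiB gives 2^round((2/3) * log2(2560)) * 4 = 2^8 * 4 = 1024 shards and sstables of roughly 10 GiB.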
Attachments
Issue Links
- is caused by: CASSANDRA-18397 CEP-26: Unified Compaction Strategy (Resolved)
- relates to: CASSANDRA-18232 Write docs for CEP-26 Unified Compaction Strategy (UCS) (Resolved)
- links to