CASSANDRA-18945

Unified Compaction Strategy is creating too many sstables

Details

    Description

      The unified compaction strategy currently aims to create sstables of close to the same size, defaulting to 1 GiB. Unfortunately, tests show that Cassandra starts to have performance problems when the number of sstables reaches the order of a thousand, and in particular that even 1 TiB of data with the default configuration creates too many sstables for efficient processing. This matters even more for SAI, where the number of sstables in the system can have a proportional effect on the complexity of operations.

      It is quite easy to create a configuration option that allows sstables to take some part of the data growth by adding a multiplier to the shard count calculation formula, replacing
      2 ^ round(log2(d / (t * b))) * b
      with
      2 ^ round((1 - 𝜆) * log2(d / (t * b))) * b,
      where d is the data density, t is the target sstable size, b is the base shard count, and 𝜆 is a parameter whose value is between 0 and 1.

      With this, a 𝜆 of 0.5 would mean that shard count and sstable size grow in parallel, each at the square root of the data size growth. A 𝜆 of 0 would result in no sstable size growth (the current behaviour), and a 𝜆 of 1 in always using the same number of shards.
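To make the effect of 𝜆 concrete, here is a minimal Python sketch of the proposed formula. The function name `shard_count` and its parameter names are hypothetical and are not taken from the Cassandra code base. Clamping the exponent at zero for densities below t * b is an assumption of this sketch, since the ticket does not say how that case is handled.

```python
import math

GIB = 2 ** 30

def shard_count(density, target_size, base_shards, growth=0.0):
    """Illustrative shard count: 2 ^ round((1 - growth) * log2(d / (t * b))) * b.

    growth plays the role of the 𝜆 parameter; growth=0 gives the original formula.
    """
    exponent = round((1 - growth) * math.log2(density / (target_size * base_shards)))
    # Assumption: never go below the base shard count when density < t * b.
    # Note that Python's round() resolves exact .5 ties to the nearest even integer.
    return 2 ** max(exponent, 0) * base_shards

if __name__ == "__main__":
    density = 1024 * GIB  # 1 TiB
    for growth in (0.0, 0.5, 1.0):
        shards = shard_count(density, GIB, 4, growth)
        print(growth, shards, density / shards / GIB)
```

For 1 TiB of density, a 1 GiB target and a base shard count of 4, this gives 1024 shards of 1 GiB for 𝜆 = 0, 64 shards of 16 GiB for 𝜆 = 0.5, and 4 shards of 256 GiB for 𝜆 = 1.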

      It may also be valuable to introduce a threshold for engaging the base shard count to avoid splitting lowest-level sstables into fragments that are too small.

      Once both of these are in place, we can set defaults that better suit all node densities, including 10 TiB and beyond, for example:

      • target size of 1 GiB
      • 𝜆 of 1/3
      • base shard count of 4
      • minimum size 100 MiB
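As a rough check of the example defaults above, the following Python sketch evaluates the proposed formula with a minimum-size threshold added. The ticket does not define how the threshold engages the base shard count, so the rule used here (a single shard while the density is below minimum size times base shard count) is only one possible interpretation. The function and parameter names are hypothetical.

```python
import math

MIB = 2 ** 20
GIB = 2 ** 30
TIB = 2 ** 40

def shards_with_min_size(density, target=GIB, growth=1 / 3, base=4, min_size=100 * MIB):
    """Illustrative shard count using the example defaults from the list above."""
    # Assumption: do not engage the base shard count while splitting into
    # `base` shards would produce fragments smaller than min_size.
    if density < min_size * base:
        return 1
    exponent = round((1 - growth) * math.log2(density / (target * base)))
    # Assumption: never go below the base shard count once it is engaged.
    return 2 ** max(exponent, 0) * base

if __name__ == "__main__":
    for density in (200 * MIB, 1 * GIB, 1 * TIB, 10 * TIB):
        shards = shards_with_min_size(density)
        print(density // MIB, "MiB ->", shards, "shards")
```

Under these assumptions, and treating the given figure as the density of a single level, 1 TiB maps to 128 shards of about 8 GiB and 10 TiB to 1024 shards of about 10 GiB. With 𝜆 = 0 the same 10 TiB would be split into 8192 shards.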

      Attachments

        1. key-value-oss.html
          9.92 MB
          Branimir Lambov
        2. file_ucs_shenandoah.html
          1.89 MB
          Stefan Miklosovic
        3. file_ucs_shenandoah_on_heap_memtable_3.html
          1.08 MB
          Stefan Miklosovic
        4. file_ucs_shenandoah_on_heap_memtable_2.html
          1.12 MB
          Stefan Miklosovic
        5. file_ucs_shenandoah_off_heap_memtable.html
          1.11 MB
          Stefan Miklosovic
        6. file_ucs_shenandoah_3.html
          1.10 MB
          Stefan Miklosovic

            People

              ethan.brown Ethan Brown
              blambov Branimir Lambov
              Ethan Brown
              Branimir Lambov, Stefan Miklosovic

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h