Cassandra / CASSANDRA-14281

Improve LatencyMetrics performance by reducing write path processing


Details

    Description

      Currently, every write, read, range query, and CAS operation that touches the CFS (ColumnFamilyStore) records a latency metric, which takes a lot of processing time (up to 66% of the total processing time if the update was empty).

      Latencies are recorded using both a Dropwizard "Timer" and a "Counter". The latter is used for totalLatency, and the former is a decaying metric for rates and certain percentile metrics. We then replicate all of these CFS writes to the KeyspaceMetrics and globalWriteLatencies.

      Instead of doing this in the write phase, we should merge the metrics when they are read. Metrics are read much less often than they are updated, so this saves a lot of CPU time in total and also speeds up the write path.
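      The merge-on-read idea can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the class `MergedLatency` and its methods are hypothetical names, and it tracks only a total and a count rather than the full Dropwizard Timer state. Each node updates only its own counters on the hot path, and a parent sums its children when a reader asks for the value.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical class for illustration only; not Cassandra's LatencyMetrics.
class MergedLatency {
    // Counters owned by this node; the only state touched on the hot path.
    private final LongAdder ownTotalMicros = new LongAdder();
    private final LongAdder ownCount = new LongAdder();
    // Children (e.g. tables under a keyspace) whose values are merged on read.
    private final List<MergedLatency> children = new CopyOnWriteArrayList<>();

    void addChild(MergedLatency child) {
        children.add(child);
    }

    // Hot path: record into this node only, with no propagation to parents.
    void record(long latencyMicros) {
        ownTotalMicros.add(latencyMicros);
        ownCount.increment();
    }

    // Cold path: fold in every child's total when the metric is read.
    long totalMicros() {
        long sum = ownTotalMicros.sum();
        for (MergedLatency child : children) {
            sum += child.totalMicros();
        }
        return sum;
    }

    // Cold path: fold in every child's count when the metric is read.
    long count() {
        long sum = ownCount.sum();
        for (MergedLatency child : children) {
            sum += child.count();
        }
        return sum;
    }
}
```

      With a table-level instance registered as a child of a keyspace-level instance, which is in turn a child of a global instance, a write calls `record` once on the table-level object. The keyspace and global values are computed only when something reads them.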

      Currently, the DecayingEstimatedHistogramReservoir acquires a lock for each update operation, which causes contention when more than one thread updates the histogram. This impacts scalability on larger machines. We should make it as lock-free as possible and also prevent a single CAS update from blocking all the concurrent threads from making an update.
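      One way to avoid a per-update lock is to keep bucket counts in atomic arrays and spread writers across several stripes. The sketch below is an assumption-laden illustration, not the DecayingEstimatedHistogramReservoir implementation: `StripedHistogram` is a hypothetical class, the bucket bounds are supplied by the caller, and the forward-decay weighting is left out entirely.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical class for illustration only; decay is intentionally omitted.
class StripedHistogram {
    private final long[] upperBounds;     // sorted inclusive upper bound per bucket
    private final int buckets;            // upperBounds.length + 1 (last is overflow)
    private final int stripes;
    private final AtomicLongArray counts; // laid out as stripes * buckets

    StripedHistogram(long[] sortedUpperBounds, int stripes) {
        this.upperBounds = sortedUpperBounds.clone();
        this.buckets = upperBounds.length + 1;
        this.stripes = stripes;
        this.counts = new AtomicLongArray(stripes * buckets);
    }

    // Lock-free update: one atomic increment on a per-stripe slot.
    void update(long value) {
        int stripe = Math.floorMod(Thread.currentThread().hashCode(), stripes);
        counts.incrementAndGet(stripe * buckets + bucketFor(value));
    }

    // Read path: sum the bucket across all stripes.
    long countInBucket(int bucket) {
        long sum = 0;
        for (int s = 0; s < stripes; s++) {
            sum += counts.get(s * buckets + bucket);
        }
        return sum;
    }

    // Index of the first bucket whose upper bound is >= value.
    private int bucketFor(long value) {
        int idx = Arrays.binarySearch(upperBounds, value);
        return idx >= 0 ? idx : -idx - 1;
    }
}
```

      Threads that land on different stripes never touch the same slot, so a slow update on one slot does not hold up the others. The cost moves to the read path, which has to sum across stripes.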

      Attachments

        1. bench.png
          6.85 MB
          Chris Lohfink
        2. bench2.png
          239 kB
          Michael Shuler
        3. benchmark.html
          366 kB
          Chris Lohfink
        4. benchmark2.png
          283 kB
          Chris Lohfink

            People

              burmanm Michael Burman
              Michael Burman
              Chris Lohfink
              Votes: 0
              Watchers: 13

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 10m