Details
- Type: Bug
- Status: Resolved
- Priority: Normal
- Resolution: Fixed
- Severity: Normal
Description
With the following table, which contains a lot of cells:
CREATE TABLE biggraphite.datapoints_11520p_60s (
    metric uuid,
    time_start_ms bigint,
    offset smallint,
    count int,
    value double,
    PRIMARY KEY ((metric, time_start_ms), offset)
) WITH CLUSTERING ORDER BY (offset DESC)
  AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
                    'compaction_window_size': '6',
                    'compaction_window_unit': 'HOURS',
                    'max_threshold': '32',
                    'min_threshold': '6'};

Keyspace : biggraphite
    Read Count: 1822
    Read Latency: 1.8870054884742042 ms.
    Write Count: 2212271647
    Write Latency: 0.027705127678653473 ms.
    Pending Flushes: 0
        Table: datapoints_11520p_60s
        SSTable count: 47
        Space used (live): 300417555945
        Space used (total): 303147395017
        Space used by snapshots (total): 0
        Off heap memory used (total): 207453042
        SSTable Compression Ratio: 0.4955200053039823
        Number of keys (estimate): 16343723
        Memtable cell count: 220576
        Memtable data size: 17115128
        Memtable off heap memory used: 0
        Memtable switch count: 2872
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 1103167888
        Local write latency: 0.025 ms
        Pending flushes: 0
        Percent repaired: 0.0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 105118296
        Bloom filter off heap memory used: 106547192
        Index summary off heap memory used: 27730962
        Compression metadata off heap memory used: 73174888
        Compacted partition minimum bytes: 61
        Compacted partition maximum bytes: 51012
        Compacted partition mean bytes: 7899
        Average live cells per slice (last five minutes): NaN
        Maximum live cells per slice (last five minutes): 0
        Average tombstones per slice (last five minutes): NaN
        Maximum tombstones per slice (last five minutes): 0
        Dropped Mutations: 0
It looks like a good chunk of the compaction time is lost in StreamingHistogram.update() (which is used to store the estimated tombstone drop times).
This could be caused by a huge number of distinct deletion times, which would make the bin map huge, but this histogram should be capped at 100 keys. It's more likely caused by the huge number of cells.
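For context, here is a minimal sketch of how a capped streaming histogram of this kind typically behaves; the class name, field names and the 100-bin cap are assumptions for illustration, not Cassandra's actual StreamingHistogram. The point is that once the cap is reached, almost every insert of a new distinct value has to scan the bins to find and merge the closest pair, so the cost of update() is proportional to the number of bins and it runs once per cell during compaction.

import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a capped streaming histogram (Ben-Haim/Tom-Tov style).
// It only illustrates why per-cell updates are expensive; it is not the
// implementation from the Cassandra code base.
public class CappedStreamingHistogram
{
    private final int maxBins;                        // e.g. 100, as mentioned above
    private final TreeMap<Double, Long> bins = new TreeMap<>();

    public CappedStreamingHistogram(int maxBins)
    {
        this.maxBins = maxBins;
    }

    // Called once per value (here: once per cell's deletion time during compaction).
    public void update(double point)
    {
        bins.merge(point, 1L, Long::sum);             // exact hit: just bump the count

        if (bins.size() > maxBins)
        {
            // Find the two adjacent bins with the smallest gap and merge them.
            // This scan is O(maxBins) and runs on nearly every insert of a new
            // distinct value once the cap is reached.
            Double prevKey = null, mergeLo = null, mergeHi = null;
            double smallestGap = Double.MAX_VALUE;
            for (Double key : bins.keySet())
            {
                if (prevKey != null && key - prevKey < smallestGap)
                {
                    smallestGap = key - prevKey;
                    mergeLo = prevKey;
                    mergeHi = key;
                }
                prevKey = key;
            }

            long loCount = bins.remove(mergeLo);
            long hiCount = bins.remove(mergeHi);
            // Weighted average keeps the merged bin close to the original mass.
            double merged = (mergeLo * loCount + mergeHi * hiCount) / (loCount + hiCount);
            bins.merge(merged, loCount + hiCount, Long::sum);
        }
    }

    public Map<Double, Long> getAsMap()
    {
        return bins;
    }
}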
A simple solution could be to only take into account a fraction of the cells; the fact that this table uses TWCS also gives us an additional hint that sampling deletion times would be fine.
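A rough illustration of that sampling idea, purely as a sketch of the suggestion above and not an actual patch: only one in every N deletion times would reach the expensive histogram update. The 1-in-16 rate and the class below are arbitrary choices for the example, and it reuses the hypothetical histogram sketched earlier.

import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sampling wrapper: feed only ~1/SAMPLE_RATE of the observed
// deletion times into the capped histogram. With TWCS, deletion times within
// a window are close together, so a sample should still describe the
// drop-time distribution reasonably well.
public class SampledDropTimeTracker
{
    private static final int SAMPLE_RATE = 16;        // arbitrary rate for the example

    private final CappedStreamingHistogram histogram = new CappedStreamingHistogram(100);

    public void recordDeletionTime(int localDeletionTime)
    {
        // Skip most cells; only one in SAMPLE_RATE triggers an update() call.
        if (ThreadLocalRandom.current().nextInt(SAMPLE_RATE) == 0)
            histogram.update(localDeletionTime);
    }
}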
Attachments
Issue Links
- is related to
  - CASSANDRA-13281 testall failure in org.apache.cassandra.io.sstable.metadata.MetadataSerializerTest.testSerialization (Resolved)
  - CASSANDRA-13756 StreamingHistogram is not thread safe (Resolved)
- relates to
  - CASSANDRA-13024 Droppable Tombstone Ratio Calculation (Open)
  - CASSANDRA-13752 Corrupted SSTables created in 3.11 (Resolved)
  - CASSANDRA-13040 Estimated TS drop-time histogram updated with Cell.NO_DELETION_TIME (Resolved)