Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-18932

Harry-found CorruptSSTableException / RT Closer issue when reading entire partition

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • 5.0-beta1, 5.0
    • Local/SSTable
    • None

    Description

      While testing some new machinery for Harry, I have encountered a new RT closer / SSTable Corruption issue. I have grounds to believe this was introduced during the last year.

      Issue seems to happen because of intricate interleaving of flushes with writes and deletes.

      ERROR [ReadStage-2] 2023-10-16 18:47:06,696 JVMStabilityInspector.java:76 - Exception in thread Thread[ReadStage-2,5,SharedPool]
      org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: RandomAccessReader:BufferManagingRebufferer.Aligned:CompressedChunkReader.Mmap(/Users/ifesdjeen/foss/java/apache-cassandra-4.0/data/data1/harry/table_1-07c35a606c0a11eeae7a4f6ca489eb0c/nc-5-big-Data.db - LZ4Compressor, chunk length 16384, data length 232569)
              at org.apache.cassandra.io.sstable.AbstractSSTableIterator$AbstractReader.hasNext(AbstractSSTableIterator.java:381)
              at org.apache.cassandra.io.sstable.AbstractSSTableIterator.hasNext(AbstractSSTableIterator.java:242)
              at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:95)
              at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
              at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
              at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
              at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:376)
              at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:188)
              at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:157)
              at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
              at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:534)
              at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:402)
              at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
              at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:95)
              at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
              at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
              at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
              at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
              at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:151)
              at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:101)
              at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:86)
              at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:343)
              at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:201)
              at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:186)
              at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:48)
              at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:346)
              at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:2186)
              at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2581)
              at org.apache.cassandra.concurrent.ExecutionFailure$2.run(ExecutionFailure.java:163)
              at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:143)
              at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
              at java.base/java.lang.Thread.run(Thread.java:829)
              Suppressed: java.lang.IllegalStateException: PROCESSED UnfilteredRowIterator for harry.table_1 (key: ZinzDdUuABgDknItABgDknItABgDknItXEFrgBnOmPmPylWrwXHqjBHgeQrGfnZd1124124583:ZinzDdUuABgDknItABgDknItABgDknItABgDknItABgDknItzHqchghqCXLhVYKM22215251:3.2758E-41 omdt: [deletedAt=564416, localDeletion=1697450085]) has an illegal RT bounds sequence: expected all RTs to be closed, but the last one is open
                      at org.apache.cassandra.db.transform.RTBoundValidator$RowsTransformation.ise(RTBoundValidator.java:117)
                      at org.apache.cassandra.db.transform.RTBoundValidator$RowsTransformation.onPartitionClose(RTBoundValidator.java:112)
                      at org.apache.cassandra.db.transform.BaseRows.runOnClose(BaseRows.java:91)
                      at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:95)
                      at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:341)
                      ... 10 common frames omitted
      Caused by: java.io.IOException: Invalid Columns subset bytes; too many bits set:10
              at org.apache.cassandra.db.Columns$Serializer.deserializeSubset(Columns.java:578)
              at org.apache.cassandra.db.rows.UnfilteredSerializer.deserializeRowBody(UnfilteredSerializer.java:604)
              at org.apache.cassandra.db.UnfilteredDeserializer.readNext(UnfilteredDeserializer.java:143)
              at org.apache.cassandra.io.sstable.format.big.SSTableIterator$ForwardIndexedReader.computeNext(SSTableIterator.java:175)
              at org.apache.cassandra.io.sstable.AbstractSSTableIterator$ForwardReader.hasNextInternal(AbstractSSTableIterator.java:533)
              at org.apache.cassandra.io.sstable.AbstractSSTableIterator$AbstractReader.hasNext(AbstractSSTableIterator.java:368)
              ... 31 common frames omitted 

       

      Unfortunately, harry branch is not ready for release yet. That said, I have a snapshot pinned for quick repro and can share SSTables that will easily repro the issue if there are any takers.

      To reproduce:

      paging 1; # make sure to set page size to 1 (it breaks with other page sizes, too)
      select * from harry.table_1;
      

      Please also make sure to modify snitch to set rack/dc:

      +    public static final String DATA_CENTER_NAME = "datacenter0";
      +    public static final String RACK_NAME = "rack0";
      

      And set directories in cassandra.yaml: 

      +data_file_directories:
      + - /your/path/data/data0
      + - /your/path/data/data1
      + -/your/path/data/data2

      SHA for repro: 865d7c30e4755e74c4e4d26205a7aed4cfb55710

      Attachments

        1. node1_.zip
          1.53 MB
          Alex Petrov
        2. operation.log.zip
          861 kB
          Alex Petrov
        3. screenshot-1.png
          35 kB
          Jacek Lewandowski

        Issue Links

          Activity

            People

              jlewandowski Jacek Lewandowski
              ifesdjeen Alex Petrov
              Alex Petrov, Jacek Lewandowski
              Alex Petrov
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 10m
                  10m