Details
-
Bug
-
Status: Open
-
Critical
-
Resolution: Unresolved
-
None
-
None
-
None
-
We store many phoenix tables in hbase, only one table trigger this issues.
When two jacent keys in wrong order, the observation as follows.
1. the prevKey is written into hbase via Put rpc
2. the current key via phoenix upsert statment( Multi rpc in underlying hbase)
3. the current Cell extends 1MB.
4. the error is easy to re-ocurr.
We store many phoenix tables in hbase, only one table trigger this issues. When two jacent keys in wrong order, the observation as follows. 1. the prevKey is written into hbase via Put rpc 2. the current key via phoenix upsert statment( Multi rpc in underlying hbase) 3. the current Cell extends 1MB. 4. the error is easy to re-ocurr.
Description
when auto flush and manual flush is invoked, cell in immutable memstore segment is scanned one by one to build data block and bloom filter block, during this time, two adjacent keys are compared to assert that they are in lexicographical order, but regretfully, the assertion fails in some scenarios, when RS encounter these assertion failure, always call abort function to terminate abnormally.
the version is 0ba7cc01a13dbb7fec8e6c5d599fc8b4936bae61 tag: rel/2.1.5
At least three cases are found.
error case#1: getQualifierArray throws java.lang.ArrayIndexOutOfBoundsException
17:57:50.328 [MemStoreFlusher.0] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - ***** ABORTING region server ${regionserver}: Replay of WAL required. Forcing server shutdown ***** org.apache.hadoop.hbase.DroppedSnapshotException: region: ${region}. at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2788) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:612) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:581) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:361) [hbase-server-2.1.5.jar:2.1.5] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144] Caused by: java.lang.ArrayIndexOutOfBoundsException at org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray(ByteBufferUtils.java:1155) ~[hbase-common-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.CellUtil.copyQualifierTo(CellUtil.java:309) ~[hbase-common-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.CellUtil.cloneQualifier(CellUtil.java:121) ~[hbase-common-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.ByteBufferKeyValue.getQualifierArray(ByteBufferKeyValue.java:112) ~[hbase-common-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.CellUtil.getCellKeyAsString(CellUtil.java:1306) ~[hbase-common-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.CellUtil.toString(CellUtil.java:1483) ~[hbase-common-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.ByteBufferKeyValue.toString(ByteBufferKeyValue.java:296) ~[hbase-common-2.1.5.jar:2.1.5] at java.lang.String.valueOf(String.java:2994) ~[?:1.8.0_144] at java.lang.StringBuilder.append(StringBuilder.java:131) ~[?:1.8.0_144] at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:1007) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:582) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:127) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) ~[hbase-server-2.1.5.jar:2.1.5] ... 8 more 17:57:50.332 [MemStoreFlusher.0] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - RegionServer abort: loaded coprocessors are: [org.apache.phoenix.coprocessor.ScanRegionObserver, org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, org.apache.phoenix.hbase.index.Indexer, org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.ServerCachingEndpointImpl]
error case#2: StoreScanner.checkScanOrder triggers "prevKey followed by a smaller key"
Exception in thread "RS_IN_MEMORY_COMPACTION-regionserver/${regionserver}" java.lang.AssertionError: Key ${prevKey} followed by a smaller key ${key} at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:1007) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:582) at org.apache.hadoop.hbase.regionserver.MemStoreCompactorSegmentsIterator.refillKVS(MemStoreCompactorSegmentsIterator.java:150) at org.apache.hadoop.hbase.regionserver.MemStoreCompactorSegmentsIterator.hasNext(MemStoreCompactorSegmentsIterator.java:72) at org.apache.hadoop.hbase.regionserver.CellChunkImmutableSegment.initializeCellSet(CellChunkImmutableSegment.java:146) at org.apache.hadoop.hbase.regionserver.CellChunkImmutableSegment.<init>(CellChunkImmutableSegment.java:66) at org.apache.hadoop.hbase.regionserver.SegmentFactory.createImmutableSegment(SegmentFactory.java:133) at org.apache.hadoop.hbase.regionserver.SegmentFactory.createImmutableSegmentByCompaction(SegmentFactory.java:59) at org.apache.hadoop.hbase.regionserver.MemStoreCompactor.createSubstitution(MemStoreCompactor.java:217) at org.apache.hadoop.hbase.regionserver.MemStoreCompactor.doCompaction(MemStoreCompactor.java:171) at org.apache.hadoop.hbase.regionserver.MemStoreCompactor.start(MemStoreCompactor.java:109) at org.apache.hadoop.hbase.regionserver.CompactingMemStore.flushInMemory(CompactingMemStore.java:422) at org.apache.hadoop.hbase.regionserver.CompactingMemStore$InMemoryFlushRunnable.run(CompactingMemStore.java:534) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) at java.lang.Thread.run(Thread.java:748) 12:34:07.695 [MemStoreFlusher.0] INFO org.apache.hadoop.hbase.regionserver.HRegion - Flushing 1/1 column families, dataSize=239.26 MB heapSize=256.00 MB 12:34:08.476 [MemStoreFlusher.0] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - ***** ABORTING region server ${regionserver}: Replay of WAL required. Forcing server shutdown ***** org.apache.hadoop.hbase.DroppedSnapshotException: region: ${region}. at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2788) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:612) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:581) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:361) [hbase-server-2.1.5.jar:2.1.5] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144] Caused by: java.lang.AssertionError: Key ${prevKey} followed by a smaller key ${key} at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:1007) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:582) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:127) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) ~[hbase-server-2.1.5.jar:2.1.5] ... 8 more
error case#3: BloomContext.sanityCheck triggers "Added a key not lexically larger than previous."
so, I alter the assertion code into if stmt
--- a/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java +++ b/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java @@ -33,6 +33,8 @@ import org.apache.hadoop.hbase.io.util.StreamUtils; import org.apache.hadoop.io.IOUtils; import org.apache.hadoop.io.WritableUtils; import org.apache.yetus.audience.InterfaceAudience; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; import sun.nio.ch.DirectBuffer; import org.apache.hbase.thirdparty.com.google.common.annotations.VisibleForTesting; @@ -51,6 +53,7 @@ public final class ByteBufferUtils { @VisibleForTesting final static boolean UNSAFE_AVAIL = UnsafeAvailChecker.isAvailable(); public final static boolean UNSAFE_UNALIGNED = UnsafeAvailChecker.unaligned(); + private static Logger LOG = LoggerFactory.getLogger(ByteBufferUtils.class); private ByteBufferUtils() { } @@ -1152,7 +1155,18 @@ public final class ByteBufferUtils { public static void copyFromBufferToArray(byte[] out, ByteBuffer in, int sourceOffset, int destinationOffset, int length) { if (in.hasArray()) { - System.arraycopy(in.array(), sourceOffset + in.arrayOffset(), out, destinationOffset, length); + try { + System.arraycopy(in.array(), sourceOffset + in.arrayOffset(), out, destinationOffset, length); + } catch (Throwable e){ + if (out == null) { + LOG.warn("[QINGGUI] out=null"); + } else if (in == null || in.array() == null) { + LOG.warn("[QINGGUI] in=null"); + } else { + LOG.warn("[QINGGUI] out.length={}, in.length={}, sourceOffset={} destinationOffset={} length={}", + out.length, in.array().length, sourceOffset, destinationOffset, length); + } + } } else if (UNSAFE_AVAIL) { UnsafeAccess.copy(in, sourceOffset, out, destinationOffset, length); } else {
re-deploy my binary, then the failure drifts to bloom filter building code.
12:54:39.966 [rs(${regionserver})-flush-proc-pool3-thread-2] WARN org.apache.hadoop.hbase.regionserver.HStore - Failed flushing store file, retrying num=9 java.io.IOException: Added a key not lexically larger than previous. Current cell ... at org.apache.hadoop.hbase.util.BloomContext.sanityCheck(BloomContext.java:63) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.util.BloomContext.writeBloom(BloomContext.java:54) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.StoreFileWriter.appendGeneralBloomfilter(StoreFileWriter.java:213) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(StoreFileWriter.java:232) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:133) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2232) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:68) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:56) [hbase-server-2.1.5.jar:2.1.5] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_144] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144] 12:54:39.979 [rs(${regionserver})-flush-proc-pool3-thread-2] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - ***** ABORTING region server ${regionserver}: Replay of WAL required. Forcing server shutdown ***** org.apache.hadoop.hbase.DroppedSnapshotException: region: ${region}. at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2788) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2232) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:68) [hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:56) [hbase-server-2.1.5.jar:2.1.5] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_144] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144] Caused by: java.io.IOException: Added a key not lexically larger than previous. Current cell = ... at org.apache.hadoop.hbase.util.BloomContext.sanityCheck(BloomContext.java:63) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.util.BloomContext.writeBloom(BloomContext.java:54) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.StoreFileWriter.appendGeneralBloomfilter(StoreFileWriter.java:213) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(StoreFileWriter.java:232) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:133) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) ~[hbase-server-2.1.5.jar:2.1.5] at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) ~[hbase-server-2.1.5.jar:2.1.5] ... 12 more
Attachments
Issue Links
- relates to
-
HBASE-22608 MVCC's writeEntry didn't complete and make MVCC stuck
- Open
-
HBASE-22862 Region Server crash with: Added a key not lexically larger than previous
- Open