Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Fix Version: 3.0.0-alpha-1
Description
On our production clusters, scanner close has caused regionserver JVM coredump problems.
Stack: [0x00007fca4b0cc000,0x00007fca4b1cd000], sp=0x00007fca4b1cb0d8, free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x7fd314]
J 2810 sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x00007fdae55a9e61 [0x00007fdae55a9d80+0xe1]
j  org.apache.hadoop.hbase.util.UnsafeAccess.unsafeCopy(Ljava/lang/Object;JLjava/lang/Object;JJ)V+36
j  org.apache.hadoop.hbase.util.UnsafeAccess.copy(Ljava/nio/ByteBuffer;I[BII)V+69
j  org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray([BLjava/nio/ByteBuffer;III)V+39
j  org.apache.hadoop.hbase.CellUtil.copyQualifierTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+31
j  org.apache.hadoop.hbase.KeyValueUtil.appendKeyTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+43
J 14724 C2 org.apache.hadoop.hbase.regionserver.StoreScanner.shipped()V (51 bytes) @ 0x00007fdae6a298d0 [0x00007fdae6a29780+0x150]
J 21387 C2 org.apache.hadoop.hbase.regionserver.RSRpcServices$RegionScannerShippedCallBack.run()V (53 bytes) @ 0x00007fdae622bab8 [0x00007fdae622acc0+0xdf8]
J 26353 C2 org.apache.hadoop.hbase.ipc.ServerCall.setResponse(Lorg/apache/hbase/thirdparty/com/google/protobuf/Message;Lorg/apache/hadoop/hbase/CellScanner;Ljava/lang/Throwable;Ljava/lang/String;)V (384 bytes) @ 0x00007fdae7f139d8 [0x00007fdae7f12980+0x1058]
J 26226 C2 org.apache.hadoop.hbase.ipc.CallRunner.run()V (1554 bytes) @ 0x00007fdae959f68c [0x00007fdae959e400+0x128c]
J 19598% C2 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(Ljava/util/concurrent/BlockingQueue;Ljava/util/concurrent/atomic/AtomicInteger;)V (338 bytes) @ 0x00007fdae81c54d4 [0x00007fdae81c53e0+0xf4]
Scan RPC errors also appear on the handler when the coredump happens.
I found a clue in the logs: a cached block may be replaced when its nextBlockOnDiskSize is less than that of the newly read block, in this method:
public static boolean shouldReplaceExistingCacheBlock(BlockCache blockCache, BlockCacheKey cacheKey,
    Cacheable newBlock) {
  if (cacheKey.toString().indexOf(".") != -1) { // reference file
    LOG.warn("replace existing cached block, cache key is : " + cacheKey);
    return true;
  }
  Cacheable existingBlock = blockCache.getBlock(cacheKey, false, false, false);
  if (existingBlock == null) {
    return true;
  }
  try {
    int comparison = BlockCacheUtil.validateBlockAddition(existingBlock, newBlock, cacheKey);
    if (comparison < 0) {
      LOG.warn("Cached block contents differ by nextBlockOnDiskSize, the new block has "
          + "nextBlockOnDiskSize set. Caching new block.");
      return true;
    }
    ......
The block will then be replaced if it is not in the RAMCache but is already in the BucketCache. The replacement goes through:
private void putIntoBackingMap(BlockCacheKey key, BucketEntry bucketEntry) {
  BucketEntry previousEntry = backingMap.put(key, bucketEntry);
  if (previousEntry != null && previousEntry != bucketEntry) {
    ReentrantReadWriteLock lock = offsetLock.getLock(previousEntry.offset());
    lock.writeLock().lock();
    try {
      blockEvicted(key, previousEntry, false);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
Here, to avoid leaking the previous bucket entry's memory, the entry is force-released regardless of any RPC references still holding it:
void blockEvicted(BlockCacheKey cacheKey, BucketEntry bucketEntry, boolean decrementBlockNumber) {
  bucketAllocator.freeBlock(bucketEntry.offset());
  realCacheSize.add(-1 * bucketEntry.getLength());
  blocksByHFile.remove(cacheKey);
  if (decrementBlockNumber) {
    this.blockNumber.decrement();
  }
}
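The hazard can be modelled with a small, self-contained sketch (hypothetical class names, not the real HBase types): a replacement frees the previous entry's backing storage even though an in-flight RPC still holds a reference to it, leaving the reader pointing at freed memory.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model (hypothetical names) of the use-after-free hazard: the
// cache frees an entry even though an RPC still references it.
class Entry {
    final AtomicInteger rpcRefs = new AtomicInteger(0); // RPC reference count
    volatile boolean freed = false;                     // stands in for freed bucket memory

    boolean isRpcRef() { return rpcRefs.get() > 0; }
}

class Cache {
    final Map<String, Entry> backingMap = new HashMap<>();

    // Mirrors putIntoBackingMap: the previous entry is freed unconditionally.
    void putUnconditionally(String key, Entry newEntry) {
        Entry previous = backingMap.put(key, newEntry);
        if (previous != null && previous != newEntry) {
            previous.freed = true; // blockEvicted / freeBlock
        }
    }
}

public class UseAfterFreeDemo {
    public static void main(String[] args) {
        Cache cache = new Cache();
        Entry first = new Entry();
        cache.putUnconditionally("block-1", first);

        first.rpcRefs.incrementAndGet();     // an RPC scanner pins the block
        cache.putUnconditionally("block-1", new Entry());

        // The pinned entry's storage was freed; in the real system the RPC
        // now reads freed memory, which is what the coredump stack shows.
        System.out.println("reader sees freed entry: " + first.freed);
        // prints "reader sees freed entry: true"
    }
}
```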
I added a check for RPC references before replacing the bucket entry, and it works: no coredumps so far.
That is:
public void cacheBlockWithWait(BlockCacheKey cacheKey, Cacheable cachedItem, boolean inMemory,
    boolean wait) {
  if (cacheEnabled) {
    if (backingMap.containsKey(cacheKey) || ramCache.containsKey(cacheKey)) {
      if (BlockCacheUtil.shouldReplaceExistingCacheBlock(this, cacheKey, cachedItem)) {
        BucketEntry bucketEntry = backingMap.get(cacheKey);
        if (bucketEntry != null && bucketEntry.isRpcRef()) {
          // avoid replace when there are RPC refs for the bucket entry in bucket cache
          return;
        }
        cacheBlockWithWaitInternal(cacheKey, cachedItem, inMemory, wait);
      }
    } else {
      cacheBlockWithWaitInternal(cacheKey, cachedItem, inMemory, wait);
    }
  }
}
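The effect of the guard can be sketched in a self-contained model (hypothetical names, not the real HBase classes): replacement is simply skipped while any RPC still references the existing entry, so the pinned entry's storage is never freed out from under the reader.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model (hypothetical names) of the guarded replacement: an entry
// pinned by an in-flight RPC is never replaced, so it is never freed while
// a reader still holds it.
public class GuardedCache {
    static class Entry {
        final AtomicInteger rpcRefs = new AtomicInteger(0); // RPC reference count
        volatile boolean freed = false;                     // stands in for freed bucket memory

        boolean isRpcRef() { return rpcRefs.get() > 0; }
    }

    final Map<String, Entry> backingMap = new HashMap<>();

    /** Caches newEntry unless the existing entry is still pinned by an RPC. */
    boolean putWithRpcRefCheck(String key, Entry newEntry) {
        Entry existing = backingMap.get(key);
        if (existing != null && existing.isRpcRef()) {
            return false; // skip replacement; the reader keeps valid memory
        }
        Entry previous = backingMap.put(key, newEntry);
        if (previous != null && previous != newEntry) {
            previous.freed = true; // safe to free: no RPC holds it
        }
        return true;
    }

    public static void main(String[] args) {
        GuardedCache cache = new GuardedCache();
        Entry first = new Entry();
        cache.putWithRpcRefCheck("block-1", first);

        first.rpcRefs.incrementAndGet(); // an RPC scanner pins the block
        boolean replaced = cache.putWithRpcRefCheck("block-1", new Entry());
        System.out.println("replaced while pinned: " + replaced + ", freed: " + first.freed);
        // prints "replaced while pinned: false, freed: false"
    }
}
```

The trade-off is that a stale entry (e.g. one missing nextBlockOnDiskSize) may briefly stay cached, but that only costs an extra seek later, whereas freeing a pinned entry crashes the process.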