Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Fix Version: 3.0.0-alpha-1
Description
On our production clusters, scanner close has caused regionserver JVM coredump problems.
Stack: [0x00007fca4b0cc000,0x00007fca4b1cd000], sp=0x00007fca4b1cb0d8, free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x7fd314]
J 2810 sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x00007fdae55a9e61 [0x00007fdae55a9d80+0xe1]
j  org.apache.hadoop.hbase.util.UnsafeAccess.unsafeCopy(Ljava/lang/Object;JLjava/lang/Object;JJ)V+36
j  org.apache.hadoop.hbase.util.UnsafeAccess.copy(Ljava/nio/ByteBuffer;I[BII)V+69
j  org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray([BLjava/nio/ByteBuffer;III)V+39
j  org.apache.hadoop.hbase.CellUtil.copyQualifierTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+31
j  org.apache.hadoop.hbase.KeyValueUtil.appendKeyTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+43
J 14724 C2 org.apache.hadoop.hbase.regionserver.StoreScanner.shipped()V (51 bytes) @ 0x00007fdae6a298d0 [0x00007fdae6a29780+0x150]
J 21387 C2 org.apache.hadoop.hbase.regionserver.RSRpcServices$RegionScannerShippedCallBack.run()V (53 bytes) @ 0x00007fdae622bab8 [0x00007fdae622acc0+0xdf8]
J 26353 C2 org.apache.hadoop.hbase.ipc.ServerCall.setResponse(Lorg/apache/hbase/thirdparty/com/google/protobuf/Message;Lorg/apache/hadoop/hbase/CellScanner;Ljava/lang/Throwable;Ljava/lang/String;)V (384 bytes) @ 0x00007fdae7f139d8 [0x00007fdae7f12980+0x1058]
J 26226 C2 org.apache.hadoop.hbase.ipc.CallRunner.run()V (1554 bytes) @ 0x00007fdae959f68c [0x00007fdae959e400+0x128c]
J 19598% C2 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(Ljava/util/concurrent/BlockingQueue;Ljava/util/concurrent/atomic/AtomicInteger;)V (338 bytes) @ 0x00007fdae81c54d4 [0x00007fdae81c53e0+0xf4]
Scan RPC errors also appear on the handler when the coredump happens.
I found a clue in the logs: a cached block may be replaced when its nextBlockOnDiskSize is less than that of the newly read block, in this method:
public static boolean shouldReplaceExistingCacheBlock(BlockCache blockCache, BlockCacheKey cacheKey,
    Cacheable newBlock) {
  if (cacheKey.toString().indexOf(".") != -1) { // reference file
    LOG.warn("replace existing cached block, cache key is : " + cacheKey);
    return true;
  }
  Cacheable existingBlock = blockCache.getBlock(cacheKey, false, false, false);
  if (existingBlock == null) {
    return true;
  }
  try {
    int comparison = BlockCacheUtil.validateBlockAddition(existingBlock, newBlock, cacheKey);
    if (comparison < 0) {
      LOG.warn("Cached block contents differ by nextBlockOnDiskSize, the new block has "
          + "nextBlockOnDiskSize set. Caching new block.");
      return true;
    }
    ......
The block will then be replaced if it is not in the RAMCache but is already in the BucketCache. The replacement goes through:
private void putIntoBackingMap(BlockCacheKey key, BucketEntry bucketEntry) {
  BucketEntry previousEntry = backingMap.put(key, bucketEntry);
  if (previousEntry != null && previousEntry != bucketEntry) {
    ReentrantReadWriteLock lock = offsetLock.getLock(previousEntry.offset());
    lock.writeLock().lock();
    try {
      blockEvicted(key, previousEntry, false);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
Here, to avoid leaking the previous bucket entry's memory, the entry is force-released regardless of any RPC references still holding it:
void blockEvicted(BlockCacheKey cacheKey, BucketEntry bucketEntry, boolean decrementBlockNumber) {
  bucketAllocator.freeBlock(bucketEntry.offset());
  realCacheSize.add(-1 * bucketEntry.getLength());
  blocksByHFile.remove(cacheKey);
  if (decrementBlockNumber) {
    this.blockNumber.decrement();
  }
}
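The hazard can be modelled with a small, self-contained sketch (hypothetical class names, not the real HBase types): a replacement frees the previous entry's backing storage even though an in-flight RPC still holds a reference to it, leaving the reader pointing at freed memory.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model (hypothetical names) of the use-after-free hazard: the
// cache frees an entry even though an RPC still references it.
class Entry {
    final AtomicInteger rpcRefs = new AtomicInteger(0); // RPC reference count
    volatile boolean freed = false;                     // stands in for freed bucket memory

    boolean isRpcRef() { return rpcRefs.get() > 0; }
}

class Cache {
    final Map<String, Entry> backingMap = new HashMap<>();

    // Mirrors putIntoBackingMap: the previous entry is freed unconditionally.
    void putUnconditionally(String key, Entry newEntry) {
        Entry previous = backingMap.put(key, newEntry);
        if (previous != null && previous != newEntry) {
            previous.freed = true; // blockEvicted / freeBlock
        }
    }
}

public class UseAfterFreeDemo {
    public static void main(String[] args) {
        Cache cache = new Cache();
        Entry first = new Entry();
        cache.putUnconditionally("block-1", first);

        first.rpcRefs.incrementAndGet();     // an RPC scanner pins the block
        cache.putUnconditionally("block-1", new Entry());

        // The pinned entry's storage was freed; in the real system the RPC
        // now reads freed memory, which is what the coredump stack shows.
        System.out.println("reader sees freed entry: " + first.freed);
        // prints "reader sees freed entry: true"
    }
}
```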
I added a check for RPC references before replacing the bucket entry, and it works: no coredumps so far.
That is:
public void cacheBlockWithWait(BlockCacheKey cacheKey, Cacheable cachedItem, boolean inMemory,
    boolean wait) {
  if (cacheEnabled) {
    if (backingMap.containsKey(cacheKey) || ramCache.containsKey(cacheKey)) {
      if (BlockCacheUtil.shouldReplaceExistingCacheBlock(this, cacheKey, cachedItem)) {
        BucketEntry bucketEntry = backingMap.get(cacheKey);
        if (bucketEntry != null && bucketEntry.isRpcRef()) {
          // avoid replace when there are RPC refs for the bucket entry in bucket cache
          return;
        }
        cacheBlockWithWaitInternal(cacheKey, cachedItem, inMemory, wait);
      }
    } else {
      cacheBlockWithWaitInternal(cacheKey, cachedItem, inMemory, wait);
    }
  }
}
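The effect of the guard can be sketched in a self-contained model (hypothetical names, not the real HBase classes): replacement is simply skipped while any RPC still references the existing entry, so the pinned entry's storage is never freed out from under the reader.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model (hypothetical names) of the guarded replacement: an entry
// pinned by an in-flight RPC is never replaced, so it is never freed while
// a reader still holds it.
public class GuardedCache {
    static class Entry {
        final AtomicInteger rpcRefs = new AtomicInteger(0); // RPC reference count
        volatile boolean freed = false;                     // stands in for freed bucket memory

        boolean isRpcRef() { return rpcRefs.get() > 0; }
    }

    final Map<String, Entry> backingMap = new HashMap<>();

    /** Caches newEntry unless the existing entry is still pinned by an RPC. */
    boolean putWithRpcRefCheck(String key, Entry newEntry) {
        Entry existing = backingMap.get(key);
        if (existing != null && existing.isRpcRef()) {
            return false; // skip replacement; the reader keeps valid memory
        }
        Entry previous = backingMap.put(key, newEntry);
        if (previous != null && previous != newEntry) {
            previous.freed = true; // safe to free: no RPC holds it
        }
        return true;
    }

    public static void main(String[] args) {
        GuardedCache cache = new GuardedCache();
        Entry first = new Entry();
        cache.putWithRpcRefCheck("block-1", first);

        first.rpcRefs.incrementAndGet(); // an RPC scanner pins the block
        boolean replaced = cache.putWithRpcRefCheck("block-1", new Entry());
        System.out.println("replaced while pinned: " + replaced + ", freed: " + first.freed);
        // prints "replaced while pinned: false, freed: false"
    }
}
```

The trade-off is that a stale entry (e.g. one missing nextBlockOnDiskSize) may briefly stay cached, but that only costs an extra seek later, whereas freeing a pinned entry crashes the process.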