Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
3.0.0-alpha-1, 1.3.3, 2.2.0
-
None
-
Reviewed
Description
Problem: Operations which use AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName, MultiResponse, int) API with tablename set to null reset the meta cache of the corresponding server after each call. One such operation is put operation of HTableMultiplexer (Might not be the only one). This may impact the performance of the system severely as all new ops directed to that server will have to go to zk first to get the meta table address and then get the location of the table region as it will become empty after every htablemultiplexer put.
From the logs below, one can see after every other put the cached region locations are cleared. As a side effect of this, before every put the server needs to contact zk and get meta table location and read meta to get region locations of the table.
2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283): Removed all cached region locations that map to root1-thinkpad-t440p,35811,1536857446588 2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5] client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5 2018-09-13 22:21:15,515 TRACE [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811] ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get" request_param: true priority: 0 timeout: 60000 totalRequestSize: 137 bytes 2018-09-13 22:21:15,515 TRACE [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size: 137 connection: 127.0.0.1:42338 executing as root1 2018-09-13 22:21:15,515 TRACE [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size: 137 connection: 127.0.0.1:42338 param: region= testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f., row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result { associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0 totalTime: 0 2018-09-13 22:21:15,516 TRACE [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0, count=0, allocations=1 2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get, callTime: 2ms 2018-09-13 22:21:15,516 TRACE [main] client.ClientScanner(122): Scan table=hbase:meta, startRow=testHTableMultiplexer_1,\x00\x00\x00\xC5,99999999999999 2018-09-13 22:21:15,516 TRACE [main] client.ClientSmallReversedScanner(179): Advancing internal small scanner to startKey at 'testHTableMultiplexer_1,\x00\x00\x00\xC5,99999999999999' 2018-09-13 22:21:15,517 TRACE [main] client.ZooKeeperRegistry(59): Looking up meta region location in ZK, connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@599f571f
From the minicluster logs HTableMultiplexer1000Puts.UT.txt one can see that the string "Removed all cached region locations that map" and "Looking up meta region location in ZK" are present for every put.
Analysis:
The problem occurs as we call the cleanServerCache method always clears the server cache in case tablename is null and exception is null. See AsyncRequestFutureImpl.java#L918
private void cleanServerCache(ServerName server, Throwable regionException) { if (tableName == null && ClientExceptionsUtil.isMetaClearingException(regionException)) { // For multi-actions, we don't have a table name, but we want to make sure to clear the // cache in case there were location-related exceptions. We don't to clear the cache // for every possible exception that comes through, however. asyncProcess.connection.clearCaches(server); } }
The problem is ClientExceptionsUtil.isMetaClearingException(regionException)) assumes that the caller should take care of null exception check before calling the method i.e. it will return true if the passed exception is null, which may not be a correct assumption.
Attachments
Attachments
Issue Links
- is related to
-
HBASE-21428 Performance issue due to userRegionLock in the ConnectionManager.
- Resolved
- relates to
-
HBASE-22434 Improve clear meta cache
- Patch Available