Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
- Fix Version/s: 3.0.0-alpha-1, 1.3.3, 2.2.0
- Labels: None
- Hadoop Flags: Reviewed
Description
Problem: Operations that go through the AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName, MultiResponse, int) API with tableName set to null clear the meta cache of the corresponding server after every call. One such operation is the put operation of HTableMultiplexer (it might not be the only one). This can severely degrade performance: because the cached region locations for that server become empty after every call, each new operation directed to that server must first go to ZooKeeper to get the meta table address and then read meta to get the location of the table's region.
In the logs below, one can see that the cached region locations are cleared after every other put. As a side effect, before every put the client must contact ZooKeeper for the meta table location and then read meta to get the region locations of the table.
2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283): Removed all cached region locations that map to root1-thinkpad-t440p,35811,1536857446588
2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5] client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5
2018-09-13 22:21:15,515 TRACE [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811] ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get" request_param: true priority: 0 timeout: 60000 totalRequestSize: 137 bytes
2018-09-13 22:21:15,515 TRACE [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size: 137 connection: 127.0.0.1:42338 executing as root1
2018-09-13 22:21:15,515 TRACE [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size: 137 connection: 127.0.0.1:42338 param: region= testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f., row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result { associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0 totalTime: 0
2018-09-13 22:21:15,516 TRACE [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0, count=0, allocations=1
2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get, callTime: 2ms
2018-09-13 22:21:15,516 TRACE [main] client.ClientScanner(122): Scan table=hbase:meta, startRow=testHTableMultiplexer_1,\x00\x00\x00\xC5,99999999999999
2018-09-13 22:21:15,516 TRACE [main] client.ClientSmallReversedScanner(179): Advancing internal small scanner to startKey at 'testHTableMultiplexer_1,\x00\x00\x00\xC5,99999999999999'
2018-09-13 22:21:15,517 TRACE [main] client.ZooKeeperRegistry(59): Looking up meta region location in ZK, connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@599f571f
From the minicluster logs in HTableMultiplexer1000Puts.UT.txt, one can see that the strings "Removed all cached region locations that map" and "Looking up meta region location in ZK" appear for every put.
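The cost of this behavior can be illustrated with a toy cache model (plain Java, no HBase dependencies; the names and numbers are illustrative only): if the per-server location cache is wiped after every operation, every operation pays a full location lookup, whereas a retained cache pays it once.

```java
import java.util.HashMap;
import java.util.Map;

public class MetaCacheCostSketch {
    static int lookups = 0; // counts simulated trips to ZK + meta

    // Return the cached location for a row, doing a "slow" lookup on a miss.
    static String locate(Map<String, String> cache, String row) {
        return cache.computeIfAbsent(row, r -> { lookups++; return "server-1"; });
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();

        // Buggy behavior: the cache is cleared after every put.
        for (int i = 0; i < 100; i++) { locate(cache, "row"); cache.clear(); }
        int buggy = lookups;

        // Expected behavior: the cache is retained across puts.
        lookups = 0;
        cache.clear();
        for (int i = 0; i < 100; i++) { locate(cache, "row"); }
        int fixed = lookups;

        System.out.println(buggy + " " + fixed); // 100 lookups vs 1
    }
}
```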
Analysis:
The problem occurs because cleanServerCache always clears the server's cache when tableName is null and the exception is null. See AsyncRequestFutureImpl.java#L918:
private void cleanServerCache(ServerName server, Throwable regionException) {
  if (tableName == null && ClientExceptionsUtil.isMetaClearingException(regionException)) {
    // For multi-actions, we don't have a table name, but we want to make sure to clear the
    // cache in case there were location-related exceptions. We don't want to clear the cache
    // for every possible exception that comes through, however.
    asyncProcess.connection.clearCaches(server);
  }
}
The problem is that ClientExceptionsUtil.isMetaClearingException(regionException) assumes the caller has already checked the exception for null before calling it, i.e. it returns true when the passed exception is null, which is not a safe assumption here: a successful multi-action carries no exception, yet still clears the cache.
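The shape of a fix is an explicit null guard before consulting the exception classifier. A minimal self-contained sketch (stubbed types standing in for the HBase classes; not the actual patch):

```java
public class CleanServerCacheSketch {
    // Stub standing in for ClientExceptionsUtil.isMetaClearingException, which
    // (per the report) returns true when handed a null exception.
    static boolean isMetaClearingException(Throwable t) {
        if (t == null) return true;              // the problematic assumption
        return t instanceof java.io.IOException; // placeholder classification
    }

    // Mirrors the condition in cleanServerCache as it stands.
    static boolean shouldClearCacheBuggy(String tableName, Throwable regionException) {
        return tableName == null && isMetaClearingException(regionException);
    }

    // Guarded version: a successful multi-action (no exception) must not
    // wipe the per-server location cache.
    static boolean shouldClearCacheFixed(String tableName, Throwable regionException) {
        return tableName == null && regionException != null
                && isMetaClearingException(regionException);
    }

    public static void main(String[] args) {
        // Successful HTableMultiplexer put: tableName == null, no exception.
        System.out.println(shouldClearCacheBuggy(null, null)); // true  -> cache cleared
        System.out.println(shouldClearCacheFixed(null, null)); // false -> cache kept
    }
}
```

The guard could equally live inside isMetaClearingException itself; either way, the null case must stop evaluating to "clear the cache".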
Attachments
Issue Links
- is related to: HBASE-21428 Performance issue due to userRegionLock in the ConnectionManager (Resolved)
- relates to: HBASE-22434 Improve clear meta cache (Patch Available)