Description
Gets hung up in testSSHCleanupDaugtherRegionsOfAbortedSplit waiting on deleteTable
"Thread-334" prio=10 tid=0x00007f15382da800 nid=0x40ae in Object.wait() [0x00007 f1315f5d000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000007e1b525b8> (a org.apache.hadoop.hbase.ipc.RpcClie nt$Call) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1452) - locked <0x00000007e1b525b8> (a org.apache.hadoop.hbase.ipc.RpcClient$C all) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.ja va:1661) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementatio n.callBlockingMethod(RpcClient.java:1719) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService $BlockingStub.disableTable(MasterProtos.java:43749) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplemen tation$5.disableTable(HConnectionManager.java:1995) at org.apache.hadoop.hbase.client.HBaseAdmin$5.call(HBaseAdmin.java:947) at org.apache.hadoop.hbase.client.HBaseAdmin$5.call(HBaseAdmin.java:942) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcR etryingCaller.java:117) - locked <0x00000007dfffe938> (a org.apache.hadoop.hbase.client.RpcRetry ingCaller) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcR etryingCaller.java:93) - locked <0x00000007dfffe938> (a org.apache.hadoop.hbase.client.RpcRetry ingCaller) at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin. java:3398) at org.apache.hadoop.hbase.client.HBaseAdmin.disableTableAsync(HBaseAdmi n.java:942) at org.apache.hadoop.hbase.client.HBaseAdmin.disableTable(HBaseAdmin.jav a:974) at org.apache.hadoop.hbase.HBaseTestingUtility.deleteTable(HBaseTestingUtility.java:1532) at org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testSSHCleanupDaugtherRegionsOfAbortedSplit(TestSplitTransactionOnCluster.java:1172)
See attached test.log.good.gz and test.log.bad.gz.
In test.log.bad at 2015-01-13 08:02:45,947 we acquire a lock on testSSHCleanupDaugtherRegionsOfAbortedSplit to start a disable but do nothing afterward. Nothing happens for one minute. Then looks like the client makes another request but it can't get the table lock. There's no progress until timeout.
2015-01-13 08:02:45,947 INFO [Thread-334] client.HBaseAdmin$5(945): Started disable of testSSHCleanupDaugtherRegionsOfAbortedSplit 2015-01-13 08:02:45,948 INFO [FifoRpcScheduler.handler1-thread-4] master.HMaster(2213): Client=apurtell//10.40.8.95 disable testSSHCleanupDaugtherRegionsOfAbortedSplit 2015-01-13 08:02:45,950 DEBUG [FifoRpcScheduler.handler1-thread-4] lock.ZKInterProcessLockBase(226): Acquired a lock for /hbase/table-lock/testSSHCleanupDaugtherRegionsOfAbortedSplit/write-master:367740000000001 2015-01-13 08:03:44,585 DEBUG [ip-10-40-8-95.us-west-2.compute.internal,36774,1421136162827-BalancerChore] master.HMaster(1553): Not running balancer because 3 region(s) in transition: {e4f79ac2b4711f7d906a291d94302d6e={e4f79ac2b4711f7d906a291d94302d6e state=SPLITTING, ts=1421136165837, server=ip-10-40-8-95.us-west-2.compute.internal,47769,1421136163047}, ebbcf66ec09960d42a2a49252599bb6d={ebbcf66ec09960d42a2a49252599bb6d state=SPLITTI... 2015-01-13 08:03:46,210 INFO [Thread-334] client.HBaseAdmin$5(945): Started disable of testSSHCleanupDaugtherRegionsOfAbortedSplit 2015-01-13 08:03:46,211 INFO [FifoRpcScheduler.handler1-thread-3] master.HMaster(2213): Client=apurtell//10.40.8.95 disable testSSHCleanupDaugtherRegionsOfAbortedSplit 2015-01-13 08:03:46,214 DEBUG [FifoRpcScheduler.handler1-thread-3] master.TableLockManager$ZKTableLockManager$1(242): Table is locked by [tableName= default+testSSHCleanupDaugtherRegionsOfAbortedSplit, lockOwner=ip-10-40-8-95.us-west-2.compute.internal,36774,1421136162827, threadId=486, purpose=C_M_DISABLE_TABLE, isShared=false, createTime=1421136165948]
If we look at test.log.good where the same sequence of events begin, the table is quickly disabled and deleted:
2015-01-13 08:01:47,923 INFO [Thread-333] client.HBaseAdmin$5(945): Started disable of testSSHCleanupDaugtherRegionsOfAbortedSplit 2015-01-13 08:01:47,924 INFO [FifoRpcScheduler.handler1-thread-5] master.HMaster(2213): Client=apurtell//10.40.8.95 disable testSSHCleanupDaugtherRegionsOfAbortedSplit 2015-01-13 08:01:47,926 DEBUG [FifoRpcScheduler.handler1-thread-5] lock.ZKInterProcessLockBase(226): Acquired a lock for /hbase/table-lock/testSSHCleanupDaugtherRegionsOfAbortedSplit/write-master:383350000000001 2015-01-13 08:01:47,932 INFO [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0] handler.DisableTableHandler(130): Attempting to disable table testSSHCleanupDaugtherRegionsOfAbortedSplit 2015-01-13 08:01:47,933 INFO [Thread-333] zookeeper.RecoverableZooKeeper(121): Process identifier=catalogtracker-on-hconnection-0x60cf9dfa connecting to ZooKeeper ensemble=localhost:59024 2015-01-13 08:01:47,933 DEBUG [Thread-333] catalog.CatalogTracker(198): Starting catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@610b218c 2015-01-13 08:01:47,935 DEBUG [Thread-333-EventThread] zookeeper.ZooKeeperWatcher(313): catalogtracker-on-hconnection-0x60cf9dfa0x0, quorum=localhost:59024, baseZNode=/hbase Received ZooKeeper Event, type=None, state=SyncConnected, path=null 2015-01-13 08:01:47,935 DEBUG [Thread-333] zookeeper.ZKUtil(430): catalogtracker-on-hconnection-0x60cf9dfa0x0, quorum=localhost:59024, baseZNode=/hbase Set watcher on existing znode=/hbase/meta-region-server 2015-01-13 08:01:47,936 INFO [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0] handler.DisableTableHandler(190): Disabled table, testSSHCleanupDaugtherRegionsOfAbortedSplit, is done=true 2015-01-13 08:01:47,936 DEBUG [Thread-333-EventThread] zookeeper.ZooKeeperWatcher(390): catalogtracker-on-hconnection-0x60cf9dfa-0x14ae2500c300017 connected 2015-01-13 08:01:47,938 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0] lock.ZKInterProcessLockBase(328): Released /hbase/table-lock/testSSHCleanupDaugtherRegionsOfAbortedSplit/write-master:383350000000001 2015-01-13 08:01:47,941 DEBUG [Thread-333] catalog.CatalogTracker(222): Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@610b218c 2015-01-13 08:01:47,945 INFO [Thread-333] client.HBaseAdmin(999): Disabled testSSHCleanupDaugtherRegionsOfAbortedSplit 2015-01-13 08:01:47,946 INFO [FifoRpcScheduler.handler1-thread-2] master.HMaster(2047): Client=apurtell//10.40.8.95 delete testSSHCleanupDaugtherRegionsOfAbortedSplit 2015-01-13 08:01:47,948 DEBUG [FifoRpcScheduler.handler1-thread-2] lock.ZKInterProcessLockBase(226): Acquired a lock for /hbase/table-lock/testSSHCleanupDaugtherRegionsOfAbortedSplit/write-master:383350000000002 2015-01-13 08:01:47,953 INFO [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0] handler.TableEventHandler(123): Handling table operation C_M_DELETE_TABLE on table testSSHCleanupDaugtherRegionsOfAbortedSplit 2015-01-13 08:01:47,958 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0] handler.DeleteTableHandler(93): Deleting regions from META 2015-01-13 08:01:47,962 INFO [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0] catalog.MetaEditor(496): Deleted [{ENCODED => bc81e04ede70636e30fd1224df8bc70b, NAME => 'testSSHCleanupDaugtherRegionsOfAbortedSplit,,1421136107549.bc81e04ede70636e30fd1224df8bc70b.', STARTKEY => '', ENDKEY => ''}] 2015-01-13 08:01:47,964 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0] handler.DeleteTableHandler(104): Archiving region testSSHCleanupDaugtherRegionsOfAbortedSplit,,1421136107549.bc81e04ede70636e30fd1224df8bc70b. from FS 2015-01-13 08:01:47,965 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0] backup.HFileArchiver(93): ARCHIVING hdfs://localhost:43493/user/apurtell/hbase/.tmp/data/default/testSSHCleanupDaugtherRegionsOfAbortedSplit/bc81e04ede70636e30fd1224df8bc70b 2015-01-13 08:01:47,969 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0] backup.HFileArchiver(134): Archiving [class org.apache.hadoop.hbase.backup.HFileArchiver$FileablePath, file:hdfs://localhost:43493/user/apurtell/hbase/.tmp/data/default/testSSHCleanupDaugtherRegionsOfAbortedSplit/bc81e04ede70636e30fd1224df8bc70b/f] 2015-01-13 08:01:47,978 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileablePath, file:hdfs://localhost:43493/user/apurtell/hbase/.tmp/data/default/testSSHCleanupDaugtherRegionsOfAbortedSplit/bc81e04ede70636e30fd1224df8bc70b/f/9b6aaa5d9dd44f3ab113a32c029d50ae, to hdfs://localhost:43493/user/apurtell/hbase/archive/data/default/testSSHCleanupDaugtherRegionsOfAbortedSplit/bc81e04ede70636e30fd1224df8bc70b/f/9b6aaa5d9dd44f3ab113a32c029d50ae 2015-01-13 08:01:47,979 INFO [IPC Server handler 6 on 43493] blockmanagement.BlockManager(1009): BLOCK* addToInvalidates: blk_1073741839_1015 127.0.0.1:49933 127.0.0.1:55797 127.0.0.1:57090 2015-01-13 08:01:47,979 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0] backup.HFileArchiver(453): Deleted all region files in: hdfs://localhost:43493/user/apurtell/hbase/.tmp/data/default/testSSHCleanupDaugtherRegionsOfAbortedSplit/bc81e04ede70636e30fd1224df8bc70b 2015-01-13 08:01:47,980 INFO [IPC Server handler 2 on 43493] blockmanagement.BlockManager(1009): BLOCK* addToInvalidates: blk_1073741838_1014 127.0.0.1:57090 127.0.0.1:49933 127.0.0.1:55797 2015-01-13 08:01:47,980 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0] handler.DeleteTableHandler(114): Table 'testSSHCleanupDaugtherRegionsOfAbortedSplit' archived! 2015-01-13 08:01:47,980 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0] handler.DeleteTableHandler(117): Removing 'testSSHCleanupDaugtherRegionsOfAbortedSplit' descriptor. 2015-01-13 08:01:47,981 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0] handler.DeleteTableHandler(121): Removing 'testSSHCleanupDaugtherRegionsOfAbortedSplit' from region states. 2015-01-13 08:01:47,981 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0] handler.DeleteTableHandler(125): Marking 'testSSHCleanupDaugtherRegionsOfAbortedSplit' as deleted. 2015-01-13 08:01:47,984 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0] lock.ZKInterProcessLockBase(328): Released /hbase/table-lock/testSSHCleanupDaugtherRegionsOfAbortedSplit/write-master:383350000000002
The first interesting line missing in test.log.bad is:
2015-01-13 08:01:47,932 INFO [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0] handler.DisableTableHandler(130): Attempting to disable table testSSHCleanupDaugtherRegionsOfAbortedSplit
Nowhere in test.log.bad does "Attempting to disable table testSSHCleanupDaugtherRegionsOfAbortedSplit" appear.