Details
- Sub-task
- Status: Resolved
- Major
- Resolution: Fixed
- None
- None
- Reviewed
Description
The test fails fairly frequently in hadoopqa builds.
There is a recent hang in org.apache.hadoop.hbase.TestFullLogReconstruction.tearDownAfterClass(TestFullLogReconstruction.java:68)
... see here.
Thread 1250 (RS_CLOSE_META-edd281aedb18:59863-0):
State: TIMED_WAITING
Blocked count: 92
Waited count: 278
Stack:
java.lang.Object.wait(Native Method)
org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:133)
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:718)
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:605)
org.apache.hadoop.hbase.regionserver.wal.WALUtil.doFullAppendTransaction(WALUtil.java:154)
org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeFlushMarker(WALUtil.java:81)
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2645)
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2356)
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2328)
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2319)
org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1531)
org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1437)
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:104)
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
Did we miss a signal? Do we need to do an interrupt? The log is incomplete in hadoopqa builds, so it is hard to see everything that is going on. This test is not in the flakey set either.
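To make the "missed signal" question concrete, here is a minimal sketch of a waiter that cannot hang forever on a lost notification. It is not HBase's actual SyncFuture: the class SketchSyncFuture, its methods markDone and get, and the two helper methods are hypothetical names used only for illustration. The dump above shows the thread in TIMED_WAITING, so the real code already waits with a timeout; the sketch only shows the general pattern of checking a completion flag under the lock and bounding the total wait with a deadline, so that a completion that never arrives surfaces as a TimeoutException instead of a stuck close.

```java
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a WAL sync future; NOT HBase's actual SyncFuture.
class SketchSyncFuture {
    private boolean done = false;

    // Called by the (hypothetical) WAL consumer thread when the sync completes.
    synchronized void markDone() {
        done = true;
        notifyAll();
    }

    // Blocks until markDone() has been called or timeoutMs has elapsed.
    // The flag is checked before every wait, so a notification delivered
    // before the waiter arrives is not lost, and spurious wakeups are harmless.
    synchronized void get(long timeoutMs) throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!done) {
            long remainingNs = deadline - System.nanoTime();
            if (remainingNs <= 0) {
                throw new TimeoutException("sync not completed within " + timeoutMs + " ms");
            }
            // Round up so we never call wait(0), which would wait indefinitely.
            wait(remainingNs / 1_000_000L + 1);
        }
    }

    // Nobody ever completes the future: the waiter should give up with a timeout.
    static boolean timesOutWhenNeverSignalled() {
        SketchSyncFuture f = new SketchSyncFuture();
        try {
            f.get(50);
            return false;
        } catch (TimeoutException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Another thread completes the future: the waiter should return normally.
    static boolean returnsWhenSignalled() {
        SketchSyncFuture f = new SketchSyncFuture();
        Thread completer = new Thread(f::markDone);
        completer.start();
        try {
            f.get(5_000);
            completer.join();
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a bounded wait like this, a caller such as a region close could log the timeout and decide whether to retry, interrupt, or abort. Which of those is right for the hang reported here is not something the sketch answers.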
Attachments
Issue Links
- relates to HBASE-19929 Call RS.stop on a session expired RS may hang (Resolved)
IIRC this has happened before. I once introduced a close chore in AsyncFSWAL to solve it and later removed it. Let me revisit the commit history to find out what happened.