Details
Type: Sub-task
Status: Closed
Priority: Major
Resolution: Cannot Reproduce
Description
In flaky failures on hadoop2 runs of tests such as:
- TestImportTsv/testBulkOutputWithoutAnExistingTable
- TestImportTsv/testMROnTable
- TestImportExport/testWithFilter
- (and many others)
We have logs showing hanging threads and failed file deletes that look like this:
2013-04-24 06:05:01,807 WARN [ContainersLauncher #0] nodemanager.DefaultContainerExecutor(193): Exit code from task is : 137
2013-04-24 06:05:06,520 INFO [pool-1-thread-1] hbase.ResourceChecker(171): after: mapreduce.TestImportExport#testExportScannerBatching Thread=539 (was 534)
Potentially hanging thread: hbase-table-pool-25-thread-1
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:424)
...
<threads seemingly related to dfs connection>
2013-04-24 06:03:28,351 WARN [DeletionService #0] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/var/lib/jenkins/workspace/apache-hbase-trunk-hadoop2/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-0_0/usercache/jenkins/appcache/application_1366808588748_0001/container_1366808588748_0001_01_000001]
2013-04-24 06:03:28,353 WARN [DeletionService #1] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/var/lib/jenkins/workspace/apache-hbase-trunk-hadoop2/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-0_1/usercache/jenkins/appcache/application_1366808588748_0001/container_1366808588748_0001_01_000001]
2013-04-24 06:03:28,353 WARN [DeletionService #2] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/var/lib/jenkins/workspace/apache-hbase-trunk-hadoop2/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-0_2/usercache/jenkins/appcache/application_1366808588748_0001/container_1366808588748_0001_01_000001]
2013-04-24 06:03:28,354 WARN [DeletionService #0] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/var/lib/jenkins/workspace/apache-hbase-trunk-hadoop2/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-0_3/usercache/jenkins/appcache/application_1366808588748_0001/container_1366808588748_0001_01_000001]
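
For triaging runs like the one above, a small log scanner may help. The following is a minimal sketch, not part of HBase or Hadoop: the class `LogTriage` and its methods are hypothetical names, and the regular expressions only assume the two message shapes quoted in this report ("Exit code from task is : N" and "delete returned false for path: [...]"). It also assumes the common shell convention that an exit status above 128 means the process was ended by signal (status - 128), under which 137 would correspond to signal 9 (SIGKILL). The sample path in `main` is a shortened placeholder, not a real log entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical triage helper; not part of HBase or Hadoop.
class LogTriage {
    // Message shapes assumed from the log excerpts in this report.
    private static final Pattern EXIT_CODE =
        Pattern.compile("Exit code from task is : (\\d+)");
    private static final Pattern FAILED_DELETE =
        Pattern.compile("delete returned false for path: \\[([^\\]]+)\\]");

    // Collects every container exit code reported in the log text.
    static List<Integer> exitCodes(String log) {
        List<Integer> codes = new ArrayList<>();
        Matcher m = EXIT_CODE.matcher(log);
        while (m.find()) {
            codes.add(Integer.parseInt(m.group(1)));
        }
        return codes;
    }

    // Collects every path whose deletion was reported as failed,
    // even when several entries are fused onto one line.
    static List<String> failedDeletePaths(String log) {
        List<String> paths = new ArrayList<>();
        Matcher m = FAILED_DELETE.matcher(log);
        while (m.find()) {
            paths.add(m.group(1));
        }
        return paths;
    }

    // Assumption: shell convention, status > 128 means "ended by signal (status - 128)".
    static String describeExitCode(int code) {
        if (code > 128) {
            return "killed by signal " + (code - 128);
        }
        return "exited with status " + code;
    }

    public static void main(String[] args) {
        // Placeholder sample; the path is shortened for illustration.
        String log =
            "WARN [ContainersLauncher #0] nodemanager.DefaultContainerExecutor(193): "
            + "Exit code from task is : 137\n"
            + "WARN [DeletionService #0] nodemanager.DefaultContainerExecutor(276): "
            + "delete returned false for path: [/tmp/example/container_0001]\n";

        for (int code : exitCodes(log)) {
            System.out.println(code + ": " + describeExitCode(code));
        }
        for (String path : failedDeletePaths(log)) {
            System.out.println("failed delete: " + path);
        }
    }
}
```

Grouping failed deletes by application or container id would be a natural next step, but that depends on the path layout staying as shown in the excerpts.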