Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 2.3.1
- None
- None
- Hadoop Flags: Incompatible change, Reviewed
Description
We hit a deadlocked RegionServer in testing. The priority handlers appear stuck across multiple thread dumps. Seven of the ten priority handler threads are in this state:
"RpcServer.priority.RWQ.Fifo.read.handler=5,queue=1,port=16020" #82 daemon prio=5 os_prio=0 cpu=0.70ms elapsed=315627.86s allocated=3744B defined_classes=0 tid=0x00007f3da0983040 nid=0x62d9 in Object.wait() [0x00007f3d9bc8c000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3327)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1491)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:3143)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3478)
	at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:44858)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
The anomalous three are as follows:
#1
"RpcServer.priority.RWQ.Fifo.write.handler=0,queue=0,port=16020" #77 daemon prio=5 os_prio=0 cpu=175.98ms elapsed=315627.86s allocated=2153K defined_classes=14 tid=0x00007f3da0ae6ec0 nid=0x62d4 in Object.wait() [0x00007f3d9c190000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.hbase.client.RegionInfo.<clinit>(RegionInfo.java:72)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3327)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1491)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2912)
	at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:44856)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
...which is the static initialization of RegionInfo, creating the UNDEFINED constant here:
@InterfaceAudience.Public
public interface RegionInfo extends Comparable<RegionInfo> {
  RegionInfo UNDEFINED = RegionInfoBuilder.newBuilder(TableName.valueOf("_UNDEFINED_")).build();
  // ...
}
#2
"RpcServer.priority.RWQ.Fifo.read.handler=4,queue=1,port=16020" #81 daemon prio=5 os_prio=0 cpu=53.85ms elapsed=315627.86s allocated=81984B defined_classes=3 tid=0x00007f3da0981590 nid=0x62d8 in Object.wait() [0x00007f3d9bd8c000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.hbase.client.RegionInfoBuilder.<clinit>(RegionInfoBuilder.java:49)
	at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toRegionInfo(ProtobufUtil.java:3231)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.executeOpenRegionProcedures(RSRpcServices.java:3755)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.lambda$executeProcedures$2(RSRpcServices.java:3827)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices$$Lambda$173/0x00000017c0e40040.accept(Unknown Source)
	at java.util.ArrayList.forEach(java.base@11.0.6/ArrayList.java:1540)
	at java.util.Collections$UnmodifiableCollection.forEach(java.base@11.0.6/Collections.java:1085)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.executeProcedures(RSRpcServices.java:3827)
	at org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:34896)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
...which is the static initialization of RegionInfoBuilder, creating the first meta RegionInfo here:
public static final RegionInfo FIRST_META_REGIONINFO =
    new MutableRegionInfo(1L, TableName.META_TABLE_NAME, RegionInfo.DEFAULT_REPLICA_ID);
#3
"RpcServer.priority.RWQ.Fifo.read.handler=8,queue=1,port=16020" #85 daemon prio=5 os_prio=0 cpu=0.50ms elapsed=315627.85s allocated=1960B defined_classes=0 tid=0x00007f3da0d851d0 nid=0x62dc in Object.wait() [0x00007f3d9b989000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toRegionInfo(ProtobufUtil.java:3231)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.executeOpenRegionProcedures(RSRpcServices.java:3755)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.lambda$executeProcedures$2(RSRpcServices.java:3827)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices$$Lambda$173/0x00000017c0e40040.accept(Unknown Source)
	at java.util.ArrayList.forEach(java.base@11.0.6/ArrayList.java:1540)
	at java.util.Collections$UnmodifiableCollection.forEach(java.base@11.0.6/Collections.java:1085)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.executeProcedures(RSRpcServices.java:3827)
	at org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:34896)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
...which is this code in ProtobufUtil.toRegionInfo, reading RegionInfoBuilder.FIRST_META_REGIONINFO:
if (tableName.equals(TableName.META_TABLE_NAME) && replicaId == defaultReplicaId) {
  return RegionInfoBuilder.FIRST_META_REGIONINFO;
}
The thread dump does not report the above as a deadlock.
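Read together, #1 to #3 appear to form a class-initialization cycle: #1 is running RegionInfo's static initializer and needs RegionInfoBuilder (to build UNDEFINED), while #2 is running RegionInfoBuilder's static initializer and, by creating a MutableRegionInfo, seemingly needs RegionInfo to finish initializing. #3 and, plausibly, the seven getRegion threads then wait behind those in-progress initializations. The minimal Java sketch below reproduces that shape; it is not HBase code. Info, InfoBuilder, MutableInfo, rendezvous() and the latch are hypothetical stand-ins (the latch only makes the timing repeatable). It assumes, per JLS 12.4.2, that initializing a class also initializes any superinterface that declares a default method; whether that is the exact trigger for RegionInfo in 2.3.1 is an assumption.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ClinitDeadlockDemo {

    // Hypothetical helper: lets both threads get inside a static initializer
    // before either touches the other class, so the race is repeatable.
    static final CountDownLatch BOTH_INSIDE = new CountDownLatch(2);

    static boolean rendezvous() {
        BOTH_INSIDE.countDown();
        try {
            return BOTH_INSIDE.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Hypothetical stand-in for RegionInfo: a static field built by the builder,
    // plus a default method, so initializing an implementing class also
    // initializes this interface (JLS 12.4.2).
    interface Info {
        boolean ENTERED = rendezvous();
        Info UNDEFINED = InfoBuilder.create("_UNDEFINED_"); // needs InfoBuilder

        String name();

        default boolean isUndefined() {
            return this == UNDEFINED;
        }
    }

    // Hypothetical stand-in for MutableRegionInfo.
    static final class MutableInfo implements Info {
        private final String name;

        MutableInfo(String name) {
            this.name = name;
        }

        @Override
        public String name() {
            return name;
        }
    }

    // Hypothetical stand-in for RegionInfoBuilder: its static field creates a
    // MutableInfo, which requires Info to be initialized first.
    static final class InfoBuilder {
        static final boolean ENTERED = rendezvous();
        static final Info FIRST_META = new MutableInfo("meta"); // needs Info

        static Info create(String name) {
            return new MutableInfo(name);
        }
    }

    /** Returns true if both threads are still blocked after the join timeouts. */
    static boolean reproduce() {
        Thread a = new Thread(() -> System.out.println(Info.UNDEFINED.name()), "touches-Info");
        Thread b = new Thread(() -> System.out.println(InfoBuilder.FIRST_META.name()), "touches-InfoBuilder");
        // Threads stuck in class initialization cannot be interrupted,
        // so mark them as daemons to let the JVM exit anyway.
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();
        try {
            a.join(3_000);
            b.join(1_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return a.isAlive() && b.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(reproduce() ? "deadlocked in class initialization" : "no deadlock observed");
    }
}
```

If the cycle forms, both threads stay blocked and main prints "deadlocked in class initialization". Neither thread is waiting on a monitor that the other one holds; each is waiting for a class whose initialization the other thread has started, which is likely why the JVM's deadlock detection does not flag this case.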
The frame at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3327) is executing the following:
return this.onlineRegions.get(encodedRegionName);
...where onlineRegions is a concurrent Map of String to HRegion.
Attachments
Issue Links
- is broken by
  - HBASE-22723 Have CatalogJanitor report holes and overlaps; i.e. problems it sees when doing its regular scan of hbase:meta (Resolved)
- links to