HBase / HBASE-24896

'Stuck' in static initialization creating RegionInfo instance

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.1
    • Fix Version/s: 3.0.0-alpha-1, 2.4.0, 2.3.2
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Incompatible change, Reviewed
    • Release Note:
      1. Untangle RegionInfo, RegionInfoBuilder, and MutableRegionInfo static
      initializations.
      2. Undo static initializing references from RegionInfo to RegionInfoBuilder.
      3. Mark RegionInfo#UNDEFINED IA.Private and deprecated;
      it is for internal use only and likely to be removed in HBase4. (sub-task HBASE-24918)
      4. Move MutableRegionInfo from inner-class of
      RegionInfoBuilder to be (package private) standalone. (sub-task HBASE-24918)
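      To illustrate what "untangling" static initializations can mean, here is a minimal sketch with toy types. It is not the actual HBase patch: Info, Impl, and Builder are hypothetical stand-ins for RegionInfo, MutableRegionInfo, and RegionInfoBuilder. The assumption is that the interface carries no static initializer pointing back at the builder, so initialization dependencies run in one direction only (Builder to Impl to Info) and two threads entering from different ends cannot wait on each other.

```java
import java.util.concurrent.CountDownLatch;

final class OneWayInitDemo {
  // Toy stand-in for RegionInfo. It has a default method but no static
  // field whose initializer refers to Builder, so there is no back edge.
  interface Info {
    String name();
    default String describe() { return "Info(" + name() + ")"; }
  }

  // Toy stand-in for MutableRegionInfo, standalone rather than nested in Builder.
  static final class Impl implements Info {
    private final String name;
    Impl(String name) { this.name = name; }
    @Override public String name() { return name; }
  }

  // Toy stand-in for RegionInfoBuilder. Its constants depend on Impl (and
  // therefore on Info), but nothing in Impl or Info depends on Builder.
  static final class Builder {
    static final Info UNDEFINED = new Impl("_UNDEFINED_");
    static final Info FIRST_META = new Impl("hbase:meta");
    static Info build(String name) { return new Impl(name); }
  }

  private static void await(CountDownLatch latch) {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Returns true if both threads finished within the timeout. */
  static boolean run() {
    CountDownLatch go = new CountDownLatch(1);
    // One thread enters via the interface side, the other via the builder side.
    Thread t1 = new Thread(() -> { await(go); new Impl("t1").describe(); }, "touches-Info");
    Thread t2 = new Thread(() -> { await(go); Builder.FIRST_META.describe(); }, "touches-Builder");
    t1.setDaemon(true);
    t2.setDaemon(true);
    t1.start();
    t2.start();
    go.countDown();
    try {
      t1.join(2000);
      t2.join(2000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return !t1.isAlive() && !t2.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("both threads finished: " + run());
  }
}
```

      The design choice in this sketch is simply to keep every constant on the builder side, which is one way to satisfy items 1 and 2 above; the real change may differ in detail.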

    Description

      We ran into the following deadlocked server in testing. The priority handlers seem stuck across multiple thread dumps. Seven of the ten total priority threads have this state:

      "RpcServer.priority.RWQ.Fifo.read.handler=5,queue=1,port=16020" #82 daemon prio=5 os_prio=0 cpu=0.70ms elapsed=315627.86s allocated=3744B defined_classes=0 tid=0x00007f3da0983040 nid=0x62d9 in Object.wait()  [0x00007f3d9bc8c000]
         java.lang.Thread.State: RUNNABLE
      	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3327)
      	at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1491)
      	at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:3143)
      	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3478)
      	at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:44858)
      	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393)
      	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
      	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
      	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318) 

      The anomalous three are as follows:

      #1

      "RpcServer.priority.RWQ.Fifo.write.handler=0,queue=0,port=16020" #77 daemon prio=5 os_prio=0 cpu=175.98ms elapsed=315627.86s allocated=2153K defined_classes=14 tid=0x00007f3da0ae6ec0 nid=0x62d4 in Object.wait()  [0x00007f3d9c190000]
         java.lang.Thread.State: RUNNABLE
      	at org.apache.hadoop.hbase.client.RegionInfo.<clinit>(RegionInfo.java:72)
      	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3327)
      	at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1491)
      	at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2912)
      	at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:44856)
      	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393)
      	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
      	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
      	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)

      ...which is the creation of the UNDEFINED in RegionInfo here:

      @InterfaceAudience.Public
      public interface RegionInfo extends Comparable<RegionInfo> {
        RegionInfo UNDEFINED = RegionInfoBuilder.newBuilder(TableName.valueOf("_UNDEFINED_")).build();

       

      #2

      "RpcServer.priority.RWQ.Fifo.read.handler=4,queue=1,port=16020" #81 daemon prio=5 os_prio=0 cpu=53.85ms elapsed=315627.86s allocated=81984B defined_classes=3 tid=0x00007f3da0981590 nid=0x62d8 in Object.wait()  [0x00007f3d9bd8c000]
         java.lang.Thread.State: RUNNABLE
      	at org.apache.hadoop.hbase.client.RegionInfoBuilder.<clinit>(RegionInfoBuilder.java:49)
      	at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toRegionInfo(ProtobufUtil.java:3231)
      	at org.apache.hadoop.hbase.regionserver.RSRpcServices.executeOpenRegionProcedures(RSRpcServices.java:3755)
      	at org.apache.hadoop.hbase.regionserver.RSRpcServices.lambda$executeProcedures$2(RSRpcServices.java:3827)
      	at org.apache.hadoop.hbase.regionserver.RSRpcServices$$Lambda$173/0x00000017c0e40040.accept(Unknown Source)
      	at java.util.ArrayList.forEach(java.base@11.0.6/ArrayList.java:1540)
      	at java.util.Collections$UnmodifiableCollection.forEach(java.base@11.0.6/Collections.java:1085)
      	at org.apache.hadoop.hbase.regionserver.RSRpcServices.executeProcedures(RSRpcServices.java:3827)
      	at org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:34896)
      	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393)
      	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
      	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
      	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318) 

      ...which is here in RegionInfoBuilder, creating the RegionInfo for the meta table:

       

      public static final RegionInfo FIRST_META_REGIONINFO =
        new MutableRegionInfo(1L, TableName.META_TABLE_NAME, RegionInfo.DEFAULT_REPLICA_ID);

       

      #3

      "RpcServer.priority.RWQ.Fifo.read.handler=8,queue=1,port=16020" #85 daemon prio=5 os_prio=0 cpu=0.50ms elapsed=315627.85s allocated=1960B defined_classes=0 tid=0x00007f3da0d851d0 nid=0x62dc in Object.wait()  [0x00007f3d9b989000]
         java.lang.Thread.State: RUNNABLE
      	at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toRegionInfo(ProtobufUtil.java:3231)
      	at org.apache.hadoop.hbase.regionserver.RSRpcServices.executeOpenRegionProcedures(RSRpcServices.java:3755)
      	at org.apache.hadoop.hbase.regionserver.RSRpcServices.lambda$executeProcedures$2(RSRpcServices.java:3827)
      	at org.apache.hadoop.hbase.regionserver.RSRpcServices$$Lambda$173/0x00000017c0e40040.accept(Unknown Source)
      	at java.util.ArrayList.forEach(java.base@11.0.6/ArrayList.java:1540)
      	at java.util.Collections$UnmodifiableCollection.forEach(java.base@11.0.6/Collections.java:1085)
      	at org.apache.hadoop.hbase.regionserver.RSRpcServices.executeProcedures(RSRpcServices.java:3827)
      	at org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:34896)
      	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393)
      	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
      	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
      	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
       

      ...which is here in ProtobufUtil.toRegionInfo:

      if (tableName.equals(TableName.META_TABLE_NAME) && replicaId == defaultReplicaId) {
        return RegionInfoBuilder.FIRST_META_REGIONINFO;
      }

       

      The thread dump does not seem to recognize the above as a deadlock.
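      The shape of the problem can be reproduced in isolation. The following is a minimal sketch under stated assumptions, not HBase code: Info, Impl, and Builder are hypothetical stand-ins for RegionInfo, MutableRegionInfo, and RegionInfoBuilder, and the latch exists only to force the unlucky interleaving. One thread starts initializing the interface (whose constant needs the builder) while another starts initializing the builder (whose constant instantiates a class implementing the interface; because the interface declares a default method, the JLS requires it to be initialized first). Each then waits for the other's class initialization to complete. The sketch also queries ThreadMXBean.findDeadlockedThreads(), which covers object monitors and ownable synchronizers; I would not expect it to report class-initialization waits, which would be consistent with the dump above.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class InitCycleDemo {
  // Lets each thread get inside "its" static initializer before either
  // one touches the other type. Only here to make the race deterministic.
  static final CountDownLatch BOTH_INSIDE = new CountDownLatch(2);

  static void rendezvous() {
    BOTH_INSIDE.countDown();
    try {
      BOTH_INSIDE.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Toy stand-in for RegionInfo: its constant is produced via the builder.
  interface Info {
    Info UNDEFINED = buildUndefined();
    String name();
    // A default method means initializing any implementing class
    // also initializes this interface.
    default String describe() { return "Info(" + name() + ")"; }
  }

  // Toy stand-in for MutableRegionInfo.
  static final class Impl implements Info {
    private final String name;
    Impl(String name) { this.name = name; }
    @Override public String name() { return name; }
  }

  // Toy stand-in for RegionInfoBuilder: its constant instantiates Impl.
  static final class Builder {
    static final Info FIRST_META = buildMeta();
    static Info build(String name) { return new Impl(name); }
  }

  // Runs inside Info's initializer; needs Builder to be initialized.
  static Info buildUndefined() { rendezvous(); return Builder.build("_UNDEFINED_"); }

  // Runs inside Builder's initializer; needs Impl, and therefore Info.
  static Info buildMeta() { rendezvous(); return new Impl("hbase:meta"); }

  static final class Report {
    final boolean bothStuck;
    final int reportedByJvm;
    Report(boolean bothStuck, int reportedByJvm) {
      this.bothStuck = bothStuck;
      this.reportedByJvm = reportedByJvm;
    }
  }

  static Report run() {
    Thread t1 = new Thread(() -> Info.UNDEFINED.name(), "touches-Info");
    Thread t2 = new Thread(() -> Builder.FIRST_META.name(), "touches-Builder");
    // Daemon threads so the JVM can still exit while they are stuck.
    t1.setDaemon(true);
    t2.setDaemon(true);
    t1.start();
    t2.start();
    try {
      t1.join(2000);
      t2.join(2000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    long[] found = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
    return new Report(t1.isAlive() && t2.isAlive(), found == null ? 0 : found.length);
  }

  public static void main(String[] args) {
    Report r = run();
    System.out.println("both threads still stuck: " + r.bothStuck);
    System.out.println("threads reported by findDeadlockedThreads: " + r.reportedByJvm);
  }
}
```

      In the server, a third kind of thread (like #3 above, and likely the seven readers) merely needs one of the two classes and queues up behind whichever initialization it hits first.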

       

      The frame at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3327) is doing the following:

      return this.onlineRegions.get(encodedRegionName);

      ...where onlineRegions is a concurrent Map of String to HRegion.

      Attachments

        1. hbasedn192-jstack-2.webarchive
          209 kB
          Michael Stack
        2. hbasedn192-jstack-1.webarchive
          208 kB
          Michael Stack
        3. hbasedn192-jstack-0.webarchive
          203 kB
          Michael Stack

            People

              Assignee: Michael Stack (stack)
              Reporter: Michael Stack (stack)