HBase / HBASE-10632

Region lost in limbo after ArrayIndexOutOfBoundsException during assignment

Details

    • Reviewed

    Description

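      Discovered while running IntegrationTestBigLinkedList. Region 24d68aa7239824e42390a77b7212fcbf is scheduled to move from hor13n19 to hor13n13. During the process, the master's ServerShutdownHandler for hor13n19 throws an exception while reassigning the regions that server was carrying.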

      2014-02-25 15:30:42,613 INFO  [MASTER_SERVER_OPERATIONS-hor13n12:60000-4] master.RegionStates: Transitioning {24d68aa7239824e42390a77b7212fcbf state=OPENING, ts=1393342207107, server=hor13n19.gq1.ygridcore.net,60020,1393341563552} will be handled by SSH for hor13n19.gq1.ygridcore.net,60020,1393341563552
      2014-02-25 15:30:42,613 INFO  [MASTER_SERVER_OPERATIONS-hor13n12:60000-4] handler.ServerShutdownHandler: Reassigning 7 region(s) that hor13n19.gq1.ygridcore.net,60020,1393341563552 was carrying (and 0 regions(s) that were opening on this server)
      2014-02-25 15:30:42,613 INFO  [MASTER_SERVER_OPERATIONS-hor13n12:60000-4] handler.ServerShutdownHandler: Reassigning region with rs = {24d68aa7239824e42390a77b7212fcbf state=OPENING, ts=1393342207107, server=hor13n19.gq1.ygridcore.net,60020,1393341563552} and deleting zk node if exists
      2014-02-25 15:30:42,623 INFO  [MASTER_SERVER_OPERATIONS-hor13n12:60000-4] master.RegionStates: Transitioned {24d68aa7239824e42390a77b7212fcbf state=OPENING, ts=1393342207107, server=hor13n19.gq1.ygridcore.net,60020,1393341563552} to {24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}
      2014-02-25 15:30:42,623 DEBUG [AM.ZK.Worker-pool2-t46] master.AssignmentManager: Znode IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf. deleted, state: {24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}
      ...
      2014-02-25 15:30:43,993 ERROR [MASTER_SERVER_OPERATIONS-hor13n12:60000-4] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
      java.lang.ArrayIndexOutOfBoundsException: 0
      	at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.<init>(BaseLoadBalancer.java:250)
      	at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.createCluster(BaseLoadBalancer.java:921)
      	at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.roundRobinAssignment(BaseLoadBalancer.java:860)
      	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2482)
      	at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:282)
      	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:722)
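      The `0` in `java.lang.ArrayIndexOutOfBoundsException: 0` is the offending index, which indicates that code in the `BaseLoadBalancer$Cluster` constructor read element 0 of a zero-length array. The report does not say which array was empty. The Java sketch below only illustrates that failure mode with invented helpers (`firstByIndex` and `firstChecked` are hypothetical, not HBase code) and contrasts index-based access with a guarded version that rejects empty input with a descriptive error.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexZeroDemo {
    // Hypothetical helper (not HBase code): reads the first candidate server by array index.
    static String firstByIndex(List<String> servers) {
        String[] arr = servers.toArray(new String[0]);
        return arr[0]; // throws ArrayIndexOutOfBoundsException (index 0) when arr.length == 0
    }

    // Guarded variant: reject empty input with a descriptive error before indexing.
    static String firstChecked(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalStateException("no candidate servers for assignment");
        }
        return servers.get(0);
    }

    public static void main(String[] args) {
        List<String> none = new ArrayList<>();
        try {
            firstByIndex(none);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The message text depends on the JDK version; the trace above shows just the index.
            System.out.println("index-based access failed: " + e.getMessage());
        }
        try {
            firstChecked(none);
        } catch (IllegalStateException e) {
            System.out.println("checked access failed: " + e.getMessage());
        }
    }
}
```

      A descriptive error does not by itself keep a region from being stranded; it only makes the cause visible in the master log.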
      

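      After that, the region is left in limbo: it stays OFFLINE in transition and is never reassigned. Later move and unassign requests for it are ignored, and the balancer refuses to run while it is in transition.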

      2014-02-25 15:35:11,581 INFO  [FifoRpcScheduler.handler1-thread-6] master.HMaster: Client=hrt_qa//68.142.246.29 move hri=IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf., src=hor13n19.gq1.ygridcore.net,60020,1393341563552, dest=hor13n13.gq1.ygridcore.net,60020,1393342222275, running balancer
      2014-02-25 15:35:11,581 INFO  [FifoRpcScheduler.handler1-thread-6] master.AssignmentManager: Ignored moving region not assigned: {ENCODED => 24d68aa7239824e42390a77b7212fcbf, NAME => 'IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf.', STARTKEY => '\x80\x06\x1A', ENDKEY => ''}, {24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}
      ...
      2014-02-25 15:35:26,586 DEBUG [hor13n12.gq1.ygridcore.net,60000,1393341917402-BalancerChore] master.HMaster: Not running balancer because 1 region(s) in transition: {24d68aa7239824e42390a77b7212fcbf={24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}}
      ...
      2014-02-25 15:35:51,945 DEBUG [FifoRpcScheduler.handler1-thread-16] master.HMaster: Client=hrt_qa//68.142.246.29 unassign IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf. in current location if it is online and reassign.force=false
      2014-02-25 15:35:51,945 DEBUG [FifoRpcScheduler.handler1-thread-16] master.AssignmentManager: Starting unassign of IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf. (offlining), current state: {24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}
      2014-02-25 15:35:51,945 DEBUG [FifoRpcScheduler.handler1-thread-16] master.AssignmentManager: Attempting to unassign IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf. but it is already in transition (OFFLINE, force=false)
      ...
      2014-02-25 15:40:26,587 DEBUG [hor13n12.gq1.ygridcore.net,60000,1393341917402-BalancerChore] master.HMaster: Not running balancer because 1 region(s) in transition: {24d68aa7239824e42390a77b7212fcbf={24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}}
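      To make the limbo state concrete, here is a hypothetical, heavily simplified Java model of the sequence in these logs. Every class, field, and method name is invented for illustration; none is an HBase API, and `assign` is a stub that can be told to fail. The shutdown handler marks the region OFFLINE, the following assign call throws, the exception is only logged, and the region stays in transition. A non-empty regions-in-transition map then keeps the balancer from running, and a non-forced unassign is ignored. The `serverShutdownHardened`/`retryPending` pair shows one generic mitigation, queueing failed regions for a later retry. It is an assumption for illustration, not a description of what hbase-10632_v1.patch changes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of this report's failure; none of these names are HBase APIs. */
public class RegionLimboModel {
    enum State { OPENING, OFFLINE }

    // Regions in transition (RIT): encoded region name -> state.
    final Map<String, State> rit = new LinkedHashMap<>();
    // Regions awaiting another assignment attempt (hardened path only).
    final Deque<String> retryQueue = new ArrayDeque<>();
    // Stub switch: when true, assign() throws like the balancer did in the trace.
    boolean assignFails = true;

    void assign(List<String> regions) {
        if (assignFails) {
            throw new ArrayIndexOutOfBoundsException(0);
        }
        for (String r : regions) {
            rit.remove(r); // assignment succeeded: region leaves RIT
        }
    }

    // Flow seen in the logs: mark OFFLINE, then assign. The exception escapes to the
    // event handler, which only logs it, so the regions stay OFFLINE in RIT.
    void serverShutdown(List<String> regions) {
        for (String r : regions) {
            rit.put(r, State.OFFLINE);
        }
        assign(regions);
    }

    // One possible hardening: remember regions whose assignment failed.
    void serverShutdownHardened(List<String> regions) {
        for (String r : regions) {
            rit.put(r, State.OFFLINE);
        }
        try {
            assign(regions);
        } catch (RuntimeException e) {
            retryQueue.addAll(regions);
        }
    }

    // Periodic retry, e.g. driven by a background chore in a real system.
    void retryPending() {
        List<String> batch = new ArrayList<>(retryQueue);
        retryQueue.clear();
        try {
            assign(batch);
        } catch (RuntimeException e) {
            retryQueue.addAll(batch); // still failing: keep them queued
        }
    }

    // Mirrors "Not running balancer because N region(s) in transition".
    boolean balancerMayRun() {
        return rit.isEmpty();
    }

    // Mirrors "already in transition (OFFLINE, force=false)": a non-forced unassign is a no-op.
    boolean unassign(String region, boolean force) {
        if (!force && rit.containsKey(region)) {
            return false;
        }
        return true; // a real system would start closing the region here
    }

    public static void main(String[] args) {
        String region = "24d68aa7239824e42390a77b7212fcbf";
        List<String> regions = Collections.singletonList(region);

        RegionLimboModel original = new RegionLimboModel();
        try {
            original.serverShutdown(regions);
        } catch (RuntimeException e) {
            System.out.println("shutdown handling failed: " + e);
        }
        System.out.println("stuck RIT: " + original.rit);
        System.out.println("balancer may run: " + original.balancerMayRun());
        System.out.println("unassign accepted: " + original.unassign(region, false));

        RegionLimboModel hardened = new RegionLimboModel();
        hardened.serverShutdownHardened(regions);
        hardened.assignFails = false; // e.g. the underlying condition clears
        hardened.retryPending();
        System.out.println("hardened RIT after retry: " + hardened.rit);
    }
}
```

      In the original model, the region stays OFFLINE in the transition map, the balancer check returns false, and the non-forced unassign is rejected, matching the log lines above. In the hardened model, the region leaves the transition map once the stubbed failure clears and the retry runs.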
      

      Spoke with enis about it earlier; assigning to him.

      Attachments

        1. hbase-10632_v1.patch
          11 kB
          Enis Soztutar

            People

              Enis Soztutar (enis)
              Nick Dimiduk (ndimiduk)
              Votes: 0
              Watchers: 6
