  1. HBase
  2. HBASE-8545

Meta stuck in transition when it is assigned to a just restarted dead region server

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.98.0, 0.95.1
    • Component/s: Region Assignment
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Suppose the meta region server is down and the SSH (ServerShutdownHandler) tries to re-assign meta. The following could happen:

      1. The AM (AssignmentManager) plans to assign meta to a region server (R_old);
      2. R_old then dies, and a new region server (R_new) starts up on the same host and port but gets a different start code;
      3. The AM sends the open region request to R_new, and meta is opened on it;
      4. AM gets ZK event, but it is from a different region server instance (R_new), not the expected one (R_old), so it sends a close region request to R_new;
      5. Now, the meta is stuck in transition and won't be assigned.
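
The mismatch in steps 2 to 4 can be illustrated with a minimal Java sketch. It assumes a server identity is the triple (host, port, start code); `ServerId`, `StuckMetaSketch`, and `onOpenedEvent` are hypothetical names for illustration, not HBase classes, and the host name and numbers are made up.

```java
import java.util.Objects;

// Hypothetical stand-in for a region server identity: host, port, and start code.
final class ServerId {
    final String host;
    final int port;
    final long startCode;

    ServerId(String host, int port, long startCode) {
        this.host = host;
        this.port = port;
        this.startCode = startCode;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ServerId)) return false;
        ServerId other = (ServerId) o;
        return host.equals(other.host) && port == other.port && startCode == other.startCode;
    }

    @Override
    public int hashCode() {
        return Objects.hash(host, port, startCode);
    }
}

final class StuckMetaSketch {
    enum Action { MARK_OPEN, SEND_CLOSE }

    // Simplified model of step 4: the master accepts an "opened" event only
    // when it comes from the server instance named in the assignment plan.
    static Action onOpenedEvent(ServerId planned, ServerId reporter) {
        return planned.equals(reporter) ? Action.MARK_OPEN : Action.SEND_CLOSE;
    }

    public static void main(String[] args) {
        ServerId rOld = new ServerId("rs1.example.com", 60020, 1000L);
        // Same host and port, but a restart yields a new start code.
        ServerId rNew = new ServerId("rs1.example.com", 60020, 2000L);
        System.out.println(onOpenedEvent(rOld, rNew)); // prints SEND_CLOSE
    }
}
```

Because the two identities differ only in the start code, the event from R_new is treated as coming from an unexpected server, and in this model the region is closed again rather than marked open.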

      This won't happen to a user region, since the SSH for R_old will find the user region stuck in transition and re-assign it. Meta is a little different: the AM checks whether a dead region server carries meta based on the ZK info, which the open region handler changes to the new region server R_new at step 3.
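
As a rough illustration of why the shutdown handling for R_old skips meta, the following sketch models the ZK info as a single recorded server name. `ZkMetaCheckSketch`, `record`, and `carriesMeta` are hypothetical names, and the "host,port,startcode" strings are made-up values.

```java
// Hypothetical model of a ZK-based "does this dead server carry meta?" check.
final class ZkMetaCheckSketch {
    // Server name currently recorded in ZK for meta (simplified to one field).
    private String serverInZk;

    void record(String serverName) {
        serverInZk = serverName;
    }

    // The dead server is considered to carry meta only if ZK still names it.
    boolean carriesMeta(String deadServer) {
        return deadServer.equals(serverInZk);
    }

    public static void main(String[] args) {
        String rOld = "rs1.example.com,60020,1000";
        String rNew = "rs1.example.com,60020,2000";
        ZkMetaCheckSketch zk = new ZkMetaCheckSketch();
        zk.record(rNew); // step 3: the open region handler on R_new updates ZK
        System.out.println(zk.carriesMeta(rOld)); // prints false
    }
}
```

Once ZK names R_new, the check for R_old returns false, so nothing re-assigns meta on R_old's behalf.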

      The fix I was thinking about is:
      1. When checking whether a region server carries a region, use the region transition information if it exists (to the master, this is the source of truth); otherwise, check the ZK data as before;
      2. In the open region handler, when transitioning the assignment ZK node from offline to opening, make sure the current region server is the expected one (ZK#transitionNode; the existing code doesn't check the target server name).
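
The two proposed checks might look roughly like the sketch below. It is not the attached patch: `ProposedFixSketch` and all of its methods are hypothetical, ZK is modelled as an in-memory map, and server names are plain "host,port,startcode" strings with made-up values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two proposed checks; not the actual HBase classes.
final class ProposedFixSketch {
    // Master's in-memory view: region name -> server the region is in transition to.
    private final Map<String, String> regionsInTransition = new HashMap<>();
    // Simplified ZK view: region name -> server name stored in the region's node.
    private final Map<String, String> zkData = new HashMap<>();

    void planAssignment(String region, String targetServer) {
        regionsInTransition.put(region, targetServer);
    }

    void writeZk(String region, String serverName) {
        zkData.put(region, serverName);
    }

    // Check 1: prefer the transition info; fall back to ZK data when there is none.
    boolean carriesRegion(String server, String region) {
        String target = regionsInTransition.get(region);
        if (target == null) {
            target = zkData.get(region);
        }
        return server.equals(target);
    }

    // Check 2: the offline -> opening transition succeeds only when the server
    // attempting it is the one named in the node.
    boolean transitionOfflineToOpening(String region, String currentServer) {
        String expected = zkData.get(region);
        if (expected == null || !expected.equals(currentServer)) {
            return false; // unexpected server instance: refuse to open
        }
        zkData.put(region, currentServer);
        return true;
    }

    public static void main(String[] args) {
        String rOld = "rs1.example.com,60020,1000";
        String rNew = "rs1.example.com,60020,2000";
        ProposedFixSketch s = new ProposedFixSketch();
        s.planAssignment("meta", rOld);
        s.writeZk("meta", rOld); // offline node names the planned server
        System.out.println(s.transitionOfflineToOpening("meta", rNew)); // prints false
        System.out.println(s.carriesRegion(rOld, "meta"));              // prints true
    }
}
```

With the second check, R_new never opens a region that was planned for R_old; with the first, the shutdown handling for R_old still sees meta as its responsibility even if the ZK data has changed.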

      Attachments

        1. trunk-8545.patch
          5 kB
          Jimmy Xiang
        2. trunk-8545_v3.patch
          8 kB
          Jimmy Xiang
        3. trunk-8545_v2.patch
          7 kB
          Jimmy Xiang

            People

              Assignee: jxiang Jimmy Xiang
              Reporter: jxiang Jimmy Xiang