HBase / HBASE-11037

Race condition in TestZKBasedOpenCloseRegion


Details

    • Type: Test
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.99.0, 0.94.19, 0.98.2, 0.96.3
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      testCloseRegion is called before testReOpenRegion.

      Here's the sequence of events:

      2014-04-18 20:58:05,645 INFO  [Thread-380] master.TestZKBasedOpenCloseRegion(313): Running testCloseRegion
      2014-04-18 20:58:05,645 INFO  [Thread-380] master.TestZKBasedOpenCloseRegion(315): Number of region servers = 2
      2014-04-18 20:58:05,645 INFO  [Thread-380] master.TestZKBasedOpenCloseRegion(164): -ROOT-,,0.70236052
      2014-04-18 20:58:05,646 DEBUG [Thread-380] master.TestZKBasedOpenCloseRegion(320): Asking RS to close region -ROOT-,,0.70236052
      ...
      2014-04-18 20:58:06,237 INFO  [RS_CLOSE_ROOT-hemera.apache.org,46533,1397854669633-0] regionserver.HRegion(1148): Closed -ROOT-,,0.70236052
      ...
      2014-04-18 20:58:06,404 INFO  [Thread-380] master.TestZKBasedOpenCloseRegion(333): Done with testCloseRegion
      

      Then

      2014-04-18 20:58:06,431 INFO  [pool-1-thread-1] hbase.ResourceChecker(157): before master.TestZKBasedOpenCloseRegion#testReOpenRegion: 234 threads, 388 file descriptors 4 connections, 
      ...
      2014-04-18 20:58:06,466 DEBUG [MASTER_OPEN_REGION-hemera.apache.org,52650,1397854669138-3] zookeeper.ZKUtil(1597): master:52650-0x14576a1835d0000 Retrieved 62 byte(s) of data from znode /hbase/unassigned/70236052; data=region=-ROOT-,,0, origin=hemera.apache.org,46533,1397854669633, state=RS_ZK_REGION_OPENED
      2014-04-18 20:58:06,473 DEBUG [pool-1-thread-1] client.ClientScanner(191): Finished with scanning at {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,}
      2014-04-18 20:58:06,473 INFO  [Thread-396] master.TestZKBasedOpenCloseRegion(123): Number of region servers = 2
      2014-04-18 20:58:06,474 INFO  [Thread-396] master.TestZKBasedOpenCloseRegion(164): -ROOT-,,0.70236052
      2014-04-18 20:58:06,474 DEBUG [Thread-396] master.TestZKBasedOpenCloseRegion(130): Asking RS to close region -ROOT-,,0.70236052
      2014-04-18 20:58:06,474 INFO  [Thread-396] master.TestZKBasedOpenCloseRegion(147): Unassign -ROOT-,,0.70236052
      2014-04-18 20:58:06,474 DEBUG [Thread-396] master.AssignmentManager(2126): Starting unassignment of region -ROOT-,,0.70236052 (offlining)
      2014-04-18 20:58:06,475 DEBUG [Thread-396] master.AssignmentManager(2132): Attempted to unassign region -ROOT-,,0.70236052 but it is not currently assigned anywhere
      2014-04-18 20:58:06,478 DEBUG [pool-1-thread-1-EventThread] zookeeper.ZooKeeperWatcher(294): master:52650-0x14576a1835d0000 Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/unassigned/70236052
      2014-04-18 20:58:06,478 DEBUG [pool-1-thread-1-EventThread] master.AssignmentManager(1176): The znode of region -ROOT-,,0.70236052 has been deleted.
      2014-04-18 20:58:06,478 INFO  [pool-1-thread-1-EventThread] master.AssignmentManager(1188): The master has opened the region -ROOT-,,0.70236052 that was online on hemera.apache.org,46533,1397854669633
      2014-04-18 20:58:06,478 DEBUG [pool-1-thread-1-EventThread] zookeeper.ZooKeeperWatcher(294): master:52650-0x14576a1835d0000 Received ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/unassigned
      

      Then nothing happens. testCloseRegion closed the -ROOT- region, and testReOpenRegion starts before -ROOT- has been reassigned. Its unassign request is therefore ignored, so the test waits forever for a close event that never happens.

      This is the key line: "master.AssignmentManager(2132): Attempted to unassign region -ROOT-,,0.70236052 but it is not currently assigned anywhere"
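To make the failure mode concrete, here is a toy Java model of the race. The class, its methods, and the 200 ms timeout are hypothetical and are not HBase's AssignmentManager API; the model only mimics the logged behavior: an unassign for a region the master does not currently record as assigned is a no-op, so the close event the test waits for never fires. The real test waited without a bound; the model bounds the wait so the hang shows up as a `false` result instead.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical toy model of the race; not HBase's AssignmentManager API.
public class UnassignRaceModel {
    // Regions the "master" currently records as assigned somewhere.
    private final Set<String> assigned = ConcurrentHashMap.newKeySet();
    // Counted down when a close actually happens.
    private final CountDownLatch closed = new CountDownLatch(1);

    void markAssigned(String region) {
        assigned.add(region);
    }

    // Mimics the logged behavior: unassigning a region the master does not
    // currently consider assigned is a no-op, so no close event follows.
    boolean unassign(String region) {
        if (!assigned.remove(region)) {
            System.out.println("Attempted to unassign region " + region
                    + " but it is not currently assigned anywhere");
            return false;
        }
        closed.countDown(); // the close event the test waits for
        return true;
    }

    // Bounded wait: returns false on timeout instead of hanging forever.
    boolean awaitClose(long millis) throws InterruptedException {
        return closed.await(millis, TimeUnit.MILLISECONDS);
    }

    // reassignedFirst: whether the master has recorded the reopened -ROOT-
    // before testReOpenRegion asks for the unassign.
    public static boolean runScenario(boolean reassignedFirst) {
        UnassignRaceModel master = new UnassignRaceModel();
        String root = "-ROOT-,,0.70236052";
        if (reassignedFirst) {
            master.markAssigned(root);
        }
        master.unassign(root);
        try {
            return master.awaitClose(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("reassigned first, close seen: " + runScenario(true));
        System.out.println("unassign raced ahead, close seen: " + runScenario(false));
    }
}
```

In the second scenario the unassign arrives in the window the log shows, before the master has recorded -ROOT- as opened again, so the awaited close never happens.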

      The easiest fix is to run testCloseRegion last (as was the case before we switched JUnit).
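Below is a sketch of the "run testCloseRegion last" idea using plain Java reflection instead of a real JUnit runner. JUnit 4.11 and later can pin an order with `@FixMethodOrder(MethodSorters.NAME_ASCENDING)`, but with these method names that would run testCloseRegion before testReOpenRegion. The sketch therefore sorts alphabetically and forces one named method to the end. The nested test class and `testAnotherCase` are hypothetical stand-ins, and this is not necessarily what the attached patches do.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CloseLastOrdering {
    // Hypothetical stand-in for the real test class: each "test" only records
    // that it ran. testAnotherCase is invented to show the alphabetical part.
    public static class FakeOpenCloseTests {
        final List<String> ran = new ArrayList<>();
        public void testCloseRegion() { ran.add("testCloseRegion"); }
        public void testReOpenRegion() { ran.add("testReOpenRegion"); }
        public void testAnotherCase() { ran.add("testAnotherCase"); }
    }

    // getDeclaredMethods() returns methods in no particular order, so sort them:
    // alphabetically by name, except that lastName always sorts to the end.
    static List<Method> orderedTests(Class<?> cls, String lastName) {
        List<Method> tests = new ArrayList<>();
        for (Method m : cls.getDeclaredMethods()) {
            if (m.getName().startsWith("test") && m.getParameterCount() == 0) {
                tests.add(m);
            }
        }
        // Boolean keys sort false before true, so the lastName method goes last.
        tests.sort(Comparator
                .comparing((Method m) -> m.getName().equals(lastName))
                .thenComparing(Method::getName));
        return tests;
    }

    // Runs every test method in the chosen order and returns the execution order.
    public static List<String> runAll() {
        FakeOpenCloseTests instance = new FakeOpenCloseTests();
        try {
            for (Method m : orderedTests(FakeOpenCloseTests.class, "testCloseRegion")) {
                m.invoke(instance);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return instance.ran;
    }

    public static void main(String[] args) {
        System.out.println(runAll());
    }
}
```

Here the run order is testAnotherCase, testReOpenRegion, testCloseRegion. Forcing the order keeps the unassigned -ROOT- left behind by testCloseRegion out of testReOpenRegion's way, although the tests remain order-dependent.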

      Attachments

        1. 11037-0.98.txt (2 kB, Lars Hofhansl)
        2. 11037-trunk.txt (2 kB, Lars Hofhansl)
        3. 11037-0.94.txt (4 kB, Lars Hofhansl)


          People

            Assignee: Lars Hofhansl (larsh)
            Reporter: Lars Hofhansl (larsh)
            Votes: 0
            Watchers: 3
