Apache Curator / CURATOR-498

Protected Mode creation can mistake a closing session's node, causing problems for many recipes such as LeaderLatch


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 4.0.1, 4.1.0
    • Fix Version/s: 4.2.0
    • Component/s: Framework
    • Labels: None
    • Environment: ZooKeeper 3.4.13, Curator 4.1.0 (explicitly selecting ZooKeeper 3.4.13), Linux

    Description

      The Curator app I am working on uses the LeaderLatch to select a leader out of 6 clients.

      While testing my app, I noticed that if I make ZK lose its quorum for a while and then restore it, then after Curator in my app re-establishes its connection to ZK, sometimes not all 6 clients appear in the latch path (checked with zkCli.sh). That is, I see 5 nodes instead of 6.

      After investigating a little, I suspect that LeaderLatch deleted the leader's node in its setNode method.

      To investigate, I copied the LeaderLatch code and added some log messages. From them, it seems that a very old create() background callback was unexpectedly invoked and corrupted the current leader with its stale path name. That is, the stale callback called setNode with its stale name, replacing the leader's path with its own and deleting the leader's node. This leaves the client running and thinking it is the leader, while another leader is selected.
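      The failure described above can be reproduced in miniature without ZooKeeper. The following is a hypothetical model, not Curator's LeaderLatch source: the class name, the liveNodes set (standing in for the latch path's children) and the node names are illustrative assumptions, and setNode only mimics the behaviour described in this report (record the new path, then delete the previously recorded one).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the reported race; NOT Curator's LeaderLatch source.
public class StaleCallbackRace {
    // Stand-in for the children of the latch path as listed by zkCli.sh.
    static final Set<String> liveNodes = ConcurrentHashMap.newKeySet();
    // Stand-in for the path the latch believes it owns.
    static final AtomicReference<String> ourPath = new AtomicReference<>();

    // Mimics the described setNode behaviour: record the new path,
    // then delete whichever path was recorded before.
    static void setNode(String newPath) {
        String oldPath = ourPath.getAndSet(newPath);
        if (oldPath != null) {
            liveNodes.remove(oldPath);
        }
    }

    public static void main(String[] args) {
        // The current session creates its node and records it.
        liveNodes.add("latch-0000000485");
        setNode("latch-0000000485");

        // A long-delayed callback from the earlier reset() now fires. In this
        // model its node belonged to the lost session and no longer exists,
        // yet setNode still runs and deletes the valid node.
        setNode("latch-0000000480");

        // Prints: ourPath=latch-0000000480 live=[]
        System.out.println("ourPath=" + ourPath.get() + " live=" + liveNodes);
    }
}
```

      The client ends up pointing at a node it does not own while its real node is gone, which matches seeing 5 children instead of 6.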

      If my analysis is correct, then this obsolete create callback needs to be cancelled. I think its session was suspended at 22:38:54 and then lost at 22:39:04, so ongoing callbacks should be cancelled on SUSPENDED.
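      One way to realise the suggestion of cancelling in-flight callbacks on SUSPENDED is a generation counter: every new registration and every SUSPENDED event bumps the counter, and a create callback only calls setNode if the token it was issued is still current. This is a hypothetical sketch, not the change that shipped in Curator 4.2.0; GenerationGuard, beginCreate, onSuspended and onCreateCallback are invented names, and a real recipe would also have to deal with the orphaned node on the server.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical generation guard; names and structure are illustrative only.
public class GenerationGuard {
    static final Set<String> liveNodes = ConcurrentHashMap.newKeySet();
    static final AtomicReference<String> ourPath = new AtomicReference<>();
    // Bumped whenever earlier in-flight creates must be ignored.
    private static long generation = 0;

    static void setNode(String newPath) {
        String oldPath = ourPath.getAndSet(newPath);
        if (oldPath != null) {
            liveNodes.remove(oldPath);
        }
    }

    // Start a new registration (e.g. from reset()); the returned token
    // must accompany the eventual create callback.
    static synchronized long beginCreate() {
        return ++generation;
    }

    // In the proposal above, a SUSPENDED connection state invalidates
    // every create that is still in flight.
    static synchronized void onSuspended() {
        ++generation;
    }

    // Adopt the created path only if no newer generation has started.
    // Synchronized so the check and setNode cannot interleave with onSuspended.
    static synchronized boolean onCreateCallback(long token, String createdPath) {
        if (token != generation) {
            // Stale result: never call setNode; best-effort removal of our own orphan.
            liveNodes.remove(createdPath);
            return false;
        }
        setNode(createdPath);
        return true;
    }

    public static void main(String[] args) {
        long staleToken = beginCreate();   // reset() issues a create (22:38:52 in the log)
        onSuspended();                     // connection SUSPENDED (22:38:54)
        long currentToken = beginCreate(); // a later reset() after reconnecting

        liveNodes.add("latch-0000000485");
        onCreateCallback(currentToken, "latch-0000000485");

        boolean adopted = onCreateCallback(staleToken, "latch-0000000480");
        // Prints: adopted=false ourPath=latch-0000000485 live=[latch-0000000485]
        System.out.println("adopted=" + adopted + " ourPath=" + ourPath.get() + " live=" + liveNodes);
    }
}
```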

      Please see the attached log file and the modified LeaderLatch0.

      In the log, note that at 22:39:26 it shows 0000000485 being replaced by 0000000480, after which 0000000485 was probably deleted.

      Note also that at 22:38:52, 34 seconds earlier, the log shows it was in the reset() method ("RESET OUR PATH"), which possibly triggered the creation of 0000000480.
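      The issue title attributes the problem to protected-mode creation. In outline (an assumption about Curator's internals, not taken from this report): when a protected create is retried after a connection problem, the client looks among the parent's children for a name containing its protected ID, and a node still held by the closing session can match. The sketch contrasts a lookup by ID alone with one that also requires the current session to own the ephemeral node. Child, findByIdOnly, findOwned, the session IDs and the `_c_<id>-` name format are illustrative assumptions; in the ZooKeeper client the owner would come from `Stat.getEphemeralOwner()` and the current session from `ZooKeeper.getSessionId()`.

```java
import java.util.List;
import java.util.Optional;
import java.util.UUID;

// Hypothetical protected-mode lookup; not Curator's CreateBuilder code.
public class ProtectedLookup {
    static final class Child {
        final String name;
        final long ephemeralOwner; // session id that owns this ephemeral node
        Child(String name, long ephemeralOwner) {
            this.name = name;
            this.ephemeralOwner = ephemeralOwner;
        }
    }

    // Naive lookup: any child carrying the protected id is treated as "ours".
    static Optional<Child> findByIdOnly(List<Child> children, String protectedId) {
        return children.stream().filter(c -> c.name.contains(protectedId)).findFirst();
    }

    // Guarded lookup: the node must also belong to the current session.
    static Optional<Child> findOwned(List<Child> children, String protectedId, long sessionId) {
        return children.stream()
                .filter(c -> c.name.contains(protectedId) && c.ephemeralOwner == sessionId)
                .findFirst();
    }

    public static void main(String[] args) {
        String protectedId = UUID.randomUUID().toString();
        long oldSession = 0x1001L;
        long newSession = 0x1002L;
        // The closing session's node is still visible: the server has not expired it yet.
        List<Child> children = List.of(
                new Child("_c_" + protectedId + "-latch-0000000480", oldSession));
        System.out.println("naive: " + findByIdOnly(children, protectedId).isPresent());             // true
        System.out.println("guarded: " + findOwned(children, protectedId, newSession).isPresent()); // false
    }
}
```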

      Attachments

        1. reproduction2.tar.gz (996 kB, Shay Shimony)
        2. reproduction.tar.gz (415 kB, Shay Shimony)
        3. logs.tar.gz (70 kB, Shay Shimony)
        4. LeaderLatch0.java (21 kB, Shay Shimony)
        5. HaWatcher.log (32 kB, Shay Shimony)
        6. ha.tar.gz (122 kB, Shay Shimony)
        7. CURATOR-498.png (58 kB, Jordan Zimmerman)


            People

              Assignee: Jordan Zimmerman (randgalt)
              Reporter: Shay Shimony (shayshim)
              Votes: 0
              Watchers: 6


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 50m