  1. Apache Curator
  2. CURATOR-525

There is a race condition in Curator which might lead to a fake SUSPENDED event and corrupt the internal state of CuratorFrameworkImpl

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 4.2.0
    • Fix Version/s: 5.0.0
    • Component/s: Framework
    • Labels: None

    Description

      This was originally found in the 2.11.1 version of Curator, but I tested the latest release as well, and the issue is still there.

      The issue is tied to guaranteed deletes, which loop infinitely if called when there is no connection:

      client.delete().guaranteed().forPath(ourPath); 

      https://curator.apache.org/apidocs/org/apache/curator/framework/api/GuaranteeableDeletable.html

      This schedules a background operation which attempts to remove the node in an infinite loop. Each time a background operation fails due to connection loss, it performs a check (the validateConnection() function) to see whether the main thread is already aware of the connection loss and, if it is not, raises the connection loss event. The problem is that this piece of code is also executed by the event watcher thread when connection events occur, which leads to a race condition. So when the connection is restored, it is easily possible for the main thread to raise a RECONNECTED event and, after that, for the background thread to raise a SUSPENDED event.

      We might get unlucky and get a "phantom" SUSPENDED event. It breaks Curator's internal connection state and leads to Curator behaving unpredictably.
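
The interleaving described above can be illustrated with a small, self-contained model. This is only a sketch and not Curator's actual code: `StateLog`, `replayRace()` and the `State` enum are hypothetical stand-ins for the connection state handling, and the steps of the two threads are replayed sequentially so that the unlucky ordering is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

public class PhantomSuspendedSketch {
    enum State { CONNECTED, SUSPENDED, RECONNECTED }

    // Hypothetical stand-in for a connection state manager:
    // records a state change only if it differs from the current state.
    static final class StateLog {
        private State current = State.CONNECTED;
        private final List<State> raised = new ArrayList<>();

        synchronized boolean addStateChange(State next) {
            if (next == current) {
                return false; // same state as before: ignored
            }
            current = next;
            raised.add(next);
            return true;
        }

        synchronized State current() { return current; }

        synchronized List<State> raised() { return new ArrayList<>(raised); }
    }

    // Replays the unlucky ordering step by step (no real threads involved).
    static List<State> replayRace() {
        StateLog log = new StateLog();

        // Step 1 (watcher thread): the connection drops.
        log.addStateChange(State.SUSPENDED);

        // Step 2 (background thread): the delete fails with connection loss
        // and the thread decides that a SUSPENDED event must be raised.
        boolean backgroundSawConnectionLoss = true;

        // Step 3 (watcher thread): the connection is restored.
        log.addStateChange(State.RECONNECTED);

        // Step 4 (background thread): acts on its stale observation from step 2.
        if (backgroundSawConnectionLoss) {
            log.addStateChange(State.SUSPENDED); // the "phantom" event
        }
        return log.raised();
    }

    public static void main(String[] args) {
        // Prints [SUSPENDED, RECONNECTED, SUSPENDED]
        System.out.println(replayRace());
    }
}
```

In this model the last recorded state is SUSPENDED even though the connection is up, which matches the symptom reported here. The sketch makes no claim about where exactly the check and the state change are separated in the real implementation.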

      Attached are some illustrations and a unit test to reproduce the issue. (Put a debug breakpoint in validateConnection().)

      Possible solution: in the CuratorFrameworkImpl class, adjust the processEvent() function and add the following:

      if (event.getType() == CuratorEventType.SYNC) {
          connectionStateManager.addStateChange(ConnectionState.RECONNECTED);
      }

      If this is the same state as before, it will be ignored. If the background operation succeeded but we are in the SUSPENDED state, this would repair the Curator state and raise a RECONNECTED event.
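
The intended effect of the proposed change can be sketched with a simplified model. This is not the actual patch: `StateLog`, the reduced `EventType` enum and this `processEvent()` are hypothetical stand-ins. The sketch assumes, as the description above does, that a state change equal to the current state is ignored.

```java
import java.util.ArrayList;
import java.util.List;

public class ReconnectRepairSketch {
    enum State { CONNECTED, SUSPENDED, RECONNECTED }

    // Hypothetical, reduced set of background event types.
    enum EventType { SYNC, DELETE }

    // Hypothetical stand-in for a connection state manager.
    static final class StateLog {
        private State current;
        private final List<State> raised = new ArrayList<>();

        StateLog(State initial) { this.current = initial; }

        synchronized boolean addStateChange(State next) {
            if (next == current) {
                return false; // same state as before: ignored, no event raised
            }
            current = next;
            raised.add(next);
            return true;
        }

        synchronized State current() { return current; }

        synchronized List<State> raised() { return new ArrayList<>(raised); }
    }

    // Models the proposed addition: a successful SYNC implies a live connection.
    static void processEvent(StateLog log, EventType type) {
        if (type == EventType.SYNC) {
            log.addStateChange(State.RECONNECTED);
        }
    }

    public static void main(String[] args) {
        // Start from the broken situation: stuck in SUSPENDED while connected.
        StateLog stuck = new StateLog(State.SUSPENDED);
        processEvent(stuck, EventType.SYNC); // repairs the state
        processEvent(stuck, EventType.SYNC); // same state: ignored
        // Prints RECONNECTED [RECONNECTED]
        System.out.println(stuck.current() + " " + stuck.raised());
    }
}
```

The first SYNC moves the model out of the stale SUSPENDED state and raises exactly one RECONNECTED event; later SYNC events leave it untouched.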

       

      Attachments

        1. CuratorFrameworkTest.java
          2 kB
          Mikhail Valiev
        2. background-thread-infinite-loop.png
          109 kB
          Mikhail Valiev
        3. event-watcher-thread.png
          65 kB
          Mikhail Valiev
        4. curator-race-condition.png
          39 kB
          Mikhail Valiev

            People

              Assignee: Jordan Zimmerman (randgalt)
              Reporter: Mikhail Valiev (mikhailvaliev)
              Votes: 0
              Watchers: 7

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m