[CURATOR-504] Race conditions in LeaderLatch after reconnecting to ensemble - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 4.1.0
Fix Version/s: 5.4.0
Component/s: None
Labels: None

Description

We use LeaderLatch in many places in our system. When the ZooKeeper ensemble is unstable and clients are reconnecting to it, the logs fill up with messages like the following:

[2017-08-31 19:18:34,562][ERROR][org.apache.curator.framework.recipes.leader.LeaderLatch] Can't find our node. Resetting. Index: -1 {}

According to the implementation, this can happen in two cases:

- When the internal state `ourPath` is null.
- When the list of latches does not contain the expected one.
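The two cases can be illustrated with a minimal sketch. This is a simplified, hypothetical model and not Curator's actual code: the class `LatchIndexSketch`, the method `findOurIndex`, the `/leader` path, and the `latch-...` node names are all made up for illustration, and a plain lexical sort stands in for whatever ordering the real recipe applies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of the index lookup; not Curator's actual code.
final class LatchIndexSketch {
    private LatchIndexSketch() {}

    /** Returns the position of our node among the sorted children, or -1 if it cannot be found. */
    static int findOurIndex(String ourPath, List<String> children) {
        if (ourPath == null) {
            return -1; // case 1: the internal state has been cleared
        }
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted); // assumption: lexical order is good enough for this sketch
        String ourNode = ourPath.substring(ourPath.lastIndexOf('/') + 1);
        return sorted.indexOf(ourNode); // case 2: -1 when our node is not in the list
    }

    public static void main(String[] args) {
        List<String> children = List.of("latch-0000000002", "latch-0000000001");
        System.out.println(findOurIndex(null, children));                       // ourPath is null
        System.out.println(findOurIndex("/leader/latch-0000000003", children)); // node not listed
        System.out.println(findOurIndex("/leader/latch-0000000001", children)); // node found
    }
}
```

Either a null path or a missing node yields a negative index, which is what the "Index: -1" in the log message reflects.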

I believe we hit the first condition because of races that occur after the client reconnects to ZooKeeper.

- The client reconnects to ZooKeeper. LeaderLatch receives the event and calls the `reset` method, which sets the internal state (`ourPath`) to null, removes the old latch, and creates a new one. This happens in the thread "Curator-ConnectionStateManager-0".
- Almost simultaneously, LeaderLatch receives another event, NodeDeleted, and tries to re-read the list of latches and check leadership. This happens in the thread "main-EventThread".

Therefore, `checkLeadership` is sometimes called while `ourPath` is still null.
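A rough way to picture this interleaving is to replay it deterministically on a single thread. The sketch below is a hypothetical model: `ReconnectRaceSketch`, `resetBegin`, `resetFinish`, and the node names are invented for illustration, and splitting the reset into two steps is an assumption made to expose the window in which the path is null. It does not reproduce Curator's real threading or API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the interleaving; names only loosely mirror LeaderLatch.
final class ReconnectRaceSketch {
    final AtomicReference<String> ourPath = new AtomicReference<>("/leader/latch-0000000001");
    final List<String> log = new ArrayList<>();

    // First half of the reset (connection-state thread in the report): clear the state.
    void resetBegin() {
        ourPath.set(null);
        log.add("reset: ourPath cleared");
    }

    // Second half of the reset: the replacement node now exists.
    void resetFinish(String newPath) {
        ourPath.set(newPath);
        log.add("reset: new node " + newPath);
    }

    // Watcher callback (event thread in the report): returns true if our node was found.
    boolean checkLeadership(List<String> children) {
        String local = ourPath.get();
        int index = (local == null)
                ? -1
                : children.indexOf(local.substring(local.lastIndexOf('/') + 1));
        if (index < 0) {
            log.add("Can't find our node. Index: " + index);
            return false;
        }
        log.add("found our node at index " + index);
        return true;
    }

    public static void main(String[] args) {
        ReconnectRaceSketch latch = new ReconnectRaceSketch();
        latch.resetBegin();                                  // reconnect handling starts
        latch.checkLeadership(List.of("latch-0000000001"));  // NodeDeleted arrives in the gap
        latch.resetFinish("/leader/latch-0000000002");       // reconnect handling completes
        latch.checkLeadership(List.of("latch-0000000002"));  // a later check succeeds
        latch.log.forEach(System.out::println);
    }
}
```

In this replay the first check falls inside the gap and logs the "Can't find our node" line, while the same check issued after the reset completes finds the node.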

Below is an approximate diagram of what happens:

Attachments

- 51868597-65791000-231c-11e9-9bfa-1def62bc3ea1.png, 31/Jan/19 21:50, 35 kB, Yuri Tceretian
- Screen Shot 2019-01-31 at 10.26.59 PM.png, 01/Feb/19 03:27, 38 kB, Jordan Zimmerman
- XP91JuD048Nl_8h9NZpH01QZJMfCLewjfd2eQNfOsR6GuApPNV.png, 05/Feb/19 20:14, 32 kB, Yuri Tceretian

Issue Links

- is cloned by: CURATOR-644 CLONE - Race conditions in LeaderLatch after reconnecting to ensemble (Closed)
- is related to: CURATOR-505 A circuit breaking ConnectionStateListener would be very helpful (Resolved)

People

Assignee: Zili Chen

Reporter: Yuri Tceretian

Votes: 0

Watchers: 6

Dates

Created: 31/Jan/19 21:51

Updated: 16/Aug/23 09:43

Resolved: 16/Aug/23 09:43