[RATIS-2162] When closing leaderState, if the logAppender thread sends a snapshot, a deadlock may occur - ASF JIRA

Details

Type: Wish
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.1.0
Fix Version/s: 3.2.0
Component/s: Leader
Labels: None

Description

This issue is the root cause of the problem reported in RATIS-2161 (Grpc may spawn many threads).

1. The old leader S receives a larger term number and converts to a follower.
2. The LogAppender thread L does not receive the shutdown signal in time because a restart was triggered abnormally.
3. S holds the ‘server’ lock and waits for L to shut down.
4. L triggers snapshot sending and calls newSnapshotRequests.
5. In newSnapshotRequests, L tries to acquire the ‘server’ lock.

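This eventually leads to a deadlock: S holds the ‘server’ lock while waiting for L to shut down, and L cannot finish because it is waiting for that same lock. As a result, gRPC cannot reclaim the thread in time, which causes the problem described in RATIS-2161.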
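The lock-then-wait pattern in the steps above can be reproduced in isolation. The following is a minimal sketch, not Ratis code: `LockJoinSketch`, `SERVER_LOCK`, and `appenderFinishedWhileLockHeld` are hypothetical names, and the wait is bounded by a timeout so that the demonstration terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;

public class LockJoinSketch {
  // Hypothetical stand-in for the 'server' lock.
  private static final Object SERVER_LOCK = new Object();

  /**
   * Holds the lock and waits up to waitMillis (must be > 0) for the appender.
   * Returns true if the appender finished while the lock was still held.
   */
  static boolean appenderFinishedWhileLockHeld(long waitMillis) {
    final CountDownLatch lockHeld = new CountDownLatch(1);
    final Thread appender = new Thread(() -> {
      try {
        lockHeld.await(); // proceed only once the closing thread holds the lock
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      synchronized (SERVER_LOCK) {
        // Stands in for building the snapshot request under the lock.
      }
    }, "appender-L");
    appender.start();

    boolean finished;
    try {
      synchronized (SERVER_LOCK) {      // S holds the lock ...
        lockHeld.countDown();
        appender.join(waitMillis);      // ... and waits for L (bounded here)
        finished = !appender.isAlive();
      }
      appender.join(); // the lock is released, so the appender can now finish
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return finished;
  }

  public static void main(String[] args) {
    System.out.println("finished while lock held: " + appenderFinishedWhileLockHeld(200));
  }
}
```

In this sketch the appender can never finish while the lock is held, so the method returns `false`; with an unbounded wait in place of the timeout, both threads would block forever.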

[Timeline diagram. LeaderState timeline: shutdown LeaderState, close LeaderState, stop LogAppender L. LogAppender L timeline: restart logAppender, newInstallSnapshotRequests.]

I think one possible fix is to check the Raft role every time the LogAppender is awakened, and to close the LogAppender if the server is no longer the leader.
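The proposed check could look roughly like the sketch below. It is an illustration under assumptions, not the actual LogAppender code: `AppenderLoopSketch`, its `isLeader` supplier, and the `maxRounds` bound are hypothetical, and each loop iteration stands in for one wake-up of the appender.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class AppenderLoopSketch {
  private final BooleanSupplier isLeader; // hypothetical role check
  private final AtomicBoolean running = new AtomicBoolean(true);

  AppenderLoopSketch(BooleanSupplier isLeader) {
    this.isLeader = isLeader;
  }

  /** Runs at most maxRounds wake-ups; returns the number of rounds that did work. */
  int run(int maxRounds) {
    int rounds = 0;
    while (running.get() && rounds < maxRounds) {
      // Re-check the role on every wake-up, before any work that needs the server lock.
      if (!isLeader.getAsBoolean()) {
        running.set(false); // close the appender instead of sending anything
        break;
      }
      rounds++; // stands in for appending entries or sending a snapshot
    }
    return rounds;
  }

  boolean isRunning() {
    return running.get();
  }
}
```

The point of the design is that the appender stops itself as soon as it observes that the server is no longer the leader, rather than relying only on a shutdown signal that may arrive late.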

In addition, LeaderStateImpl has another concurrency safety issue, concerning senderList.
removeSenders, addSenders, and stopAll may be called from multiple threads.
For example, thread t1 creates a futures array of size 3 in stopAll, and then thread t2 calls removeSenders. This may cause an out-of-bounds access, because futures.length is 3 but senders.size() < 3.
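One common way to avoid this kind of race is to size and fill the futures array from a snapshot of the list taken under a lock. The sketch below illustrates that idea only; `SenderListSketch` and its methods are hypothetical, senders are represented by plain strings, and it does not claim to reflect how the attached patch addresses the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class SenderListSketch {
  private final List<String> senders = new ArrayList<>();       // guarded by 'this'
  private final Set<String> stopped = ConcurrentHashMap.newKeySet();

  synchronized void addSender(String s) {
    senders.add(s);
  }

  synchronized boolean removeSender(String s) {
    return senders.remove(s);
  }

  CompletableFuture<Void> stopAll() {
    final List<String> snapshot;
    synchronized (this) {
      snapshot = new ArrayList<>(senders); // copy under the lock
    }
    // The array size and the loop below both use the snapshot, so a concurrent
    // removeSender cannot make the index run past the end of the list.
    final CompletableFuture<?>[] futures = new CompletableFuture<?>[snapshot.size()];
    for (int i = 0; i < futures.length; i++) {
      final String sender = snapshot.get(i);
      futures[i] = CompletableFuture.runAsync(() -> stopped.add(sender));
    }
    return CompletableFuture.allOf(futures);
  }

  int stoppedCount() {
    return stopped.size();
  }
}
```

A caller would invoke `stopAll().join()` to wait until every sender in the snapshot has been stopped; senders added after the snapshot was taken are not covered and would need separate handling.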

Attachments

- 1154_review.patch (25/Sep/24 17:47, 11 kB, Tsz-wo Sze)
- image-2024-09-24-10-43-34-812.png (24/Sep/24 02:43, 109 kB, yuuka)
- image-2024-09-24-10-41-20-140.png (24/Sep/24 02:41, 59 kB, yuuka)

Issue Links

links to

GitHub Pull Request #1154

Activity

People

Assignee: yuuka

Reporter: yuuka

Votes: 0

Watchers: 2

Dates

Created: 24/Sep/24 02:25

Updated: 30/Sep/24 06:44

Resolved: 30/Sep/24 06:44

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 1h 40m