[CASSANDRA-8336] Add shutdown gossip state to prevent timeouts during rolling restarts - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 2.0.15, 2.1.5
Component/s: None
Labels: None
Severity: Normal

Description

In CASSANDRA-3936 we added a gossip shutdown announcement. The problem is that this isn't sufficient: you can still get TimedOutExceptions (TOEs) and have to wait for the failure detector (FD) to figure things out. This happens because of gossip propagation time and variance. If node X shuts down and sends the shutdown message to Y, but Z holds a higher gossip version for X than Y does and has not yet received the message, Z can initiate gossip with Y and thereby cause Y to mark X alive again. I propose quarantining to solve this. However, I feel it should be a -D parameter you have to specify explicitly, so as not to disrupt current dev and test practices, since it will mean a node that shuts down cannot restart until the quarantine expires.
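The X/Y/Z race can be replayed with a toy model. This is a minimal sketch, assuming a simplified gossip merge in which a strictly higher heartbeat version always replaces local state and implies liveness. The classes, `receive_shutdown`, `receive_gossip`, and the quarantine window are hypothetical and are not Cassandra's Gossiper API; the sketch models the quarantine proposal above, not the shutdown gossip state named in the title.

```python
from dataclasses import dataclass, field


@dataclass
class EndpointState:
    version: int  # heartbeat version; higher means fresher
    alive: bool


@dataclass
class Node:
    name: str
    view: dict = field(default_factory=dict)  # endpoint -> EndpointState
    quarantine_until: dict = field(default_factory=dict)  # endpoint -> time (ms)

    def receive_shutdown(self, endpoint, now, quarantine_ms=0):
        """Handle a shutdown announcement sent directly by the departing node."""
        current = self.view[endpoint]
        self.view[endpoint] = EndpointState(current.version, alive=False)
        if quarantine_ms > 0:
            # Hypothetical quarantine: ignore gossip about this endpoint for a while.
            self.quarantine_until[endpoint] = now + quarantine_ms

    def receive_gossip(self, remote_view, now):
        """Merge a peer's view: a strictly higher version replaces local state."""
        for endpoint, remote in remote_view.items():
            if now < self.quarantine_until.get(endpoint, 0):
                continue  # still quarantined, so stale liveness cannot leak back in
            local = self.view.get(endpoint)
            if local is None or remote.version > local.version:
                self.view[endpoint] = EndpointState(remote.version, remote.alive)


def y_thinks_x_alive(quarantine_ms):
    """Replay the X/Y/Z race from the description and report Y's view of X."""
    y = Node("Y", {"X": EndpointState(version=5, alive=True)})
    z = Node("Z", {"X": EndpointState(version=6, alive=True)})  # Z saw a newer heartbeat
    y.receive_shutdown("X", now=0, quarantine_ms=quarantine_ms)  # X tells only Y
    y.receive_gossip(z.view, now=100)  # Z gossips before hearing about the shutdown
    return y.view["X"].alive


print(y_thinks_x_alive(quarantine_ms=0))       # True: X is wrongly marked alive again
print(y_thinks_x_alive(quarantine_ms=30_000))  # False: quarantine suppresses Z's stale state
```

In this model the quarantine ignores all gossip about X, including X's own fresh heartbeats, until the window expires. That mirrors the cost noted above: a node that shuts down cannot rejoin until its quarantine runs out, which is why making it opt-in via a -D parameter was proposed.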

Attachments

8366-v5.txt (12 kB), 13/Apr/15 18:21, Brandon Williams
8336-v4.txt (11 kB), 31/Mar/15 19:02, Brandon Williams
8336-v3.txt (10 kB), 03/Feb/15 22:38, Brandon Williams
8336-v2.txt (8 kB), 22/Jan/15 21:33, Brandon Williams
8336.txt (4 kB), 20/Jan/15 21:26, Brandon Williams

Issue Links

is related to: CASSANDRA-9630 Killing cassandra process results in unclosed connections (Resolved)

People

Assignee: Brandon Williams
Reporter: Brandon Williams
Authors: Brandon Williams
Reviewers: Richard Low
Tester: Philip Thompson
Votes: 1
Watchers: 7

Dates

Created: 18/Nov/14 20:02
Updated: 16/Apr/19 09:31
Resolved: 15/Apr/15 14:36