Description
In CASSANDRA-3936 we added a gossip shutdown announcement. The problem here is that this isn't sufficient; you can still get TOEs and have to wait on the FD to figure things out. This happens due to gossip propagation time and variance; if node X shuts down and sends the message to Y, but Z has a greater gossip version than Y for X and has not yet received the message, it can initiate gossip with Y and thus mark X alive again. I propose quarantining to solve this, however I feel it should be a -D parameter you have to specify, so as not to destroy current dev and test practices, since this will mean a node that shuts down will not be able to restart until the quarantine expires.
Attachments
Attachments
Issue Links
- is related to
-
CASSANDRA-9630 Killing cassandra process results in unclosed connections
- Resolved