[CASSANDRA-16545] Cluster topology change may produce false unavailable for queries - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 4.0-rc1, 4.0
Component/s: Consistency/Coordination
Labels: None
Bug Category: Availability - Unavailable
Severity: Low
Complexity: Normal
Discovered By: Code Inspection
Platform: All
Impacts: None
Since Version: 4.0-alpha1
Source Control Link: https://github.com/apache/cassandra/commit/b915688ea878aaa284f5cedeb799c5f797c4d824
Test and Documentation Plan:

unit test; ci


Description

When the coordinator processes a query, it first gets the ReplicationStrategy (RS) from the keyspace to decide which peers to contact. It then gets the RS a second time to perform the liveness check for the requested consistency level (CL).

The RS is a volatile field in Keyspace, so those two getter calls can return different RS values when the cluster topology changes in between, e.g. when a node is added.

In such a scenario, the check in the second step can throw an unexpected unavailable exception, even though, from the perspective of the query, the cluster can satisfy the CL.

We should use a consistent view of the RS during peer selection and the CL liveness check. In other words, both steps should reference the same RS object. This is also clearer and easier for clients to reason about: such queries can be regarded as having been made before the topology change.
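The race and the proposed fix can be illustrated with a small, self-contained sketch. This is a toy model, not the Cassandra code: `Strategy`, `Keyspace`, `UnavailableException`, `selectLive`, `assureSufficientLive` and both `plan...` methods are hypothetical stand-ins, the CL is assumed to be a simple quorum, and the `betweenSteps` callback is only there to simulate a concurrent change deterministically.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ConsistentStrategyView {
    /** Toy stand-in for a replication strategy: an immutable list of replicas. */
    static final class Strategy {
        final List<String> replicas;
        Strategy(List<String> replicas) { this.replicas = replicas; }
        int quorum() { return replicas.size() / 2 + 1; }
    }

    /** Toy stand-in for Keyspace: the strategy is a volatile field that can be swapped at any time. */
    static final class Keyspace {
        private volatile Strategy strategy;
        Keyspace(Strategy initial) { this.strategy = initial; }
        Strategy getStrategy() { return strategy; }
        void setStrategy(Strategy updated) { this.strategy = updated; }
    }

    static final class UnavailableException extends RuntimeException {
        UnavailableException(int required, int alive) {
            super("required " + required + " but only " + alive + " alive");
        }
    }

    /** Step 1: pick the live replicas to contact, according to the given strategy. */
    static List<String> selectLive(Strategy rs, Set<String> live) {
        return rs.replicas.stream().filter(live::contains).collect(Collectors.toList());
    }

    /** Step 2: check that the selected contacts satisfy a quorum of the given strategy. */
    static void assureSufficientLive(Strategy rs, List<String> contacts) {
        if (contacts.size() < rs.quorum())
            throw new UnavailableException(rs.quorum(), contacts.size());
    }

    /** Racy variant: each step calls the getter, so the two steps may see different strategies. */
    static List<String> planWithTwoReads(Keyspace ks, Set<String> live, Runnable betweenSteps) {
        List<String> contacts = selectLive(ks.getStrategy(), live);
        betweenSteps.run(); // simulates a concurrent change between the two steps
        assureSufficientLive(ks.getStrategy(), contacts);
        return contacts;
    }

    /** Consistent variant: read the volatile field once and use that snapshot for both steps. */
    static List<String> planWithOneRead(Keyspace ks, Set<String> live, Runnable betweenSteps) {
        Strategy rs = ks.getStrategy();
        List<String> contacts = selectLive(rs, live);
        betweenSteps.run(); // the swap no longer affects this query
        assureSufficientLive(rs, contacts);
        return contacts;
    }

    public static void main(String[] args) {
        Strategy before = new Strategy(List.of("a", "b", "c"));
        Strategy after = new Strategy(List.of("a", "b", "c", "d", "e"));
        Set<String> live = Set.of("a", "b");
        Keyspace ks = new Keyspace(before);

        System.out.println("one read: " + planWithOneRead(ks, live, () -> ks.setStrategy(after)));

        ks.setStrategy(before);
        try {
            planWithTwoReads(ks, live, () -> ks.setStrategy(after));
        } catch (UnavailableException e) {
            System.out.println("two reads: false unavailable, " + e.getMessage());
        }
    }
}
```

In this sketch the query starts while two of three replicas are live, which satisfies a quorum of 2. The two-read variant fails only because the second getter call observes the swapped strategy, while the one-read variant evaluates both steps against the snapshot the query started with.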

Issue Links

links to

GitHub Pull Request #954

People

Assignee: Yifan Cai

Reporter: Yifan Cai

Authors: Yifan Cai

Reviewers: Aleksey Yeschenko, Andres de la Peña

Votes: 0

Watchers: 4

Dates

Created: 29/Mar/21 23:09

Updated: 31/Jul/24 14:52

Resolved: 15/Apr/21 15:56

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

1h 20m