Cassandra / CASSANDRA-16545

Cluster topology change may produce false unavailable for queries

Details

    Description

      When the coordinator processes a query, it first gets the ReplicationStrategy (RS) from the keyspace to decide which peers to contact. It then gets the RS a second time to perform the liveness check for the requested consistency level (CL).

      The RS is a volatile field in Keyspace, so those two getter calls can return different RS values when the cluster topology changes between them, e.g. when a node is added.

      In such a scenario, the check at the second step can throw an unexpected unavailable error even though, from the perspective of the query, the cluster can satisfy the CL.
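The race can be pictured with a minimal sketch. The classes below are hypothetical stand-ins, not Cassandra's actual code, and they assume for illustration that the second view of the ring demands more acknowledgements than the first one did. The `betweenReads` hook only exists to make the interleaving deterministic.

```java
// Hypothetical stand-ins for illustration; these are not Cassandra's real classes.
final class StrategyView {
    final int replicas; // number of replicas in this view of the ring

    StrategyView(int replicas) {
        this.replicas = replicas;
    }

    // Quorum size for this view.
    int blockFor() {
        return replicas / 2 + 1;
    }
}

final class RacyCoordinator {
    // Replaced wholesale whenever the topology changes.
    volatile StrategyView strategy = new StrategyView(3);

    // Returns true if the query proceeds, false if it is rejected as unavailable.
    // betweenReads simulates a topology change landing between the two getter calls.
    boolean execute(Runnable betweenReads) {
        int contacts = strategy.blockFor(); // first read: peer selection
        betweenReads.run();
        int required = strategy.blockFor(); // second read: liveness check
        return contacts >= required;
    }
}
```

With a stable topology, `new RacyCoordinator().execute(() -> {})` succeeds. If the hook swaps in `new StrategyView(5)`, the contacts chosen under the old view no longer meet the requirement computed from the new one, and the query is rejected even though it was satisfiable when it started.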

      We should use a consistent view of the RS during peer selection and the CL liveness check. In other words, both steps should reference the same RS object. This is also clearer and easier for clients to reason about: such queries are treated as if they were made before the topology change.
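One way to picture the proposed behavior is to read the volatile field once and pass that single reference to both steps. This is only a sketch under simplified assumptions. `ReplicationView` and `ConsistentCoordinator` are hypothetical names, and the actual patch may thread the RS through the code differently.

```java
// Hypothetical stand-ins for illustration; these are not Cassandra's real classes.
final class ReplicationView {
    final int replicas; // number of replicas in this view of the ring

    ReplicationView(int replicas) {
        this.replicas = replicas;
    }

    // Quorum size for this view.
    int blockFor() {
        return replicas / 2 + 1;
    }
}

final class ConsistentCoordinator {
    // Still volatile and still replaced on topology changes.
    volatile ReplicationView strategy = new ReplicationView(3);

    // Returns true if the query proceeds, false if it is rejected as unavailable.
    // betweenSteps simulates a topology change landing between the two steps.
    boolean execute(Runnable betweenSteps) {
        ReplicationView view = strategy;  // single volatile read
        int contacts = view.blockFor();   // peer selection uses the captured view
        betweenSteps.run();
        int required = view.blockFor();   // liveness check uses the same view
        return contacts >= required;
    }
}
```

Because both steps use the captured `view`, a concurrent replacement of `strategy` cannot make them disagree. The query behaves as if it had been issued entirely before the change.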

            People

              yifanc Yifan Cai
              yifanc Yifan Cai
              Yifan Cai
              Aleksey Yeschenko, Andres de la Peña
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m