Details
Bug
Status: Resolved
Normal
Resolution: Not A Problem
None
None
Unix, Cassandra 2.0.3
Normal
Description
We have a geo-redundant setup with two data centers of three nodes each. When we bring down a single Cassandra node in DC2 with kill -9 <Cassandra-pid>, reads fail on DC1 with TimedOutException for a brief period (roughly 15-20 seconds).
Questions:
1. Why do reads fail on DC1 when a node in another DC (DC2) fails? Since we use LOCAL_QUORUM for both reads and writes in DC1, a request should return once 2 nodes in the local DC have replied, rather than timing out because of a node in the remote DC.
2. We want to make sure that no Cassandra requests fail when a node fails. We tried the rapid read protection settings ALWAYS, 99percentile, and 10ms, as described in http://www.datastax.com/dev/blog/rapid-read-protection-in-cassandra-2-0-2, but none of them helped. How can we ensure zero request failures when a node fails?
3. What is the right way to handle HTimedOutException in Hector?
4. Please confirm whether we are using public/private hostnames as expected.
We are using Cassandra 2.0.3.
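To make the expectation in question 1 concrete, here is a minimal sketch of the quorum arithmetic behind it. It assumes a replication factor of 3 in each data center, which the description does not state (only the node count is given). The class and method names are hypothetical and are not part of Cassandra or Hector.

```java
/**
 * Illustrative quorum arithmetic only. The class and method names are
 * hypothetical; nothing here calls Cassandra or Hector.
 */
public final class QuorumMath {
    private QuorumMath() {}

    /** Replicas that must respond to satisfy a quorum: floor(rf / 2) + 1. */
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1; // integer division floors for positive values
    }

    /** Replicas that can be unavailable while a quorum is still reachable. */
    static int tolerableReplicaFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    public static void main(String[] args) {
        int localRf = 3; // assumption: RF = 3 in the local data center (DC1)
        System.out.println("LOCAL_QUORUM needs " + quorum(localRf) + " local replicas");
        System.out.println("Local replicas that may be down: "
                + tolerableReplicaFailures(localRf));
    }
}
```

Under that assumption, LOCAL_QUORUM in DC1 counts only DC1 replicas, so losing a node in DC2 should not reduce the number of replicas available to satisfy the request.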