[CASSANDRA-8352] Timeout Exception on Node Failure in Remote Data Center - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Not A Problem
Fix Version/s: None
Component/s: None
Labels:
- DataCenter
- GEO-Red
Environment: Unix, Cassandra 2.0.3
Severity: Normal
Since Version: 2.0.3

Description

We have a geo-redundant (Geo-red) setup with 2 data centers of 3 nodes each. When we bring down a single Cassandra node in DC2 with kill -9 <Cassandra-pid>, reads fail on DC1 with TimedOutException for a brief period (roughly 15-20 seconds).

Questions:
1. Why do reads fail on DC1 when a node in another DC (DC2) fails? Since we use LOCAL_QUORUM for both reads and writes in DC1, a request should return once 2 nodes in the local DC have replied, rather than timing out because of a node in the remote DC.
2. We want to make sure that no Cassandra requests fail when a node fails. We tried rapid read protection settings of ALWAYS, 99percentile, and 10ms, as described in http://www.datastax.com/dev/blog/rapid-read-protection-in-cassandra-2-0-2, but none of them helped. How can we ensure zero request failures when a node fails?
3. What is the right way to handle HTimedOutException in Hector?
4. Please confirm whether we are using public and private hostnames as expected.
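
The expectation in question 1 rests on quorum arithmetic, which can be sketched as follows. This is only an illustration: the replication factor of 3 per data center is an assumption (the report gives the node count, not the keyspace settings), and the class and method names are hypothetical.

```java
// Sketch of the quorum arithmetic behind question 1.
// Assumption (not stated in the report): replication factor 3 in each data center.
final class QuorumMath {
    // A quorum is a strict majority of the replicas counted: floor(rf / 2) + 1.
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1;
    }

    // Number of local replicas that can be down while a LOCAL_QUORUM
    // request can still collect enough replies.
    static int tolerableLocalFailures(int localReplicationFactor) {
        return localReplicationFactor - quorum(localReplicationFactor);
    }

    public static void main(String[] args) {
        int localRf = 3; // assumed replication factor in DC1
        System.out.println("LOCAL_QUORUM needs " + quorum(localRf) + " local replies");
        System.out.println("Local replicas that may be down: " + tolerableLocalFailures(localRf));
    }
}
```

Under that assumption, LOCAL_QUORUM counts only the replicas in the coordinator's own data center, so the number of live nodes in DC2 does not enter the calculation.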

We are using Cassandra 2.0.3.
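
For question 3, one common client-side approach is to retry idempotent reads a bounded number of times with a growing delay. The sketch below is not Hector's API: it uses java.util.concurrent.TimeoutException as a stand-in for HTimedOutException, and ReadRetry and withRetry are hypothetical names. Whether a handful of retries can cover a 15-20 second window depends on the chosen delays.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Generic retry-with-backoff sketch for idempotent reads.
// TimeoutException stands in for the client's timeout exception (assumption).
final class ReadRetry {
    static <T> T withRetry(Callable<T> read, int maxAttempts, long initialBackoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        TimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return read.call();
            } catch (TimeoutException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs); // wait before the next attempt
                    backoffMs *= 2;          // double the delay each time
                }
            }
        }
        throw last; // every attempt timed out
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated read that times out twice before succeeding.
        String value = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new TimeoutException("simulated timeout");
            }
            return "row";
        }, 5, 10);
        System.out.println(value + " after " + calls[0] + " attempts");
    }
}
```

Only timeouts are retried here; any other exception propagates immediately, and the last timeout is rethrown once the attempts are exhausted.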

People

Assignee: Unassigned

Reporter: Akhtar Hussain

Reviewers: Anuj Wadehra

Votes: 0

Watchers: 2

Dates

Created: 21/Nov/14 05:49

Updated: 16/Apr/19 09:31

Resolved: 26/Nov/14 06:20