Details
- Bug
- Status: Resolved
- Urgent
- Resolution: Fixed
- 4.0.11, 4.1.3, 5.0-alpha1, 5.0
- None
- Correctness - Transient Incorrect Response
- Critical
- Normal
- User Report
- All
- None
Description
After installing a new node running 4.0.10, we saw the new node attempt to connect to the private IPs of a random number of nodes in remote DCs, which are only reachable via public IP for cross-DC communication.
Only the new node's outbound connections were affected; inbound connections from pre-4.0.10 nodes were not. system.peers_v2 (below) showed preferred_ip and preferred_port as null for the remote nodes; only nodes in the 4.0.10 node's own DC had preferred_ip values, as expected.
We believe the issue originated with https://issues.apache.org/jira/browse/CASSANDRA-16718
Details on cluster:
- All nodes have public IP configured as well as private IP
- Listen/rpc addresses are configured for private IP; broadcast is public IP
- prefer_local=true is enabled for all nodes
The log that showed the connection failing:
INFO [Messaging-EventLoop-3-8] 2023-06-01 00:14:21,565 NoSpamLogger.java:92 - /99.81.<redacted>:7000->/44.208.<redacted>:7000-URGENT_MESSAGES-[no-channel] failed to connect
io.netty.channel.ConnectTimeoutException: connection timed out: /10.26.5.11:7000
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:576)
The 99.81.<redacted> and 44.208.<redacted> instances can only reach each other using public IPs.
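For reference, the address selection we expected with prefer_local=true can be sketched as below. This is only an illustration of the intended behaviour, not Cassandra's implementation: the Peer class, the connect_address function, the "local-dc" name and the 203.0.113.10 address (a documentation placeholder standing in for the redacted public IP) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peer:
    """Hypothetical view of one peer as learned from gossip."""
    broadcast_address: str            # public address
    internal_address: Optional[str]   # private address, if advertised
    dc: str


def connect_address(local_dc: str, peer: Peer, prefer_local: bool = True) -> str:
    """Return the address an outbound connection should target.

    The private address is only used when prefer_local is on AND the
    peer is in the same DC; everything else goes to the public address.
    """
    same_dc = peer.dc == local_dc
    if prefer_local and same_dc and peer.internal_address:
        return peer.internal_address
    return peer.broadcast_address


# Remote-DC peer: should be reached on its public address, never 10.26.5.11.
remote = Peer("203.0.113.10", "10.26.5.11", "us-east-1")
print(connect_address("local-dc", remote))
```

In the failure above, the new node behaved as if the same-DC check had passed for a remote peer and targeted 10.26.5.11 directly.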
gossipinfo output from 4.0.10 node:
/44.208.<redacted>
generation:1661113358
heartbeat:25267691
LOAD:25267683:1.7882044268E10
SCHEMA:24692061:e98b918d-499f-3ccc-8dbe-5af31f685bda
DC:13:us-east-1
RACK:15:1a
RELEASE_VERSION:6:4.0.5
NET_VERSION:2:12
HOST_ID:3:9a41e668-060d-4cfe-bb1e-013f5116422d
RPC_READY:1407:true
INTERNAL_ADDRESS_AND_PORT:9:10.26.5.11:7000
NATIVE_ADDRESS_AND_PORT:4:44.208.<redacted>:9042
STATUS_WITH_PORT:1393:NORMAL,-2262036356854762881
SSTABLE_VERSIONS:7:big-nb
TOKENS:1392:<hidden>
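To compare the DC and internal address gossiped for each endpoint across many nodes, output like the above can be parsed with a small script. This is a rough sketch that assumes the layout shown here (an endpoint line starting with "/", then KEY:version:value lines, with generation and heartbeat carrying no version field); parse_gossipinfo is a hypothetical helper, not part of nodetool.

```python
def parse_gossipinfo(text: str) -> dict:
    """Parse gossipinfo-style text into {endpoint: {STATE: value}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            # New endpoint section, e.g. "/44.208.<redacted>"
            current = nodes.setdefault(line.lstrip("/"), {})
        elif current is not None and ":" in line:
            key, _, rest = line.partition(":")
            if key in ("generation", "heartbeat"):
                # These two lines are KEY:value with no version field.
                current[key] = rest
            else:
                # Application states are KEY:version:value; the value
                # itself may contain ":" (e.g. "10.26.5.11:7000").
                _version, _, value = rest.partition(":")
                current[key] = value
    return nodes
```

For the entry above, this yields DC "us-east-1" and INTERNAL_ADDRESS_AND_PORT "10.26.5.11:7000", which is the address that appears in the connection timeout.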
Peers output from 4.0.10 node:
peer              | peer_port | data_center | host_id                              | native_address    | native_port | preferred_ip | preferred_port | rack | release_version | schema_version                       | tokens
------------------+-----------+-------------+--------------------------------------+-------------------+-------------+--------------+----------------+------+-----------------+--------------------------------------+--------
44.208.<redacted> | 7000      | us-east-1   | 9a41e668-060d-4cfe-bb1e-013f5116422d | 44.208.<redacted> | 9042        | null         | null           | 1a   | 4.0.5           | e98b918d-499f-3ccc-8dbe-5af31f685bda | {'-2262036356854762881', '-4197710115038136897', '-7072386316096662315', '2085255826742630980', '249732489387853170', '4976300208126705818', '7187184456885833289', '8777189009399731927'}
As a temporary workaround, we used iptables to redirect outbound traffic destined for the private IPs to the corresponding public IPs, which resulted in successful outbound connections.
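One possible shape for such a redirect is a NAT rule on the OUTPUT chain per affected peer. The sketch below only builds the command line; dnat_rule is a hypothetical helper, 203.0.113.10 is a documentation placeholder for the redacted public IP, and the exact rules used in this incident are not reproduced here.

```python
def dnat_rule(private_ip: str, public_ip: str, port: int = 7000) -> list:
    """Build an iptables command that rewrites locally generated TCP
    traffic for private_ip:port so it goes to public_ip:port instead."""
    return [
        "iptables", "-t", "nat", "-A", "OUTPUT",
        "-p", "tcp", "-d", private_ip, "--dport", str(port),
        "-j", "DNAT", "--to-destination", f"{public_ip}:{port}",
    ]


# Print the command for review rather than executing it.
print(" ".join(dnat_rule("10.26.5.11", "203.0.113.10")))
```

One rule would be needed for every remote peer whose private address the node tries to reach, and the rules should be removed once the node runs a fixed version.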
Issue Links
- is caused by CASSANDRA-16718 Changing listen_address with prefer_local may lead to issues (Resolved)