Details
- Type: Bug
- Status: Resolved
- Priority: Urgent
- Resolution: Fixed

Environment
$ /etc/alternatives/jre_1.6.0/bin/java -version
java version "1.6.0_23"
Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
Java HotSpot(TM) 64-Bit Server VM (build 19.0-b09, mixed mode)
$ uname -a
Linux hostname 2.6.32-358.2.1.el6.x86_64 #1 SMP Tue Mar 12 14:18:09 CDT 2013 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/redhat-release
Scientific Linux release 6.3 (Carbon)
$ facter | grep ec2
...
ec2_placement => availability_zone=us-east-1d
...
$ rpm -qi cassandra
cassandra-1.2.3-1.el6.cmp1.noarch
(custom built rpm from cassandra tarball distribution)
Severity: Critical
Description
I get SSL and snappy compression errors in a multi-datacenter setup.
The setup is simple: 3 nodes in AWS east, 3 nodes in Rackspace. I use a slightly modified Ec2MultiRegionSnitch in Rackspace (I just added a regex able to parse the Rackspace/OpenStack availability zone, which happens to be in an unusual format).
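Just for illustration, the kind of zone-parsing tweak I mean looks roughly like the sketch below; the class, method and regex are hypothetical and not the actual Ec2MultiRegionSnitch code, and the fallback region name is an assumption of the sketch:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper sketching the snitch modification described above:
// split an availability-zone string into (region, zone). EC2-style zones
// such as "us-east-1d" match the regex; anything else (e.g. the
// Rackspace/OpenStack format) falls through to a default chosen here.
public class ZoneParsingSketch
{
    private static final Pattern EC2_STYLE = Pattern.compile("^(.+-\\d+)([a-z])$");

    public static String[] regionAndZone(String az)
    {
        Matcher m = EC2_STYLE.matcher(az);
        if (m.matches())
            return new String[]{ m.group(1), m.group(2) };
        // Not an EC2-style name: treat the whole string as the zone and
        // use a placeholder region (assumption for this sketch only).
        return new String[]{ "rackspace", az };
    }

    public static void main(String[] args)
    {
        String[] aws = regionAndZone("us-east-1d");
        System.out.println(aws[0] + " / " + aws[1]); // prints: us-east-1 / d
    }
}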
During nodetool rebuild tests I managed to (consistently) trigger the following error:
2013-03-19 12:42:16.059+0100 [Thread-13] [DEBUG] IncomingTcpConnection.java(79) org.apache.cassandra.net.IncomingTcpConnection: IOException reading from socket; closing
java.io.IOException: FAILED_TO_UNCOMPRESS(5)
    at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
    at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
    at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391)
    at org.apache.cassandra.io.compress.SnappyCompressor.uncompress(SnappyCompressor.java:93)
    at org.apache.cassandra.streaming.compress.CompressedInputStream.decompress(CompressedInputStream.java:101)
    at org.apache.cassandra.streaming.compress.CompressedInputStream.read(CompressedInputStream.java:79)
    at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:337)
    at org.apache.cassandra.utils.BytesReadTracker.readUnsignedShort(BytesReadTracker.java:140)
    at org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:361)
    at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:371)
    at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:160)
    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
    at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:226)
    at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:166)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
The exception is raised during DB file download. What is strange is the following:
- the exception is raised only when rebuilding from AWS into Rackspace
- the exception is raised only when all nodes are up and running in AWS (all 3). In other words, if I bootstrap from one or two nodes in AWS, the command succeeds.
Packet-level inspection revealed malformed packets on both ends of the communication (the packet is considered malformed on the machine it originates on).
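That observation is consistent with how snappy-java reacts to a damaged compressed block. As a standalone illustration of the library behaviour only (hypothetical demo code, not taken from Cassandra; damage simulated by truncating the block):

import java.io.IOException;
import java.util.Arrays;
import org.xerial.snappy.Snappy;

// Minimal demo: uncompressing an intact snappy block works, while a block
// that has been damaged in transit (simulated here by truncation) typically
// fails with the same IOException seen above, e.g. FAILED_TO_UNCOMPRESS(5).
public class SnappyCorruptionDemo
{
    public static void main(String[] args) throws IOException
    {
        byte[] original = "some sstable bytes streamed between nodes".getBytes("UTF-8");
        byte[] compressed = Snappy.compress(original);

        // An intact block round-trips fine.
        System.out.println(new String(Snappy.uncompress(compressed), "UTF-8"));

        // Simulate a malformed packet by dropping the tail of the block.
        byte[] damaged = Arrays.copyOf(compressed, compressed.length - 4);
        try
        {
            Snappy.uncompress(damaged);
        }
        catch (IOException e)
        {
            System.out.println("uncompress failed: " + e.getMessage());
        }
    }
}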
Further investigation raised two more concerns:
- We managed to get another stack trace when testing the scenario. The exception was raised only once during the tests, and only when I throttled the inter-datacenter bandwidth to 1 Mbps.
java.lang.RuntimeException: javax.net.ssl.SSLException: bad record MAC
    at com.google.common.base.Throwables.propagate(Throwables.java:160)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
    at java.lang.Thread.run(Thread.java:662)
Caused by: javax.net.ssl.SSLException: bad record MAC
    at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:190)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1649)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1607)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:859)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:755)
    at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75)
    at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:151)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    ... 1 more
This is a pure SSL error with no snappy involvement.
- I managed to trigger the exception during nodetool repair tests when replacing a dead node with a new one on the AWS side, which means the problem is not restricted to the one-way scenario.
2013-03-27 14:06:03.033+0100 [Thread-137] [INFO] StreamInSession.java(136) org.apache.cassandra.streaming.StreamInSession: Streaming of file /path/to/cassandra/data/ks/cf/ks-cf-ib-2-Data.db sections=3 progress=0/20513 - 0% for org.apache.cassandra.streaming.StreamInSession@14450ae7 failed: requesting a retry.
2013-03-27 14:06:03.033+0100 [Thread-138] [DEBUG] FileUtils.java(110) org.apache.cassandra.io.util.FileUtils: Deleting ks-cf-tmp-ib-98-Data.db
2013-03-27 14:06:03.033+0100 [Thread-138] [DEBUG] FileUtils.java(110) org.apache.cassandra.io.util.FileUtils: Deleting ks-cf-tmp-ib-98-Filter.db
2013-03-27 14:06:03.034+0100 [Thread-138] [DEBUG] FileUtils.java(110) org.apache.cassandra.io.util.FileUtils: Deleting ks-cf-tmp-ib-98-TOC.txt
2013-03-27 14:06:03.034+0100 [Thread-137] [DEBUG] IncomingTcpConnection.java(91) org.apache.cassandra.net.IncomingTcpConnection: IOException reading from socket; closing
java.io.IOException: FAILED_TO_UNCOMPRESS(5)
    at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
    at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
    at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391)
    at org.apache.cassandra.io.compress.SnappyCompressor.uncompress(SnappyCompressor.java:93)
    at org.apache.cassandra.streaming.compress.CompressedInputStream.decompress(CompressedInputStream.java:101)
    at org.apache.cassandra.streaming.compress.CompressedInputStream.read(CompressedInputStream.java:79)
    at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:320)
    at org.apache.cassandra.utils.BytesReadTracker.readUnsignedShort(BytesReadTracker.java:140)
    at org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:361)
    at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:371)
    at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:160)
    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
    at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:238)
    at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:178)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)
Attachments
Issue Links
- is duplicated by: CASSANDRA-5381 java.io.EOFException exception while executing nodetool repair with compression enabled (Resolved)
- relates to: CASSANDRA-5390 Cassandra doesn't respect internode compression settings (Resolved)