Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.1.0
Fix Version/s: None
Component/s: None
Environment
Kafka: kafka_2.12-2.1.0
Cluster: 3 ZooKeeper nodes and 3 brokers on 3 boxes (1 ZooKeeper node and 1 broker per box); default.replication.factor: 2
Offsets topic replication factor (offsets.topic.replication.factor): 1 when the error happened; increased to 2 after seeing this error, by reassigning the __consumer_offsets partitions
Compression: broker compression.type left at its default (producer); producers send gzip
OS/storage: Linux (Red Hat), ext4, Kafka logs on a single local disk
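Raising the offsets topic's replication factor from 1 to 2 is done by handing kafka-reassign-partitions.sh a JSON plan that lists the new replica set for every partition. A minimal sketch of generating such a plan, assuming __consumer_offsets has the default 50 partitions (offsets.topic.num.partitions) and that replicas are placed round-robin over brokers 13, 14 and 15; the function name and the placement scheme are illustrative, not what was actually run here.

```python
import json

# Assumptions (not from the report): __consumer_offsets has the default
# 50 partitions, and replicas are placed round-robin over the three
# broker IDs mentioned in the report.
BROKERS = [13, 14, 15]
NUM_PARTITIONS = 50
REPLICATION_FACTOR = 2


def build_reassignment(topic, num_partitions, brokers, rf):
    """Return a plan in the JSON layout read by
    kafka-reassign-partitions.sh --reassignment-json-file."""
    partitions = []
    for p in range(num_partitions):
        # Start each replica list at a different broker so the first
        # (preferred leader) replica is spread evenly across brokers.
        replicas = [brokers[(p + i) % len(brokers)] for i in range(rf)]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


plan = build_reassignment("__consumer_offsets", NUM_PARTITIONS, BROKERS, REPLICATION_FACTOR)

if __name__ == "__main__":
    # Write this output to a file and pass it to the reassignment tool.
    print(json.dumps(plan, indent=2))
```

The saved output would then be applied with kafka-reassign-partitions.sh using --reassignment-json-file and --execute; in 2.1 the tool connects through --zookeeper.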
Description
We see the following log lines repeat on our Kafka cluster from time to time. They appear to cause messages to expire on the producers and the cluster to go into a state it does not recover from; the only fix we have found is to restart the brokers.
Shrinking ISR from 14,13 to 13 (kafka.cluster.Partition)
Cached zkVersion [21] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
Later on, the following line repeats:
Got user-level KeeperException when processing sessionid:0xe046aa4f8e60000 type:setData cxid:0x2df zxid:0xa000001fd txntype:-1 reqpath:n/a Error Path:/brokers/topics/ucTrade/partitions/6/state Error:KeeperErrorCode = BadVersion for /brokers/topics/ucTrade/partitions/6/state
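Both the "Cached zkVersion" and the "BadVersion" lines point at a conditional write: the ISR update carries the znode version the writer expects, and ZooKeeper rejects the write if the znode has changed since. A simplified toy model of that check, in plain Python rather than Kafka or ZooKeeper code; the class names, method names and the extra "leader_epoch" field are made up for illustration.

```python
class BadVersionError(Exception):
    """Stands in for ZooKeeper's KeeperErrorCode = BadVersion."""


class ZNode:
    """Toy znode: data plus a version bumped on every successful write."""

    def __init__(self, data):
        self.data = data
        self.version = 0

    def set_data(self, data, expected_version):
        # Conditional write: succeeds only if the caller's expected version
        # matches the current one, otherwise raises BadVersionError.
        if expected_version != self.version:
            raise BadVersionError(f"expected {expected_version}, actual {self.version}")
        self.data = data
        self.version += 1
        return self.version


class PartitionLeader:
    """Toy broker that caches the partition-state version it last saw."""

    def __init__(self, znode):
        self.znode = znode
        self.cached_version = znode.version

    def shrink_isr(self, new_isr):
        try:
            self.cached_version = self.znode.set_data({"isr": new_isr}, self.cached_version)
            return "updated"
        except BadVersionError:
            # Stale cache: the update is skipped and the cache is left as-is.
            return "skipped: cached zkVersion is stale"


state = ZNode({"isr": [14, 13]})
leader = PartitionLeader(state)

# Another writer updates the znode after the leader cached version 0.
state.set_data({"isr": [14, 13], "leader_epoch": 1}, expected_version=0)

result_first = leader.shrink_isr([13])
result_retry = leader.shrink_isr([13])  # still stale, skipped again
```

In this toy nothing refreshes the leader's cached version, so every retry is skipped, which is one way to get the same lines repeating; how and when a real broker refreshes its cached zkVersion is not modeled here.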
We did not touch any of the brokers or ZooKeeper nodes while this happened.
I've attached a combined log containing the controller, server and state-change logs from each broker (IDs 13, 14 and 15; the log files have the suffixes b13, b14 and b15 respectively).
Since this happened we have increased the heap from 1g to 6g for the brokers and from 512m to 4g for the ZooKeeper nodes, though we're not sure this is relevant. Unfortunately the ZooKeeper logs have been overwritten, so we can't provide them.
Message sizes vary and some messages are relatively large (6 MB), but the producers compress them with gzip.
I've attached some logs from one of our producers as well.
Producer properties we have changed (set through Spring Boot):
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.compression-type=gzip
spring.kafka.producer.retries=5
spring.kafka.producer.acks=-1
spring.kafka.producer.batch-size=1048576
spring.kafka.producer.properties.linger.ms=200
spring.kafka.producer.properties.request.timeout.ms=600000
spring.kafka.producer.properties.max.block.ms=240000
spring.kafka.producer.properties.max.request.size=104857600
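For reading these settings against the Kafka producer documentation, it can help to translate them into plain Kafka client keys and readable units. A minimal sketch, assuming the simple rule that first-class Spring keys map to Kafka keys by replacing "-" with "." (true for the keys listed here, not a general Spring Boot guarantee) and that keys under spring.kafka.producer.properties. are passed through unchanged; the helper names are hypothetical.

```python
PREFIX = "spring.kafka.producer."
PASSTHROUGH = PREFIX + "properties."

RAW = """\
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.compression-type=gzip
spring.kafka.producer.retries=5
spring.kafka.producer.acks=-1
spring.kafka.producer.batch-size=1048576
spring.kafka.producer.properties.linger.ms=200
spring.kafka.producer.properties.request.timeout.ms=600000
spring.kafka.producer.properties.max.block.ms=240000
spring.kafka.producer.properties.max.request.size=104857600
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def to_kafka_config(spring_props):
    """Translate Spring producer keys to Kafka client keys (simplified rule)."""
    config = {}
    for key, value in spring_props.items():
        if key.startswith(PASSTHROUGH):  # check the longer prefix first
            config[key[len(PASSTHROUGH):]] = value
        elif key.startswith(PREFIX):
            config[key[len(PREFIX):].replace("-", ".")] = value
    return config


config = to_kafka_config(parse_properties(RAW))

# Size and timeout settings in readable units.
summary = {
    "batch.size (MiB)": int(config["batch.size"]) / 2**20,
    "max.request.size (MiB)": int(config["max.request.size"]) / 2**20,
    "linger.ms (s)": int(config["linger.ms"]) / 1000,
    "request.timeout.ms (min)": int(config["request.timeout.ms"]) / 60000,
    "max.block.ms (min)": int(config["max.block.ms"]) / 60000,
}
```

Read this way, the producers batch up to 1 MiB, allow requests up to 100 MiB, wait up to 10 minutes for a broker response and block up to 4 minutes on send; acks=-1 is the same setting as acks=all.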