Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Duplicate
- Affects Version/s: 2.1.0
- None
- None
Description
We have a 3-node cluster. In some scenarios, one of the brokers is suddenly disconnected from the cluster, yet no underlying system issue is found. The disconnected node was unable to release the partitions it held as leader, so clients (Spring Boot) were unable to send or receive data through the affected broker.
We noticed that it always happens to the broker that is the active controller (the one reporting the active controller count).
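To confirm which broker is the active controller when the problem occurs, note that in a ZooKeeper-based cluster the controller registers itself in the `/controller` znode as a small JSON document containing a `brokerid` field. The sketch below only parses such a payload. Fetching it (for example with `zookeeper-shell.sh zk1:2181 get /controller`) is left out, and the sample payload, including its timestamp, is hypothetical.

```python
import json


def controller_broker_id(znode_payload: bytes) -> int:
    """Return the broker id stored in a /controller znode payload."""
    data = json.loads(znode_payload.decode("utf-8"))
    return int(data["brokerid"])


# Hypothetical payload, shaped like the JSON a ZooKeeper-based cluster
# stores in /controller; the timestamp value is made up for illustration.
sample = b'{"version":1,"brokerid":2,"timestamp":"1548478800000"}'

if __name__ == "__main__":
    print("active controller broker id:", controller_broker_id(sample))
```

Comparing this id with the `broker.id` of the node that dropped out would show whether the failure really follows the controller role.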
Environment details:
Provider: AWS
Kernel: 3.10.0-693.21.1.el7.x86_64
OS: CentOS Linux release 7.5.1804 (Core)
Scala version: 2.11
Kafka version: 2.1.0
Kafka config:
############################# Socket Server Settings #############################
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
############################# Log Basics #############################
num.partitions=1
num.recovery.threads.per.data.dir=1
############################# Internal Topic Settings #############################
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
############################# Log Retention Policy #############################
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
############################# Group Coordinator Settings #############################
group.initial.rebalance.delay.ms=0
############################# Zookeeper #############################
zookeeper.connection.timeout.ms=6000
broker.id=1
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
log.dirs=/data/kafka-node
advertised.listeners=PLAINTEXT://node1:9092
Controller log from the disconnected broker:
[2019-01-26 05:03:52,512] TRACE [Controller id=2] Checking need to trigger auto leader balancing (kafka.controller.KafkaController)
[2019-01-26 05:03:52,513] DEBUG [Controller id=2] Preferred replicas by broker Map(TOPICS->MAP) (kafka.controller.KafkaController)
[2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred replica for broker 2 Map() (kafka.controller.KafkaController)
[2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for broker 2 is 0.0 (kafka.controller.KafkaController)
[2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred replica for broker 1 Map() (kafka.controller.KafkaController)
[2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for broker 1 is 0.0 (kafka.controller.KafkaController)
[2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred replica for broker 3 Map() (kafka.controller.KafkaController)
[2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for broker 3 is 0.0 (kafka.controller.KafkaController)
[2019-01-26 05:08:52,513] TRACE [Controller id=2] Checking need to trigger auto leader balancing (kafka.controller.KafkaController)
server.log from a working broker:
[2019-01-26 05:02:05,564] INFO [ReplicaFetcher replicaId=3, leaderId=2, fetcherId=0] Error sending fetch request (sessionId=1637095899, epoch=21379644) to node 2: java.io.IOException: Connection to 2 was disconnected before the response was read. (org.apache.kafka.clients.FetchSessionHandler)
[2019-01-26 05:02:05,573] WARN [ReplicaFetcher replicaId=3, leaderId=2, fetcherId=0] Error in response for fetch request (type=FetchRequest, replicaId=3, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={PlayerGameRounds-8=(offset=2171960, logStartOffset=1483356, maxBytes=1048576, currentLeaderEpoch=Optional[2])}, isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessionId=1637095899, epoch=21379644)) (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 2 was disconnected before the response was read
    at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97)
    at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:97)
    at kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:190)
    at kafka.server.AbstractFetcherThread.kafka$server$AbstractFetcherThread$$processFetchRequest(AbstractFetcherThread.scala:241)
    at kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:130)
    at kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:129)
    at scala.Option.foreach(Option.scala:257)
    at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
[2019-01-26 05:02:35,723] WARN Attempting to send response via channel for which there is no open connection, connection id node3:9092-node2:59988-1550 (kafka.network.Processor)
[2019-01-26 05:02:35,731] WARN Attempting to send response via channel for which there is no open connection, connection id node3:9092-node2:59986-1550 (kafka.network.Processor)
[2019-01-26 05:02:35,797] WARN Attempting to send response via channel for which there is no open connection, connection id node3:9092-node2:59494-1549 (kafka.network.Processor)
[2019-01-26 05:02:35,816] WARN Attempting to send response via channel for which there is no open connection, connection id node3:9092-node2:53268-1530 (kafka.network.Processor)
[2019-01-26 05:02:37,603] INFO [ReplicaFetcher replicaId=3, leaderId=2, fetcherId=0] Error sending fetch request (sessionId=1637095899, epoch=INITIAL) to node 2: java.io.IOException: Connection to 2 was disconnected before the response was read. (org.apache.kafka.clients.FetchSessionHandler)
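When log excerpts like the ones above are pasted as a single run of text, it can help to split them back into individual entries before looking for patterns. The following is a minimal sketch, assuming every entry starts with a bracketed timestamp of the form `[YYYY-MM-DD HH:MM:SS,mmm]` followed by a log level. The function names are hypothetical, and the sample is an abridged copy of three entries from the working broker's log.

```python
import re
from collections import Counter

# A new entry begins at a bracketed timestamp such as [2019-01-26 05:02:05,564].
# The lookahead splits *before* the timestamp so it stays with its entry.
ENTRY_START = re.compile(r"(?=\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\] )")
ENTRY = re.compile(r"\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]+) (?P<msg>.*)", re.S)


def split_entries(text):
    """Split a flattened log dump into one string per log entry."""
    return [part.strip() for part in ENTRY_START.split(text) if part.strip()]


def count_levels(text):
    """Count entries per log level (INFO, WARN, ...)."""
    counts = Counter()
    for entry in split_entries(text):
        match = ENTRY.match(entry)
        if match:
            counts[match.group("level")] += 1
    return counts


# Abridged excerpt of the working broker's server.log, flattened on purpose.
sample = (
    "[2019-01-26 05:02:05,564] INFO [ReplicaFetcher replicaId=3, leaderId=2, "
    "fetcherId=0] Error sending fetch request (sessionId=1637095899, "
    "epoch=21379644) to node 2: java.io.IOException: Connection to 2 was "
    "disconnected before the response was read. "
    "(org.apache.kafka.clients.FetchSessionHandler) "
    "[2019-01-26 05:02:35,723] WARN Attempting to send response via channel "
    "for which there is no open connection, connection id "
    "node3:9092-node2:59988-1550 (kafka.network.Processor) "
    "[2019-01-26 05:02:35,731] WARN Attempting to send response via channel "
    "for which there is no open connection, connection id "
    "node3:9092-node2:59986-1550 (kafka.network.Processor)"
)

if __name__ == "__main__":
    for entry in split_entries(sample):
        print(entry[:60])
    print(dict(count_levels(sample)))
```

Splitting on the timestamp rather than on newlines keeps multi-line stack traces attached to the entry that produced them.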
The request handler idle metrics dropped during this issue:
Attachments
Issue Links
- is duplicated by: KAFKA-7697 Possible deadlock in kafka.cluster.Partition (Resolved)