Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Duplicate
- Affects Version/s: 2.3.0, 2.3.1
- Fix Version/s: None
- Component/s: None
- Environment: Ubuntu 16.04, AWS instance d2.8xlarge.
JAVA Options:
-Xms16G
-Xmx16G
-XX:G1HeapRegionSize=16M
-XX:MetaspaceSize=96m
-XX:MinMetaspaceFreeRatio=50
Description
We upgraded our Kafka cluster from version 1.1.0 to 2.3.1, following the rolling upgrade instructions up to step 2.
Message format and inter-broker protocol versions were left the same:
inter.broker.protocol.version=1.1
log.message.format.version=1.1
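For context, here is a minimal sketch (not taken from the report) of how the pinned versions can be double-checked from a client after the rolling restart, using the AdminClient's describeConfigs API; the bootstrap address and broker id "0" are placeholders:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collections;
import java.util.Properties;

public class CheckProtocolVersions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder bootstrap address, not from this report.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-0:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Broker id "0" is assumed here; use the id of the broker being checked.
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0");
            Config config = admin.describeConfigs(Collections.singleton(broker))
                                 .all().get().get(broker);
            // Both entries should still report 1.1 while the upgrade is paused at step 2.
            System.out.println("inter.broker.protocol.version = "
                    + config.get("inter.broker.protocol.version").value());
            System.out.println("log.message.format.version = "
                    + config.get("log.message.format.version").value());
        }
    }
}
```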
After upgrading, we started to get some occasional exceptions:
2019/11/19 05:30:53 INFO [ProducerStateManager partition=matchmaker_retry_clicks_15m-2] Writing producer snapshot at offset 788532 (kafka.log.ProducerStateManager)
2019/11/19 05:30:53 INFO [Log partition=matchmaker_retry_clicks_15m-2, dir=/mnt/kafka] Rolled new log segment at offset 788532 in 1 ms. (kafka.log.Log)
2019/11/19 05:31:01 ERROR [ReplicaManager broker=0] Error processing append operation on partition matchmaker_retry_clicks_15m-2 (kafka.server.ReplicaManager)
2019/11/19 05:31:01 java.nio.BufferOverflowException
2019/11/19 05:31:01 at java.nio.Buffer.nextPutIndex(Buffer.java:527)
2019/11/19 05:31:01 at java.nio.DirectByteBuffer.putLong(DirectByteBuffer.java:797)
2019/11/19 05:31:01 at kafka.log.TimeIndex.$anonfun$maybeAppend$1(TimeIndex.scala:134)
2019/11/19 05:31:01 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
2019/11/19 05:31:01 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
2019/11/19 05:31:01 at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:114)
2019/11/19 05:31:01 at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:520)
2019/11/19 05:31:01 at kafka.log.Log.$anonfun$roll$8(Log.scala:1690)
2019/11/19 05:31:01 at kafka.log.Log.$anonfun$roll$8$adapted(Log.scala:1690)
2019/11/19 05:31:01 at scala.Option.foreach(Option.scala:407)
2019/11/19 05:31:01 at kafka.log.Log.$anonfun$roll$2(Log.scala:1690)
2019/11/19 05:31:01 at kafka.log.Log.maybeHandleIOException(Log.scala:2085)
2019/11/19 05:31:01 at kafka.log.Log.roll(Log.scala:1654)
2019/11/19 05:31:01 at kafka.log.Log.maybeRoll(Log.scala:1639)
2019/11/19 05:31:01 at kafka.log.Log.$anonfun$append$2(Log.scala:966)
2019/11/19 05:31:01 at kafka.log.Log.maybeHandleIOException(Log.scala:2085)
2019/11/19 05:31:01 at kafka.log.Log.append(Log.scala:850)
2019/11/19 05:31:01 at kafka.log.Log.appendAsLeader(Log.scala:819)
2019/11/19 05:31:01 at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:772)
2019/11/19 05:31:01 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
2019/11/19 05:31:01 at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:259)
2019/11/19 05:31:01 at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:759)
2019/11/19 05:31:01 at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$2(ReplicaManager.scala:763)
2019/11/19 05:31:01 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
2019/11/19 05:31:01 at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
2019/11/19 05:31:01 at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
2019/11/19 05:31:01 at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
2019/11/19 05:31:01 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
2019/11/19 05:31:01 at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
2019/11/19 05:31:01 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
2019/11/19 05:31:01 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
2019/11/19 05:31:01 at scala.collection.AbstractTraversable.map(Traversable.scala:108)
2019/11/19 05:31:01 at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:751)
2019/11/19 05:31:01 at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:492)
2019/11/19 05:31:01 at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:544)
2019/11/19 05:31:01 at kafka.server.KafkaApis.handle(KafkaApis.scala:113)
2019/11/19 05:31:01 at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
2019/11/19 05:31:01 at java.lang.Thread.run(Thread.java:748)
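The top frames show TimeIndex.maybeAppend writing a 12-byte entry (an 8-byte timestamp plus a 4-byte relative offset) into the segment's memory-mapped time index; the exception fires when that buffer has no room left for the entry. The following standalone sketch reproduces the same failure with a plain direct ByteBuffer (the capacity and loop are illustrative only, not Kafka code):

```java
import java.nio.ByteBuffer;

public class TimeIndexOverflowSketch {
    public static void main(String[] args) {
        // Each time-index entry is 12 bytes: an 8-byte timestamp plus a 4-byte relative offset.
        final int entrySize = 12;
        final int maxEntries = 4;                        // illustrative capacity
        ByteBuffer mmapLike = ByteBuffer.allocateDirect(maxEntries * entrySize);

        // Fill the buffer to capacity, as a fully used index would be.
        for (int i = 0; i < maxEntries; i++) {
            mmapLike.putLong(System.currentTimeMillis()); // timestamp
            mmapLike.putInt(i);                           // relative offset
        }

        // One more append finds no remaining room: putLong() throws
        // java.nio.BufferOverflowException, matching the top of the stack trace above.
        mmapLike.putLong(System.currentTimeMillis());
    }
}
```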
The error persists until the broker is restarted (or partition leadership is moved to another broker).
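As one possible illustration of the leadership-move workaround (the report does not say how leadership was moved): on 2.3.x the AdminClient can trigger a preferred replica election for the affected partition, which shifts leadership off the broken broker only if another broker is the preferred replica. The topic and partition are taken from the log above; the bootstrap address is a placeholder.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.TopicPartition;

import java.util.Collections;
import java.util.Properties;

public class MoveLeadershipSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder address; point this at a healthy broker.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "healthy-broker:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Partition from the log above; leadership only moves if the
            // preferred replica lives on a different (healthy) broker.
            TopicPartition tp = new TopicPartition("matchmaker_retry_clicks_15m", 2);
            admin.electPreferredLeaders(Collections.singleton(tp))
                 .partitionResult(tp)
                 .get();
        }
    }
}
```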
Broker config:
advertised.host.name={{ hostname }}
port=9092
# Default number of partitions if a value isn't set when the topic is created.
num.partitions=3
auto.create.topics.enable=false
delete.topic.enable=false
# Prevent not in-sync replica to become a leader.
unclean.leader.election.enable=false
# The number of threads per data directory to be used for log recovery at
# startup and flushing at shutdown.
num.recovery.threads.per.data.dir=36
log.flush.interval.messages=10000
log.flush.interval.ms=2000
# 1 week
log.retention.hours=168
log.retention.check.interval.ms=300000
log.cleaner.enable=false
# Use broker time for message timestamps.
log.message.timestamp.type=LogAppendTime
zookeeper.connect={{zookeeper_host }}:2181
zookeeper.connection.timeout.ms=6000
controller.socket.timeout.ms=30000
controller.message.queue.size=10
# Replication configuration
num.replica.fetchers=10
# Socket server configuration
num.io.threads=32
num.network.threads=16
socket.request.max.bytes=104857600
socket.receive.buffer.bytes=1048576
socket.send.buffer.bytes=1048576
queued.max.requests=32
fetch.purgatory.purge.interval.requests=100
producer.purgatory.purge.interval.requests=100
inter.broker.protocol.version=1.1
log.message.format.version=1.1
Attachments
Issue Links
- is caused by: KAFKA-9156 LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow in concurrent state (Resolved)