Kafka / KAFKA-9213

BufferOverflowException on rolling new segment after upgrading Kafka from 1.1.0 to 2.3.1


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: 2.3.0, 2.3.1
    • Fix Version/s: None
    • Component/s: log
    • Labels: None
    • Environment: Ubuntu 16.04, AWS instance d2.8xlarge.

      JAVA Options:

      -Xms16G
      -Xmx16G
      -XX:G1HeapRegionSize=16M
      -XX:MetaspaceSize=96m
      -XX:MinMetaspaceFreeRatio=50

    Description

      We upgraded our Kafka cluster from version 1.1.0 to 2.3.1, following the rolling upgrade instructions up to step 2.

      The message format and inter-broker protocol versions were left unchanged:

      inter.broker.protocol.version=1.1

      log.message.format.version=1.1


      After upgrading, we started to see occasional exceptions:

      2019/11/19 05:30:53 INFO [ProducerStateManager partition=matchmaker_retry_clicks_15m-2] Writing producer snapshot at offset 788532 (kafka.log.ProducerStateManager)
      2019/11/19 05:30:53 INFO [Log partition=matchmaker_retry_clicks_15m-2, dir=/mnt/kafka] Rolled new log segment at offset 788532 in 1 ms. (kafka.log.Log)
      2019/11/19 05:31:01 ERROR [ReplicaManager broker=0] Error processing append operation on partition matchmaker_retry_clicks_15m-2 (kafka.server.ReplicaManager)
      2019/11/19 05:31:01 java.nio.BufferOverflowException
      2019/11/19 05:31:01     at java.nio.Buffer.nextPutIndex(Buffer.java:527)
      2019/11/19 05:31:01     at java.nio.DirectByteBuffer.putLong(DirectByteBuffer.java:797)
      2019/11/19 05:31:01     at kafka.log.TimeIndex.$anonfun$maybeAppend$1(TimeIndex.scala:134)
      2019/11/19 05:31:01     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
      2019/11/19 05:31:01     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
      2019/11/19 05:31:01     at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:114)
      2019/11/19 05:31:01     at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:520)
      2019/11/19 05:31:01     at kafka.log.Log.$anonfun$roll$8(Log.scala:1690)
      2019/11/19 05:31:01     at kafka.log.Log.$anonfun$roll$8$adapted(Log.scala:1690)
      2019/11/19 05:31:01     at scala.Option.foreach(Option.scala:407)
      2019/11/19 05:31:01     at kafka.log.Log.$anonfun$roll$2(Log.scala:1690)
      2019/11/19 05:31:01     at kafka.log.Log.maybeHandleIOException(Log.scala:2085)
      2019/11/19 05:31:01     at kafka.log.Log.roll(Log.scala:1654)
      2019/11/19 05:31:01     at kafka.log.Log.maybeRoll(Log.scala:1639)
      2019/11/19 05:31:01     at kafka.log.Log.$anonfun$append$2(Log.scala:966)
      2019/11/19 05:31:01     at kafka.log.Log.maybeHandleIOException(Log.scala:2085)
      2019/11/19 05:31:01     at kafka.log.Log.append(Log.scala:850)
      2019/11/19 05:31:01     at kafka.log.Log.appendAsLeader(Log.scala:819)
      2019/11/19 05:31:01     at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:772)
      2019/11/19 05:31:01     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
      2019/11/19 05:31:01     at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:259)
      2019/11/19 05:31:01     at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:759)
      2019/11/19 05:31:01     at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$2(ReplicaManager.scala:763)
      2019/11/19 05:31:01     at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
      2019/11/19 05:31:01     at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
      2019/11/19 05:31:01     at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
      2019/11/19 05:31:01     at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
      2019/11/19 05:31:01     at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
      2019/11/19 05:31:01     at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
      2019/11/19 05:31:01     at scala.collection.TraversableLike.map(TraversableLike.scala:238)
      2019/11/19 05:31:01     at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
      2019/11/19 05:31:01     at scala.collection.AbstractTraversable.map(Traversable.scala:108)
      2019/11/19 05:31:01     at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:751)
      2019/11/19 05:31:01     at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:492)
      2019/11/19 05:31:01     at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:544)
      2019/11/19 05:31:01     at kafka.server.KafkaApis.handle(KafkaApis.scala:113)
      2019/11/19 05:31:01     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
      2019/11/19 05:31:01     at java.lang.Thread.run(Thread.java:748)
      
      

      The error persists until the broker is restarted (or leadership is moved to another broker).
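
      The top of the stack trace shows `TimeIndex.maybeAppend` calling `putLong` on a memory-mapped buffer with no room left for another entry. A minimal sketch of that failure mode (the 12-byte entry layout of an 8-byte timestamp plus a 4-byte relative offset matches Kafka's time index; the one-entry buffer here is a hypothetical illustration, not Kafka code):

      ```java
      import java.nio.BufferOverflowException;
      import java.nio.ByteBuffer;

      public class TimeIndexOverflowSketch {
          public static void main(String[] args) {
              // A time index entry is 12 bytes: an 8-byte timestamp plus a
              // 4-byte relative offset. Allocate room for exactly one entry.
              ByteBuffer index = ByteBuffer.allocateDirect(12);

              // The first entry fills the buffer completely.
              index.putLong(1574141453000L); // timestamp
              index.putInt(788532);          // relative offset

              // A second append has no remaining capacity, so putLong throws
              // java.nio.BufferOverflowException from Buffer.nextPutIndex,
              // matching the top frames of the stack trace above.
              try {
                  index.putLong(1574141461000L);
                  System.out.println("no overflow");
              } catch (BufferOverflowException e) {
                  System.out.println("BufferOverflowException");
              }
          }
      }
      ```

      If the index backing a newly rolled segment is already full when the segment becomes inactive, every subsequent append attempt would fail the same way, which is consistent with the error persisting until a restart or leadership change.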


      Brokers config:

      advertised.host.name={{ hostname }}
      port=9092
      
      # Default number of partitions if a value isn't set when the topic is created.
      num.partitions=3
      auto.create.topics.enable=false
      delete.topic.enable=false
      
      # Prevent a not-in-sync replica from becoming leader.
      unclean.leader.election.enable=false
      
      # The number of threads per data directory to be used for log recovery at
      # startup and flushing at shutdown.
      num.recovery.threads.per.data.dir=36
      
      log.flush.interval.messages=10000
      log.flush.interval.ms=2000
      
      # 1 week
      log.retention.hours=168
      log.retention.check.interval.ms=300000
      log.cleaner.enable=false
      
      # Use broker time for message timestamps.
      log.message.timestamp.type=LogAppendTime
      zookeeper.connect={{ zookeeper_host }}:2181
      zookeeper.connection.timeout.ms=6000
      
      controller.socket.timeout.ms=30000
      controller.message.queue.size=10
      
      # Replication configuration
      num.replica.fetchers=10
      
      # Socket server configuration
      num.io.threads=32
      num.network.threads=16
      socket.request.max.bytes=104857600
      socket.receive.buffer.bytes=1048576
      socket.send.buffer.bytes=1048576
      queued.max.requests=32
      fetch.purgatory.purge.interval.requests=100
      producer.purgatory.purge.interval.requests=100
      
      inter.broker.protocol.version=1.1
      log.message.format.version=1.1
      


      People

        Assignee: Unassigned
        Reporter: Kulakhmetov Daniyar
        Votes: 0
        Watchers: 4
