  1. Kafka
  2. KAFKA-10127

Kafka cluster not recovering - shrinking ISR continuously


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.1
    • Fix Version/s: None
    • Component/s: replication, zkclient
    • Labels: None
    • Environment: Kafka 2.4.1 with ZooKeeper 3.5.7

    Description

      From time to time, our Kafka cluster goes into a weird state. We see the following log lines repeating:

      [2020-06-06 08:35:48,117] INFO [Partition test broker=1002] Cached zkVersion 620 not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
      [2020-06-06 08:35:48,117] INFO [Partition test broker=1002] Shrinking ISR from 1006,1002 to 1002. Leader: (highWatermark: 3222733572, endOffset: 3222741893). Out of sync replicas: (brokerId: 1006, endOffset: 3222733572). (kafka.cluster.Partition)

       

      Just before that, our ZooKeeper session expired, which led to that state.
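A simplified model may help explain why the two log lines keep repeating. ZooKeeper's `setData` accepts an expected version and rejects the write if the znode has since changed. Assuming the leader updates the partition's ISR this way using its cached zkVersion, a leader whose cached version went stale (for example, because the znode was rewritten around the session expiry) fails every attempt and simply tries again on the next check. This is a toy Python sketch, not Kafka's implementation: `ZNode`, `PartitionLeader`, `try_shrink_isr`, and `BadVersionError` are hypothetical names, and the version numbers only mirror the log above for illustration.

```python
from dataclasses import dataclass, field


class BadVersionError(Exception):
    """Toy stand-in for ZooKeeper's BadVersion error on a conditional write."""


@dataclass
class ZNode:
    data: tuple
    version: int = 0

    def set_data(self, data, expected_version):
        # Conditional write: only succeeds if the caller's version matches the stored one.
        if expected_version != self.version:
            raise BadVersionError(f"expected {expected_version}, actual {self.version}")
        self.data = data
        self.version += 1
        return self.version


def _fmt(ids):
    return ",".join(str(i) for i in ids)


@dataclass
class PartitionLeader:
    broker_id: int
    isr: tuple
    cached_zk_version: int
    log: list = field(default_factory=list)

    def try_shrink_isr(self, znode, new_isr):
        self.log.append(f"Shrinking ISR from {_fmt(self.isr)} to {_fmt(new_isr)}")
        try:
            self.cached_zk_version = znode.set_data(new_isr, self.cached_zk_version)
        except BadVersionError:
            # Stale cached version: the update is skipped and retried on the next check.
            self.log.append(
                f"Cached zkVersion {self.cached_zk_version} not equal to that in zookeeper, skip updating ISR"
            )
            return False
        self.isr = new_isr
        return True


# Hypothetical state: the znode is at version 621 while this leader still caches 620.
znode = ZNode(data=(1006, 1002), version=621)
leader = PartitionLeader(broker_id=1002, isr=(1006, 1002), cached_zk_version=620)

results = [leader.try_shrink_isr(znode, (1002,)) for _ in range(3)]
print(results)  # [False, False, False]

# In this model, the write only succeeds once the cached version is refreshed.
leader.cached_zk_version = znode.version
print(leader.try_shrink_isr(znode, (1002,)))  # True
```

In this model nothing inside `try_shrink_isr` ever refreshes the cached version, so the loop never ends on its own; how and when a real broker refreshes its cached zkVersion is outside the scope of this sketch.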

       

      After we increased the two values below, we encounter the issue less frequently, but it still appears from time to time, and the only way to recover is to restart the Kafka service on all brokers.

      zookeeper.session.timeout.ms=18000

      replica.lag.time.max.ms=30000
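The two settings act at different points, which a rough Python sketch can illustrate. ZooKeeper clamps the session timeout a client requests into a server-side range that defaults to 2 to 20 times `tickTime`, and (in simplified form) a follower is dropped from the ISR once it has not caught up to the leader's log end offset for longer than `replica.lag.time.max.ms`. The helper names are hypothetical, the `tickTime=2000` used in the examples is an assumption (it is the value in ZooKeeper's sample configuration, not something reported in this issue), and the lag default is set to this issue's 30000 ms rather than Kafka's own default.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms,
                               min_session_timeout_ms=None, max_session_timeout_ms=None):
    """Clamp a requested session timeout into the server's allowed range.

    Unless configured explicitly, ZooKeeper uses
    minSessionTimeout = 2 * tickTime and maxSessionTimeout = 20 * tickTime.
    """
    low = min_session_timeout_ms if min_session_timeout_ms is not None else 2 * tick_time_ms
    high = max_session_timeout_ms if max_session_timeout_ms is not None else 20 * tick_time_ms
    return max(low, min(requested_ms, high))


def is_out_of_sync(now_ms, last_caught_up_ms, replica_lag_time_max_ms=30000):
    """Simplified ISR criterion: the follower has not caught up to the leader's
    log end offset for longer than replica.lag.time.max.ms."""
    return now_ms - last_caught_up_ms > replica_lag_time_max_ms


print(negotiated_session_timeout(18000, tick_time_ms=2000))       # 18000
print(negotiated_session_timeout(60000, tick_time_ms=2000))       # 40000 (capped)
print(is_out_of_sync(now_ms=100_000, last_caught_up_ms=75_000))  # False: 25 s behind
print(is_out_of_sync(now_ms=100_000, last_caught_up_ms=65_000))  # True: 35 s behind
```

In this model, raising either value only widens the tolerance before a session expires or a follower is removed; it does not change what happens once a leader is working from a stale cached zkVersion.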

       

      Any thoughts on this, please?


          People

            Assignee: Unassigned
            Reporter: Youssef BOUZAIENNE (ybouzaine)
            Votes:
            2
            Watchers:
            4

            Dates

              Created:
              Updated: