KAFKA-1557

ISR reported by TopicMetadataResponse most of the time doesn't match the Zookeeper information (and the truth)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.8.0, 0.8.1
    • Fix Version/s: 0.8.1.1, 0.8.2.0
    • Component/s: controller, core, replication
    • Environment: OSX 10.9.3, Linux Scientific 6.5
      It actually doesn't seem to matter and appears to be OS-agnostic

    Description

      TL;DR - after a topic is created, and at least one broker in the ISR is restarted, the ISR reported by the TopicMetadataResponse is incorrect.

      Specific steps to repro:

      • Download Kafka 0.8.1
      • Copy server.properties twice, into server1.properties and server2.properties (attached); only the ports and log paths are changed so that the two brokers can co-exist
      • Start ZooKeeper using "sh bin/zookeeper-server-start.sh config/zookeeper.properties"
      • Start broker1: "sh bin/kafka-server-start.sh config/server1.properties"
      • Start broker2: "sh bin/kafka-server-start.sh config/server2.properties"
      • Create a new topic: "sh bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic test --replication-factor 2 --partitions 3"
      • Examine topic state: "sh bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic test" - note that all ISRs are of length 2
      • Run the attached Scala code that uses TopicMetadataRequest to examine topic state. Observe that all ISRs are of length 2 and match the information output by the script
      • Shut down broker2 (simply hit Ctrl-C in the terminal), wait 5-10 seconds
      • Restart broker2 using the original command
      • Check the status of the topic again. Observe that the leader for all partitions is 0 (as expected), and all ISRs contain both brokers (as expected)
      • Run the attached Scala snippet again.
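
      The attached BrokenKafkaLink.scala is not reproduced in this report. A minimal sketch of the kind of check the steps above describe, assuming the Scala client classes shipped with Kafka 0.8.x (kafka.consumer.SimpleConsumer and kafka.api.TopicMetadataRequest), might look like the following. The object name PrintIsr, the host, port 9092, timeout, buffer size, and client id are placeholder assumptions, not values taken from the attachment; the port should match the one set in server1.properties.

```scala
import kafka.api.TopicMetadataRequest
import kafka.consumer.SimpleConsumer

object PrintIsr {
  def main(args: Array[String]): Unit = {
    // Assumed connection settings: host, port, socket timeout (ms),
    // receive buffer size (bytes), and client id are all placeholders.
    val consumer = new SimpleConsumer("localhost", 9092, 10000, 64 * 1024, "isr-check")
    try {
      // Ask this broker for its view of the "test" topic (correlation id 0).
      val response = consumer.send(new TopicMetadataRequest(Seq("test"), 0))
      for (topic <- response.topicsMetadata; p <- topic.partitionsMetadata) {
        val leader = p.leader.map(_.id.toString).getOrElse("none")
        val isr = p.isr.map(_.id).sorted.mkString(",")
        println("topic=" + topic.topic + " partition=" + p.partitionId +
          " leader=" + leader + " isr=" + isr)
      }
    } finally {
      consumer.close()
    }
  }
}
```

      The ISR printed per partition is what the queried broker holds in its own metadata, so it can be compared line by line with the output of "kafka-topics.sh --describe", which reads from ZooKeeper.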

      EXPECTED:

      • All ISRs are of length 2

      ACTUAL:

      • ALL ISRs contain just broker 0

      NOTE: depending on how long broker2 was down, sometimes some ISRs will contain the full list, but shutting it down for 15+ seconds seems to always yield a consistent repro

      Basically, it appears that brokers hold incorrect ISR information in their metadata cache.
      Our production servers exhibit the same problem. After a topic is created everything looks fine, but as brokers get restarted, the ISR reported by the brokers becomes wrong, whereas the one in ZK appears to report the truth (it shrinks as brokers are shut down and grows back after they are restarted).
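
      To make the mismatch concrete, here is a small self-contained sketch that compares two per-partition ISR views and reports the partitions where they differ. The object name IsrDiff and its helper are hypothetical, and the broker ids 0 and 1 are assumed for the two-broker setup; the sample values simply mirror the outcome described above (ZK lists both brokers, the metadata response lists only broker 0).

```scala
object IsrDiff {
  /** Partition id -> set of broker ids in the ISR. */
  type IsrByPartition = Map[Int, Set[Int]]

  /** For each partition where the two views differ, returns (zkIsr, reportedIsr). */
  def mismatches(zk: IsrByPartition,
                 reported: IsrByPartition): Map[Int, (Set[Int], Set[Int])] = {
    val partitions = zk.keySet ++ reported.keySet
    partitions.toSeq.flatMap { p =>
      val z = zk.getOrElse(p, Set.empty[Int])
      val r = reported.getOrElse(p, Set.empty[Int])
      if (z == r) None else Some(p -> ((z, r)))
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumed sample data: broker ids 0 and 1, three partitions.
    val zk       = Map(0 -> Set(0, 1), 1 -> Set(0, 1), 2 -> Set(0, 1))
    val reported = Map(0 -> Set(0), 1 -> Set(0), 2 -> Set(0))
    mismatches(zk, reported).toSeq.sortBy(_._1).foreach { case (p, (z, r)) =>
      println("partition " + p + ": zk=" + z.toSeq.sorted.mkString(",") +
        " reported=" + r.toSeq.sorted.mkString(","))
    }
  }
}
```

      In a healthy cluster the result would be empty once the restarted broker has rejoined the ISR.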

      I'm not sure whether this has a wider impact on the functioning of the cluster. Bad metadata is bad in itself, but so far there has been no evidence of further problems.

      Attachments

        1. BrokenKafkaLink.scala
          3 kB
          Oleg Lvovitch
        2. server2.properties
          5 kB
          Oleg Lvovitch
        3. server1.properties
          5 kB
          Oleg Lvovitch

            People

              Assignee: Neha Narkhede
              Reporter: Oleg Lvovitch