  1. Apache Gobblin
  2. GOBBLIN-1676

Kafka consumer returns 0 records even though there are records to be consumed

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: gobblin-kafka
    • Labels: None

    Description

      Mode: Gobblin in MapReduce mode, extracting data from a Kafka source.
      After upgrading to Gobblin 0.16 and the Kafka 1.1 client, this issue is observed:

      TLDR: Some mappers exit prematurely, reading 0 records, even though there are records to be read.

      Debugging details:

      I debugged the Kafka extractor and client code and found two things:

         1. Kafka1Client (and also Kafka9client) uses the poll() method to consume records from a Kafka topic/partition, but poll() does not guarantee that it returns records on every call (at least according to this Stack Overflow post).

         2. The Kafka extractor, at the line linked here, exits immediately when the iterator returned from poll() in step 1 is empty, without checking whether there are more records to be read.

      Neither step retries poll(), and there is no special handling to poll again when a poll returns no records.
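
      One way the missing check in step 2 could look, as a minimal sketch rather than Gobblin's actual extractor code: keep polling while the next offset is still below the high watermark, and give up only after several consecutive empty polls. The class and method names are hypothetical, the Supplier stands in for the consumer's poll() call, and the sketch assumes each record advances the offset by exactly one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public final class DrainUntilWatermark {
    private DrainUntilWatermark() {}

    /**
     * Reads records until nextOffset reaches highWatermark.
     * An empty poll is not treated as "partition drained"; the loop only
     * gives up after maxEmptyPolls consecutive empty polls.
     * Assumption: every record advances the offset by exactly one.
     */
    public static <T> List<T> drain(Supplier<List<T>> poll, long startOffset,
                                    long highWatermark, int maxEmptyPolls) {
        List<T> out = new ArrayList<>();
        long nextOffset = startOffset;
        int emptyPolls = 0;
        while (nextOffset < highWatermark && emptyPolls < maxEmptyPolls) {
            List<T> batch = poll.get();
            if (batch == null || batch.isEmpty()) {
                emptyPolls++;          // records remain, so poll again
                continue;
            }
            emptyPolls = 0;            // progress was made; reset the counter
            for (T record : batch) {
                if (nextOffset >= highWatermark) {
                    break;             // do not read past the high watermark
                }
                out.add(record);
                nextOffset++;
            }
        }
        return out;
    }
}
```

      The bound on consecutive empty polls keeps a mapper from spinning forever if the broker really has nothing to return.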

      Possible Fix:
      I was able to work around this issue temporarily by adding retry logic around poll() in step 1. With up to 3 retries, the mappers always get records on one of the retries.
      Some retry logic for polling records also appears to have been present in Kafka08Client, here.
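
      The retry workaround described above might be sketched as follows. This is not the actual patch: PollRetry, pollWithRetry, and the backoff parameter are hypothetical names, and the Supplier stands in for a call to the consumer's poll() so that the example stays self-contained.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

public final class PollRetry {
    private PollRetry() {}

    /**
     * Calls poll up to maxAttempts times and returns the first non-empty
     * result, sleeping backoffMillis between attempts. Returns an empty
     * list if every attempt comes back empty or the thread is interrupted.
     */
    public static <T> List<T> pollWithRetry(Supplier<List<T>> poll,
                                            int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            List<T> records = poll.get();
            if (records != null && !records.isEmpty()) {
                return records;
            }
            if (attempt < maxAttempts && backoffMillis > 0) {
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    break;
                }
            }
        }
        return Collections.emptyList();
    }
}
```

      A fixed attempt count matches the 3 retries tried above; whether 3 is enough in general would depend on broker latency and the poll timeout.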
       

          People

            Assignee: Shirshanka Das (shirshanka)
            Reporter: Bharath Krishna (bharos92)