  1. Kafka
  2. KAFKA-9231

Streams Threads may die from recoverable errors with EOS enabled

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.0
    • Component/s: streams
    • Labels: None

    Description

      While testing Streams in EOS mode under frequent and heavy network partitions, I've encountered the following error, leading to thread death:

      [2019-11-26 04:54:02,650] ERROR [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] stream-thread [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] Encountered the following unexpected Kafka exception during processing, this usually indicate Streams internal errors: (org.apache.kafka.streams.processor.internals.StreamThread)
      org.apache.kafka.streams.errors.StreamsException: stream-thread [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] Failed to rebalance.
      	at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:852)
      	at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:739)
      	at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698)
      	at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
      Caused by: org.apache.kafka.streams.errors.StreamsException: stream-thread [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] failed to suspend stream tasks
      	at org.apache.kafka.streams.processor.internals.TaskManager.suspendActiveTasksAndState(TaskManager.java:253)
      	at org.apache.kafka.streams.processor.internals.StreamsRebalanceListener.onPartitionsRevoked(StreamsRebalanceListener.java:116)
      	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsRevoked(ConsumerCoordinator.java:291)
      	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onLeavePrepare(ConsumerCoordinator.java:707)
      	at org.apache.kafka.clients.consumer.KafkaConsumer.unsubscribe(KafkaConsumer.java:1073)
      	at org.apache.kafka.streams.processor.internals.StreamThread.enforceRebalance(StreamThread.java:716)
      	at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:710)
      	... 1 more
      Caused by: org.apache.kafka.streams.errors.ProcessorStateException: task [1_1] Failed to flush state store KSTREAM-AGGREGATE-STATE-STORE-0000000007
      	at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:279)
      	at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:175)
      	at org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:581)
      	at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:535)
      	at org.apache.kafka.streams.processor.internals.StreamTask.suspend(StreamTask.java:660)
      	at org.apache.kafka.streams.processor.internals.StreamTask.suspend(StreamTask.java:628)
      	at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.suspendRunningTasks(AssignedStreamsTasks.java:145)
      	at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.suspendOrCloseTasks(AssignedStreamsTasks.java:128)
      	at org.apache.kafka.streams.processor.internals.TaskManager.suspendActiveTasksAndState(TaskManager.java:246)
      	... 7 more
      Caused by: org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker.
      [2019-11-26 04:54:02,650] INFO [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] stream-thread [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] State transition from PARTITIONS_REVOKED to PENDING_SHUTDOWN (org.apache.kafka.streams.processor.internals.StreamThread)
      [2019-11-26 04:54:02,650] INFO [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] stream-thread [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] Shutting down (org.apache.kafka.streams.processor.internals.StreamThread)
      [2019-11-26 04:54:02,650] INFO [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] [Consumer clientId=stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2-restore-consumer, groupId=null] Unsubscribed all topics or patterns and assigned partitions (org.apache.kafka.clients.consumer.KafkaConsumer)
      [2019-11-26 04:54:02,653] INFO [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] stream-thread [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] State transition from PENDING_SHUTDOWN to DEAD (org.apache.kafka.streams.processor.internals.StreamThread)
      

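      Elsewhere in the code, we catch ProducerFencedExceptions and trigger a rebalance instead of killing the thread. It seems like one code path has slipped through the cracks: when tasks are suspended in onPartitionsRevoked, the ProducerFencedException surfaces from a state store flush wrapped in a ProcessorStateException, and is then rethrown as a StreamsException that kills the thread.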
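
To illustrate the kind of handling described above, here is a minimal sketch of checking an exception's cause chain for a fencing error before deciding whether to shut the thread down. It is not the actual Streams code or the fix that was applied. The class FencedCauseSketch, the methods causedByFencing and decide, and the nested exception classes are hypothetical stand-ins that only borrow names from the stack trace, so that the example compiles without Kafka on the classpath.

```java
public class FencedCauseSketch {

    // Hypothetical stand-ins for the Kafka exception types named in the trace.
    static class ProducerFencedException extends RuntimeException {
        ProducerFencedException(String message) { super(message); }
    }

    static class ProcessorStateException extends RuntimeException {
        ProcessorStateException(String message, Throwable cause) { super(message, cause); }
    }

    // Walks the cause chain and reports whether any link is a fencing error.
    // The depth bound guards against a (malformed) cyclic cause chain.
    static boolean causedByFencing(Throwable error) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t instanceof ProducerFencedException) {
                return true;
            }
        }
        return false;
    }

    // Assumed policy for this sketch: fencing anywhere in the chain is
    // recoverable (rejoin the group); anything else is treated as fatal.
    static String decide(Throwable error) {
        return causedByFencing(error) ? "REBALANCE" : "SHUTDOWN";
    }

    public static void main(String[] args) {
        // Same nesting as in the report: a flush failure wrapping a fencing error.
        Throwable wrapped = new ProcessorStateException(
            "task [1_1] Failed to flush state store",
            new ProducerFencedException("Producer attempted an operation with an old epoch."));
        System.out.println(decide(wrapped));                                // REBALANCE
        System.out.println(decide(new IllegalStateException("unrelated"))); // SHUTDOWN
    }
}
```

The point of inspecting the whole chain rather than only the top-level exception is that, in the trace above, the ProducerFencedException is two levels down from the StreamsException that reaches the thread's run loop.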

            People

              Assignee: vvcephei John Roesler
              Reporter: vvcephei John Roesler