Description
While testing Streams in EOS mode under frequent and heavy network partitions, I've encountered the following error, leading to thread death:
[2019-11-26 04:54:02,650] ERROR [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] stream-thread [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] Encountered the following unexpected Kafka exception during processing, this usually indicate Streams internal errors: (org.apache.kafka.streams.processor.internals.StreamThread) org.apache.kafka.streams.errors.StreamsException: stream-thread [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] Failed to rebalance. at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:852) at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:739) at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698) at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671) Caused by: org.apache.kafka.streams.errors.StreamsException: stream-thread [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] failed to suspend stream tasks at org.apache.kafka.streams.processor.internals.TaskManager.suspendActiveTasksAndState(TaskManager.java:253) at org.apache.kafka.streams.processor.internals.StreamsRebalanceListener.onPartitionsRevoked(StreamsRebalanceListener.java:116) at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsRevoked(ConsumerCoordinator.java:291) at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onLeavePrepare(ConsumerCoordinator.java:707) at org.apache.kafka.clients.consumer.KafkaConsumer.unsubscribe(KafkaConsumer.java:1073) at org.apache.kafka.streams.processor.internals.StreamThread.enforceRebalance(StreamThread.java:716) at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:710) ... 1 more Caused by: org.apache.kafka.streams.errors.ProcessorStateException: task [1_1] Failed to flush state store KSTREAM-AGGREGATE-STATE-STORE-0000000007 at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:279) at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:175) at org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:581) at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:535) at org.apache.kafka.streams.processor.internals.StreamTask.suspend(StreamTask.java:660) at org.apache.kafka.streams.processor.internals.StreamTask.suspend(StreamTask.java:628) at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.suspendRunningTasks(AssignedStreamsTasks.java:145) at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.suspendOrCloseTasks(AssignedStreamsTasks.java:128) at org.apache.kafka.streams.processor.internals.TaskManager.suspendActiveTasksAndState(TaskManager.java:246) ... 7 more Caused by: org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker. [2019-11-26 04:54:02,650] INFO [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] stream-thread [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] State transition from PARTITIONS_REVOKED to PENDING_SHUTDOWN (org.apache.kafka.streams.processor.internals.StreamThread) [2019-11-26 04:54:02,650] INFO [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] stream-thread [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] Shutting down (org.apache.kafka.streams.processor.internals.StreamThread) [2019-11-26 04:54:02,650] INFO [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] [Consumer clientId=stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2-restore-consumer, groupId=null] Unsubscribed all topics or patterns and assigned partitions (org.apache.kafka.clients.consumer.KafkaConsumer) [2019-11-26 04:54:02,653] INFO [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] stream-thread [stream-soak-test-4290196e-d805-4acd-9f78-b459cc7e99ee-StreamThread-2] State transition from PENDING_SHUTDOWN to DEAD (org.apache.kafka.streams.processor.internals.StreamThread)
Elsewhere in the code, we catch ProducerFencedExceptions and trigger a rebalance instead of killing the thread. It seems like one possible avenue has slipped through the cracks.
Attachments
Issue Links
- is cloned by
-
KAFKA-9268 Follow-on: Streams Threads may die from recoverable errors with EOS enabled
- Open
- is related to
-
KAFKA-9310 StreamThread may die from recoverable UnknownProducerId exception
- Resolved
- links to