Description
We noticed a deadlock in PushHttpMetricsReporter. Locking for metrics was changed under KAFKA-6765 to avoid a NullPointerException in metrics reporters caused by concurrent reads and updates. PushHttpMetricsReporter acquires its own lock to process metric registration, and registration is invoked while the sensor lock is held. It also reads metric values while holding its own lock, which requires acquiring the sensor lock, so the two locks are taken in the inverse order. This resulted in the deadlock below.
Found one Java-level deadlock:
Java stack information for the threads listed above:
===================================================
"StreamThread-7":
at org.apache.kafka.tools.PushHttpMetricsReporter.metricChange(PushHttpMetricsReporter.java:144)
- waiting to lock <0x0000000655a54310> (a java.lang.Object)
at org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:563)
- locked <0x0000000655a44a28> (a org.apache.kafka.common.metrics.Metrics)
at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:236)
- locked <0x000000065629c170> (a org.apache.kafka.common.metrics.Sensor)
at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:217)
at org.apache.kafka.common.network.Selector$SelectorMetrics.maybeRegisterConnectionMetrics(Selector.java:1016)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:462)
at org.apache.kafka.common.network.Selector.poll(Selector.java:425)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:510)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:271)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218)
at org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:274)
at org.apache.kafka.clients.consumer.internals.Fetcher.getAllTopicMetadata(Fetcher.java:254)
at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1820)
at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1798)
at org.apache.kafka.streams.processor.internals.StoreChangelogReader.refreshChangelogInfo(StoreChangelogReader.java:224)
at org.apache.kafka.streams.processor.internals.StoreChangelogReader.initialize(StoreChangelogReader.java:121)
at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:74)
at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:317)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:824)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
"pool-17-thread-1":
at org.apache.kafka.common.metrics.KafkaMetric.measurableValue(KafkaMetric.java:82)
- waiting to lock <0x000000065629c170> (a org.apache.kafka.common.metrics.Sensor)
at org.apache.kafka.common.metrics.KafkaMetric.value(KafkaMetric.java:58)
at org.apache.kafka.tools.PushHttpMetricsReporter$HttpReporter.run(PushHttpMetricsReporter.java:177)
- locked <0x0000000655a54310> (a java.lang.Object)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Found 1 deadlock.
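
The lock-order inversion in the trace above can be illustrated with a minimal sketch. All names here (LockOrderSketch, SketchSensor, SketchReporter) are hypothetical stand-ins, not Kafka classes. The snapshot-based method is only one common way to avoid holding both locks at once; it is not necessarily the fix applied in Kafka.

```java
import java.util.ArrayList;
import java.util.List;

public class LockOrderSketch {

    // Hypothetical stand-in for a sensor whose state is guarded by its own monitor.
    public static final class SketchSensor {
        private double total = 0.0;

        public synchronized void record(double v) { total += v; }

        public synchronized double value() { return total; }

        // Registration calls into the reporter while the sensor monitor is held,
        // giving the order: sensor lock, then reporter lock.
        public synchronized void register(SketchReporter reporter) {
            reporter.metricChange(this);
        }
    }

    // Hypothetical stand-in for a reporter that guards its metric list with a private lock.
    public static final class SketchReporter {
        private final Object lock = new Object();
        private final List<SketchSensor> sensors = new ArrayList<>();

        public void metricChange(SketchSensor s) {
            synchronized (lock) { sensors.add(s); }
        }

        // Deadlock-prone shape: reads values while holding the reporter lock,
        // giving the inverse order: reporter lock, then sensor lock.
        public double reportHoldingLock() {
            synchronized (lock) {
                double sum = 0.0;
                for (SketchSensor s : sensors) sum += s.value();
                return sum;
            }
        }

        // Copies the list under the reporter lock, releases it, then reads values,
        // so this method never holds the reporter lock and a sensor lock together.
        public double reportWithSnapshot() {
            List<SketchSensor> snapshot;
            synchronized (lock) { snapshot = new ArrayList<>(sensors); }
            double sum = 0.0;
            for (SketchSensor s : snapshot) sum += s.value();
            return sum;
        }
    }

    public static void main(String[] args) {
        SketchReporter reporter = new SketchReporter();
        SketchSensor a = new SketchSensor();
        SketchSensor b = new SketchSensor();
        a.register(reporter);
        b.register(reporter);
        a.record(2.5);
        b.record(1.5);
        System.out.println(reporter.reportWithSnapshot());
    }
}
```

Run single-threaded, both report methods return the same sum. The difference only matters when register and reportHoldingLock run concurrently on different threads, which is the situation shown for "StreamThread-7" and "pool-17-thread-1".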