[KAFKA-9140] Consumer gets stuck rejoining the group indefinitely - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 2.4.0
Fix Version/s: 2.4.0
Component/s: clients, consumer
Labels:
- new-consumer-threading-should-fix

Description

There seems to be a race condition that can cause a rejoining member to get stuck initiating a rejoin indefinitely. The relevant client logs are attached (streams.log.tgz; all other attachments are broker logs), but basically the client repeats this message (and nothing else) continuously until it is killed or shut down:

[2019-11-05 01:53:54,699] INFO [Consumer clientId=StreamsUpgradeTest-a4c1cff8-7883-49cd-82da-d2cdfc33a2f0-StreamThread-1-consumer, groupId=StreamsUpgradeTest] Generation data was cleared by heartbeat thread. Initiating rejoin. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
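One way to check an extracted client log for this symptom is to look for a long uninterrupted run of that message. The following is a minimal sketch only: the helper names `longest_rejoin_run` and `looks_stuck`, the default threshold of 100, and the `streams.log` file name in the usage note are illustrative assumptions and not part of Kafka or of the attached logs.

```python
from typing import Iterable

# Message text copied from the log line quoted in this report.
REJOIN_MESSAGE = "Generation data was cleared by heartbeat thread. Initiating rejoin."


def longest_rejoin_run(lines: Iterable[str]) -> int:
    """Return the longest run of consecutive non-blank lines containing REJOIN_MESSAGE."""
    longest = 0
    current = 0
    for line in lines:
        if not line.strip():
            continue  # blank lines neither extend nor break a run
        if REJOIN_MESSAGE in line:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # any other log line ends the current run
    return longest


def looks_stuck(lines: Iterable[str], threshold: int = 100) -> bool:
    """Heuristic: treat a run of at least `threshold` rejoin messages as a stuck consumer.

    The threshold is an arbitrary assumption; tune it to the log volume at hand.
    """
    return longest_rejoin_run(lines) >= threshold
```

Assuming the archive unpacks to a plain-text file, a call such as `looks_stuck(open("streams.log"))` would report whether the log ever shows 100 or more of these messages back to back.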

This message was added as part of the bugfix (PR 7460) for a related race condition, ~~KAFKA-8104~~.

This issue was uncovered by the Streams version probing upgrade test, which fails intermittently. Here are the failure rates for the system test runs so far:

trunk (cooperative): 1/1 and 2/10 failures

2.4 (cooperative): 0/10 and 1/15 failures

trunk (eager): 0/10 failures

I've kicked off some high-repeat runs to complete overnight and hopefully shed more light.

Note that I have also kicked off runs of both 2.4 and trunk with the PR for ~~KAFKA-8104~~ reverted. Both saw 2/10 failures, due to hitting the bug that was fixed by PR 7460. It is therefore unclear whether PR 7460 introduced a new race condition, or merely uncovered an existing one that previously would have been masked by the earlier failure from ~~KAFKA-8104~~.

Attachments

- debug.tgz, 05/Nov/19 02:59, 5.23 MB, A. Sophie Blee-Goldman
- server-start-stdout-stderr.log.tgz, 05/Nov/19 02:59, 1.97 MB, A. Sophie Blee-Goldman
- kafka-data-logs-2.tgz, 05/Nov/19 02:59, 456 kB, A. Sophie Blee-Goldman
- kafka-data-logs-1.tgz, 05/Nov/19 02:59, 387 kB, A. Sophie Blee-Goldman
- info.tgz, 05/Nov/19 02:59, 8 kB, A. Sophie Blee-Goldman
- streams.log.tgz, 05/Nov/19 02:59, 3.22 MB, A. Sophie Blee-Goldman

Issue Links

is related to

KAFKA-8104 Consumer cannot rejoin to the group after rebalancing

Resolved

links to

GitHub Pull Request #7647

People

Assignee: Guozhang Wang

Reporter: A. Sophie Blee-Goldman

Votes: 0

Watchers: 2

Dates

Created: 05/Nov/19 02:45

Updated: 11/Feb/22 07:33

Resolved: 06/Nov/19 18:12