Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
0.9.0.0
None
None
Description
Hi,
I am using github.com/Shopify/sarama to retrieve the committed offsets for a high-volume topic, but the bug actually seems to originate in Kafka itself.
I have written a small test that queries the committed offsets of all partitions of one topic every second. The request looks like this:
    OffsetFetchRequest{
        ConsumerGroup: "my-group-name",
        Version:       1,
        TopicPartitions: []TopicPartition{
            {TopicName: "logs", Partitions: []int32{0, 1, 2, 3, 4, 5, 6, 7}},
        },
    }
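For anyone trying to reproduce this, here is a minimal Go sketch of a polling test along these lines (not the actual test used for this report). It assumes a hypothetical `fetchFunc` that performs one OffsetFetch round trip and returns each partition's committed offset and error code; with sarama it would presumably wrap an `OffsetFetchRequest` filled in via `AddPartition` and sent with `Broker.FetchOffset`, but that wiring is not shown. The fake fetcher in `main` only exists so the sketch runs without a broker.

```go
package main

import (
	"fmt"
	"time"
)

// fetchFunc is a hypothetical stand-in for one OffsetFetch round trip. It returns
// the committed offset and the response error code for each requested partition.
type fetchFunc func(group, topic string, partitions []int32) (map[int32]int64, map[int32]int16, error)

// formatLine renders one result in the same shape as the log lines in this report.
func formatLine(topic string, partition int32, errCode int16, offset int64) string {
	return fmt.Sprintf("topic=%s partition=%02d error=%d offset=%d", topic, partition, errCode, offset)
}

// poll runs `rounds` fetches, sleeping `interval` between them, and returns one
// formatted line per partition per round. A missing error code is reported as 0.
func poll(fetch fetchFunc, group, topic string, partitions []int32, rounds int, interval time.Duration) ([]string, error) {
	var lines []string
	for i := 0; i < rounds; i++ {
		if i > 0 {
			time.Sleep(interval)
		}
		offsets, codes, err := fetch(group, topic, partitions)
		if err != nil {
			return lines, err
		}
		for _, p := range partitions {
			off, ok := offsets[p]
			if !ok {
				off = -1 // treat a partition absent from the result as "no committed offset"
			}
			lines = append(lines, formatLine(topic, p, codes[p], off))
		}
	}
	return lines, nil
}

func main() {
	// Fake fetcher with fixed offsets so the sketch runs without a broker.
	fake := func(group, topic string, partitions []int32) (map[int32]int64, map[int32]int16, error) {
		offsets := make(map[int32]int64, len(partitions))
		for _, p := range partitions {
			offsets[p] = 1000 + int64(p)
		}
		return offsets, nil, nil
	}
	lines, err := poll(fake, "my-group-name", "logs", []int32{0, 1, 2, 3, 4, 5, 6, 7}, 2, time.Second)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Keeping the fetch behind a function type makes it easy to swap the fake for a real client call while leaving the logging format unchanged.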
Most of the time the responses are correct, but roughly every 10 minutes there is a short glitch. I am not familiar with Kafka internals, but it looks like a race condition. Here's my log output:
...
2016/02/19 09:48:10 topic=logs partition=00 error=0 offset=206567925
2016/02/19 09:48:10 topic=logs partition=01 error=0 offset=206671019
2016/02/19 09:48:10 topic=logs partition=02 error=0 offset=206567995
2016/02/19 09:48:10 topic=logs partition=03 error=0 offset=205785315
2016/02/19 09:48:10 topic=logs partition=04 error=0 offset=206526677
2016/02/19 09:48:10 topic=logs partition=05 error=0 offset=206713764
2016/02/19 09:48:10 topic=logs partition=06 error=0 offset=206524006
2016/02/19 09:48:10 topic=logs partition=07 error=0 offset=206629121
2016/02/19 09:48:11 topic=logs partition=00 error=0 offset=206572870
2016/02/19 09:48:11 topic=logs partition=01 error=0 offset=206675966
2016/02/19 09:48:11 topic=logs partition=02 error=0 offset=206573267
2016/02/19 09:48:11 topic=logs partition=03 error=0 offset=205790613
2016/02/19 09:48:11 topic=logs partition=04 error=0 offset=206531841
2016/02/19 09:48:11 topic=logs partition=05 error=0 offset=206718513
2016/02/19 09:48:11 topic=logs partition=06 error=0 offset=206529762
2016/02/19 09:48:11 topic=logs partition=07 error=0 offset=206634037
2016/02/19 09:48:12 topic=logs partition=00 error=0 offset=-1
2016/02/19 09:48:12 topic=logs partition=01 error=0 offset=-1
2016/02/19 09:48:12 topic=logs partition=02 error=0 offset=-1
2016/02/19 09:48:12 topic=logs partition=03 error=0 offset=-1
2016/02/19 09:48:12 topic=logs partition=04 error=0 offset=-1
2016/02/19 09:48:12 topic=logs partition=05 error=0 offset=-1
2016/02/19 09:48:12 topic=logs partition=06 error=0 offset=-1
2016/02/19 09:48:12 topic=logs partition=07 error=0 offset=-1
2016/02/19 09:48:13 topic=logs partition=00 error=0 offset=-1
2016/02/19 09:48:13 topic=logs partition=01 error=0 offset=206686020
2016/02/19 09:48:13 topic=logs partition=02 error=0 offset=206583861
2016/02/19 09:48:13 topic=logs partition=03 error=0 offset=205800480
2016/02/19 09:48:13 topic=logs partition=04 error=0 offset=206542733
2016/02/19 09:48:13 topic=logs partition=05 error=0 offset=206728251
2016/02/19 09:48:13 topic=logs partition=06 error=0 offset=206534794
2016/02/19 09:48:13 topic=logs partition=07 error=0 offset=206643853
2016/02/19 09:48:14 topic=logs partition=00 error=0 offset=206584533
2016/02/19 09:48:14 topic=logs partition=01 error=0 offset=206690275
2016/02/19 09:48:14 topic=logs partition=02 error=0 offset=206588902
2016/02/19 09:48:14 topic=logs partition=03 error=0 offset=205805413
2016/02/19 09:48:14 topic=logs partition=04 error=0 offset=206542733
2016/02/19 09:48:14 topic=logs partition=05 error=0 offset=206733144
2016/02/19 09:48:14 topic=logs partition=06 error=0 offset=206540275
2016/02/19 09:48:14 topic=logs partition=07 error=0 offset=206649392
...
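As you can see, the returned error code is always 0, yet at 09:48:12 every partition suddenly reports offset -1 (the value an OffsetFetch response uses for "no committed offset"), and partition 00 still does at 09:48:13. There is no obvious reason why the returned offsets are suddenly wrong/blank.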
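To flag this kind of glitch automatically, a minimal Go sketch could compare two consecutive polls. It relies on the OffsetFetch convention that an offset of -1 with error code 0 means the broker has no committed offset for that partition, and it additionally flags any offset that moves backwards. The function name `anomalies` and its signature are purely illustrative, not part of sarama or Kafka.

```go
package main

import "fmt"

// noCommittedOffset is the offset an OffsetFetch response carries (with error
// code 0) when the broker has no committed offset for a partition.
const noCommittedOffset int64 = -1

// anomalies returns, in the order given by partitions, every partition whose
// current offset is absent from curr or equal to -1, or is lower than the
// previous poll's valid offset. Partitions absent from prev are only checked
// for a missing offset.
func anomalies(partitions []int32, prev, curr map[int32]int64) []int32 {
	var bad []int32
	for _, p := range partitions {
		c, ok := curr[p]
		if !ok || c == noCommittedOffset {
			bad = append(bad, p)
			continue
		}
		if pv, seen := prev[p]; seen && pv != noCommittedOffset && c < pv {
			bad = append(bad, p)
		}
	}
	return bad
}

func main() {
	parts := []int32{0, 1, 2, 3, 4, 5, 6, 7}
	// Offsets copied from the 09:48:12 and 09:48:13 entries of the log above.
	at12 := map[int32]int64{0: -1, 1: -1, 2: -1, 3: -1, 4: -1, 5: -1, 6: -1, 7: -1}
	at13 := map[int32]int64{0: -1, 1: 206686020, 2: 206583861, 3: 205800480,
		4: 206542733, 5: 206728251, 6: 206534794, 7: 206643853}
	fmt.Println(anomalies(parts, at12, at13)) // prints [0]
}
```

An unchanged offset (partition 04 between 09:48:13 and 09:48:14) is deliberately not flagged, since a consumer that has not committed in the last second would legitimately produce it.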
I have also added debug logging to our offset committer to make sure the offsets we commit are correct, and they are.
Any help is greatly appreciated!