Description
We're currently seeing an issue with the Java client (producer) when message producing runs into a timeout: a NETWORK_EXCEPTION is thrown instead of a timeout exception.
Situation and relevant code:
Config
request.timeout.ms: 200
retries: 3
acks: all
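For reference, here is a minimal sketch of the same settings as plain Java producer properties. It assumes a direct KafkaProducer configuration rather than Spring Boot's application properties. The bootstrap address is a hypothetical placeholder, and the String serializers are assumed to match the String key/value types used below. With acks=all, the leader only responds once the in-sync replicas have the write, so a 200 ms request timeout leaves little headroom.

```java
import java.util.Properties;

// Sketch only: the producer settings from this report as a Properties object.
// The keys are standard Kafka producer config names; the bootstrap address
// is a hypothetical placeholder.
public class ProducerSettings {
    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical address
        // Each produce request may wait at most 200 ms for a broker response
        props.put("request.timeout.ms", "200");
        // Failed sends are retried up to 3 times
        props.put("retries", "3");
        // Wait for the in-sync replicas to acknowledge each write
        props.put("acks", "all");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        build().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```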
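// Collect one future per record so the whole batch can be awaited together
List<CompletableFuture<SendResult<String, String>>> futures = new ArrayList<>();
for (UnpublishedEvent event : unpublishedEvents) {
    ListenableFuture<SendResult<String, String>> future = kafkaTemplate.send(
            new ProducerRecord<>(event.getTopic(), event.getKafkaKey(), event.getPayload()));
    futures.add(future.completable());
}
// Block until every send has completed; join() throws a CompletionException if any send failed
CompletableFuture.allOf(futures.stream().toArray(CompletableFuture[]::new)).join();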
We're using Spring's KafkaTemplate here (via Spring Boot), but that shouldn't matter, as it's merely a wrapper around the producer. We use it to send messages in batches.
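When a send in such a batch finally fails, join() throws a CompletionException, and the actual client error sits further down the cause chain (with Spring it is typically wrapped once more). A minimal sketch of walking that chain to look for a specific exception type follows. The helper name findCause is hypothetical, and java.util.concurrent.TimeoutException stands in for a client error so the example runs without Kafka on the classpath. In the situation described here, a check for a timeout type would come up empty, because the cause is a NetworkException.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeoutException;

public class CauseInspector {
    // Hypothetical helper: walks the cause chain (e.g. CompletionException ->
    // framework wrapper -> client error) and returns the first throwable of the
    // requested type. The identity set guards against cyclic cause chains.
    public static <T extends Throwable> Optional<T> findCause(Throwable t, Class<T> type) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable current = t; current != null && seen.add(current); current = current.getCause()) {
            if (type.isInstance(current)) {
                return Optional.of(type.cast(current));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for a failed send: a future completed with a nested exception
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new RuntimeException("send failed", new TimeoutException("timed out")));
        try {
            CompletableFuture.allOf(failed).join();
        } catch (CompletionException e) {
            System.out.println("timeout in cause chain: " + findCause(e, TimeoutException.class).isPresent());
        }
    }
}
```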
200 ms later, the following appears in the logs (we're not sure about the order: both entries arrived within the same millisecond, so our logging system may not display them in the right order):
[Producer clientId=producer-1] Received invalid metadata error in produce request on partition events-6 due to org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.. Going to request metadata update now
[Producer clientId=producer-1] Got error produce response with correlation id 3094 on topic-partition events-6, retrying (2 attempts left). Error: NETWORK_EXCEPTION
There is also a corresponding error on the broker (within a few ms):
Attempting to send response via channel for which there is no open connection, connection id XXX (kafka.network.Processor)
This was somewhat unexpected and sent us hunting across the infrastructure for possible connection issues, but we found none.
Side note: in some cases, the retries succeeded and the messages were produced successfully.
Only after many hours of heavy debugging did we notice that the error might be related to the low timeout setting. We've now removed that setting, as it was a remnant from the past and no longer valid for our use case. However, to spare others the same issue and to simplify future debugging, some form of timeout exception should be thrown.
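Until the client reports this case as a timeout, an application could attach a hint itself. A minimal sketch follows, assuming the caller measures the elapsed time of a send and knows its configured request.timeout.ms. The class and method names are hypothetical, and this is a heuristic workaround, not the client's behavior. It only compares elapsed time against the timeout, so it cannot tell a real disconnect from a timed-out request.

```java
import java.util.concurrent.TimeoutException;

public class TimeoutHint {
    // Hypothetical helper: if a send failed after at least the configured
    // request timeout had elapsed, wrap the original error in a TimeoutException
    // whose message points at request.timeout.ms. Otherwise return it unchanged.
    public static Throwable classify(Throwable error, long elapsedMs, long requestTimeoutMs) {
        if (elapsedMs >= requestTimeoutMs) {
            TimeoutException hint = new TimeoutException(
                    "Send failed after " + elapsedMs + " ms (request.timeout.ms=" + requestTimeoutMs
                            + "); the underlying error may be a timed-out request: " + error);
            hint.initCause(error); // keep the original client error for diagnosis
            return hint;
        }
        return error;
    }

    public static void main(String[] args) {
        Throwable original = new RuntimeException("NETWORK_EXCEPTION"); // stand-in for the client error
        System.out.println(classify(original, 250, 200));
        System.out.println(classify(original, 50, 200));
    }
}
```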
Issue Links
- is duplicated by: KAFKA-14317 ProduceRequest timeouts are logged as network exceptions (Resolved)