Description
When time-based retention is configured, Kafka by default uses the timestamp provided by the producer to decide when log data becomes eligible for deletion. Customers can switch to the broker's timestamp by overriding the configuration for "log.message.timestamp.type". The producer's record timestamp can be in the past or in the future, and Kafka applies the retention time by comparing the broker's current time with the record's timestamp.
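To make the comparison concrete, here is a minimal sketch of a time-based retention check. It is a simplification, not the broker's actual code: the class and method names are hypothetical, and it assumes the check is applied to the largest record timestamp seen in a chunk of the log.

```java
// Simplified, hypothetical sketch of a time-based retention check.
// Not the broker's real implementation; names are illustrative only.
final class RetentionSketch {

    /**
     * Returns true when data whose largest record timestamp is
     * largestTimestampMs is older than retentionMs relative to brokerNowMs.
     * All values are epoch milliseconds and assumed to be non-negative.
     */
    static boolean isExpired(long largestTimestampMs, long brokerNowMs, long retentionMs) {
        // A timestamp in the future makes the left-hand side negative,
        // so the data never looks old enough to be deleted.
        return brokerNowMs - largestTimestampMs > retentionMs;
    }
}
```

Under this sketch, a single record stamped far in the future keeps `isExpired` false until the broker's clock catches up with that timestamp.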
Arguably, there are valid use cases for a producer to send records with timestamps in the past (for example, replaying old data), but a record timestamp that is far in the future compared to the broker's current time is almost certainly wrong.
There is a configurable property called "message.timestamp.difference.max.ms" that customers can use to control the allowed difference between the broker's current time and the record timestamp. However, Kafka's own validation can be improved so that records with future timestamps are rejected before they are written in the first place.
Customers have run into this issue when a producer was erroneously configured to set the record timestamp in nanoseconds instead of milliseconds. The resulting record timestamps were far in the future, and the time-based retention policy did not kick in as expected.
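A rough sketch of the arithmetic behind this failure mode, assuming the producer passes epoch nanoseconds where epoch milliseconds are expected. The class and helper names are hypothetical and only illustrate the size of the error.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration: an epoch-nanosecond value read as milliseconds.
final class NanosAsMillisDemo {

    /** Epoch nanoseconds for the given instant (the value the producer sends by mistake). */
    static long epochNanos(Instant t) {
        return Math.addExact(
                Math.multiplyExact(t.getEpochSecond(), 1_000_000_000L),
                (long) t.getNano());
    }

    /** How many whole days a record timestamp lies ahead of the broker's clock. */
    static long daysAhead(long recordTimestampMs, long brokerNowMs) {
        return Duration.ofMillis(recordTimestampMs - brokerNowMs).toDays();
    }
}
```

For any present-day instant, the nanosecond value is a million times larger than the millisecond value, so `daysAhead(epochNanos(now), now.toEpochMilli())` comes out at tens of millions of years' worth of days rather than a small clock skew.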
The improvement I am proposing is to add basic validation in org.apache.kafka.storage.internals.log.LogValidator that rejects record timestamps that are in the future compared to the broker's current time, after accounting for a sensible tolerance for potential clock skew.
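A minimal sketch of what such a check could look like. This is not LogValidator's actual API: the class name, method name, and the one-hour tolerance are all assumptions made for illustration, and the right tolerance would need to be decided as part of this change.

```java
// Hypothetical sketch of the proposed check; not existing Kafka code.
final class FutureTimestampCheck {

    /** Assumed tolerance for clock skew (one hour); illustrative value only. */
    static final long ASSUMED_MAX_FUTURE_SKEW_MS = 60L * 60 * 1000;

    /**
     * Returns true when the record timestamp is ahead of the broker's clock
     * by more than the allowed tolerance. Timestamps in the past are always
     * accepted by this check. brokerNowMs is assumed to be non-negative.
     */
    static boolean isTooFarInFuture(long recordTimestampMs, long brokerNowMs, long maxFutureSkewMs) {
        if (maxFutureSkewMs < 0) {
            throw new IllegalArgumentException("maxFutureSkewMs must be >= 0");
        }
        // Compare the difference rather than computing brokerNowMs + tolerance,
        // which could overflow for a very large tolerance.
        return recordTimestampMs > brokerNowMs
                && recordTimestampMs - brokerNowMs > maxFutureSkewMs;
    }
}
```

A validator built on this would reject the record (for example, by failing the produce request for that batch) whenever `isTooFarInFuture` returns true, while still allowing replay of old data.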