Details
Sub-task
Status: Resolved
Major
Resolution: Fixed
0.11.0.0
None
None
Description
Details of the problem are provided here: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Exactly+Once+-+Solving+the+problem+of+spurious+OutOfOrderSequence+errors
A quick summary follows:
In the discussion of "KIP-185: Make exactly once in order delivery per partition the default producer setting", the following points regarding the OutOfOrderSequenceException were raised:
1. The OutOfOrderSequenceException indicates that there has been data loss on the broker, i.e. a previously acknowledged message no longer exists. For the most part, this should only occur in rare situations (simultaneous power outages, multiple disk losses, software bugs resulting in data corruption, etc.).
2. However, there is another perfectly normal scenario where data is removed: it can be deleted because it is old and has crossed the retention threshold. Hence, if a producer remains inactive for longer than a topic's retention period, it could get an OutOfOrderSequenceException that is a false positive: the data was removed through valid processes, and this isn't an error.
3. We would like to eliminate the possibility of getting spurious OutOfOrderSequenceExceptions: when you get one, it should always mean data loss and should be taken very seriously.
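
To make the false positive in point 2 concrete, here is a minimal toy model in Java. It is not Kafka's broker code: the `ToyPartition` class, its methods, and the rule that an unknown producer is expected to start at sequence 0 are all assumptions made for illustration. It only shows how losing a producer's sequence state through retention can look identical to data loss.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a per-producer sequence check. Hypothetical; not Kafka's actual implementation. */
class ToyPartition {
    enum Result { OK, OUT_OF_ORDER_SEQUENCE }

    // Last sequence number accepted per producer id.
    // Assumption: this state only exists while the producer's data is still retained.
    private final Map<Long, Integer> lastSequence = new HashMap<>();

    /** Accepts the write only if it carries the next expected sequence number. */
    Result append(long producerId, int sequence) {
        Integer last = lastSequence.get(producerId);
        // Assumption: a producer with no known state is expected to start at 0.
        int expected = (last == null) ? 0 : last + 1;
        if (sequence != expected) {
            return Result.OUT_OF_ORDER_SEQUENCE;
        }
        lastSequence.put(producerId, sequence);
        return Result.OK;
    }

    /** Simulates retention deleting all of a producer's data, and its sequence state with it. */
    void expireByRetention(long producerId) {
        lastSequence.remove(producerId);
    }
}
```

In this model, a producer that writes sequences 0 and 1, stays idle until `expireByRetention` runs, and then sends sequence 2 is rejected with `OUT_OF_ORDER_SEQUENCE`, even though no acknowledged data was lost unexpectedly.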