Description
Since version 2.8.0, there appears to be a regression in how the offset-commit-* metrics are calculated for Kafka Connect source connectors.
Before this version, any timeout or interruption while trying to commit offsets for a source connector (e.g. the MM2 MirrorSourceConnector) was correctly flagged as an offset commit failure (i.e. the offset-commit-failure-percentage metric would be non-zero). Since version 2.8.0, these errors are counted as successes.
After digging through the code, the commit that introduced this bug appears to be: https://github.com/apache/kafka/commit/047ad654da7903f3903760b0e6a6a58648ca7715
I believe it may be a mistake to remove the boolean success argument from the recordCommit method of the WorkerTask class (the argument was deemed redundant given the Throwable error argument) and to rely solely on the presence of a non-null error to decide whether a commit succeeded or failed. In the commitOffsets method of the WorkerSourceTask class, there are multiple cases where an exception object is either unavailable or not passed to the recordCommitFailure method, e.g.:
- Timeout #1: https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L519
- Timeout #2: https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L584
- Interruption: https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L529
- Unserializable offset: https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L562
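To illustrate the mechanism, here is a minimal, hypothetical Java model of the commit bookkeeping. It is not the actual WorkerTask code: the names CommitMetricsSketch, CommitRecorder, recordCommitInferred, recordCommitExplicit and failureRatio are illustrative assumptions, and plain counters stand in for the Kafka metric sensors. It contrasts the 2.8.0-style behavior, where the outcome is inferred only from whether the error is null, with two possible remedies. One restores an explicit success flag. The other guarantees that every failure path passes a non-null exception.

```java
import java.util.concurrent.TimeoutException;

public class CommitMetricsSketch {

    // Hypothetical, simplified stand-in for the commit bookkeeping in WorkerTask.
    // The real runtime records into Kafka metric sensors; plain counters are used here.
    static class CommitRecorder {
        private int successes;
        private int failures;

        // 2.8.0-style shape: the outcome is inferred solely from whether error is null.
        void recordCommitInferred(Throwable error) {
            if (error == null) {
                successes++;
            } else {
                failures++;
            }
        }

        // Pre-2.8.0-style shape: the caller states the outcome explicitly; error is informational.
        void recordCommitExplicit(boolean success, Throwable error) {
            if (success) {
                successes++;
            } else {
                failures++;
            }
        }

        // Fraction of recorded commits that failed, in [0.0, 1.0].
        double failureRatio() {
            int total = successes + failures;
            return total == 0 ? 0.0 : (double) failures / total;
        }
    }

    public static void main(String[] args) {
        // A failed commit where no exception object reaches the recorder,
        // as in the timeout, interruption and unserializable-offset paths listed above.
        Throwable missingError = null;

        CommitRecorder inferred = new CommitRecorder();
        inferred.recordCommitInferred(missingError); // miscounted as a success

        CommitRecorder explicit = new CommitRecorder();
        explicit.recordCommitExplicit(false, missingError); // counted as a failure

        CommitRecorder nonNullError = new CommitRecorder();
        nonNullError.recordCommitInferred(
                new TimeoutException("Timed out waiting to flush offsets")); // counted as a failure

        System.out.println("inferred failure ratio: " + inferred.failureRatio());
        System.out.println("explicit failure ratio: " + explicit.failureRatio());
        System.out.println("non-null error failure ratio: " + nonNullError.failureRatio());
    }
}
```

Running main prints a failure ratio of 0.0 for the inferred shape and 1.0 for the other two. This mirrors the reported symptom: a failed commit that carries no exception leaves the failure metric at zero.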