Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
SAMZA-905 added the capability to write the OFFSET file on every commit().
Unfortunately, the performance cost turned out to be prohibitive for one of our larger jobs at LinkedIn. The job has 10 stores, each with hundreds of partitions in its changelog topic. The performance problem came from the KafkaSystemAdmin.getSystemStreamMetadata() method, which:
1. Periodically refetches the topic metadata
2. Always fetches two offsets (oldest and upcoming) for every partition
Calling this method to fetch the offsets for just a couple of tasks is wasteful. Metadata should only be fetched when there's a problem; refetching it periodically doesn't help. The total number of offset fetches is S*2*T^2, where S is the number of stores and T is the number of tasks/changelog partitions. Since we only need the newest offset, S*T offset requests should suffice. Ideally, we'd also parallelize these requests, but that will be an exercise for another time.
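To make the scale concrete, plug in hypothetical numbers of the same order as the job above: with S = 10 stores and T = 500 tasks, the current code issues 10 * 2 * 500^2 = 5,000,000 offset fetches, where 10 * 500 = 5,000 would suffice. That is a factor of 2*T = 1,000 more requests than necessary.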
The fix has 3 components:
1. Cache metadata more aggressively: only expire metadata if we get Kafka's NotLeaderForPartitionException (see the first sketch after this list)
2. Reduce excessive offset fetching.
3. Do not allow unbounded exponential backoff when writing the offset checkpoint; if the retries are exhausted, just skip the OFFSET file. Unbounded exponential backoff can balloon the commit time and stall the event loop, so we will retry at most 3 times, with a maximum delay of 400ms (see the second sketch after this list).
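The two sketches below illustrate components 1 and 3. They are minimal illustrations under assumed names, not the actual patch: MetadataCache and writeOffsetFileWithRetries are hypothetical, and the only real Kafka name referenced is the NotLeaderForPartitionException mentioned above.

Component 1, aggressive caching: entries never expire on a timer and are dropped only when the caller observes a leader change.

    import scala.collection.concurrent.TrieMap

    // Hypothetical cache sketch: metadata is fetched once and kept until
    // invalidate() is called (e.g. after catching Kafka's
    // NotLeaderForPartitionException), rather than expiring periodically.
    class MetadataCache[K, V](fetch: K => V) {
      private val cache = TrieMap.empty[K, V]
      def get(key: K): V = cache.getOrElseUpdate(key, fetch(key))
      def invalidate(key: K): Unit = cache.remove(key)
    }

Component 3, bounded backoff: at most 3 retries, with delays of 100ms, 200ms, and 400ms, after which we skip the OFFSET file for this commit instead of stalling the event loop.

    // Hypothetical retry helper: returns false (caller skips the OFFSET
    // file) once the bounded retries are exhausted.
    def writeOffsetFileWithRetries(write: () => Unit,
                                   maxRetries: Int = 3,
                                   maxDelayMs: Long = 400L): Boolean = {
      var attempt = 0
      var delayMs = 100L
      while (attempt <= maxRetries) {
        try {
          write()
          return true // OFFSET file written successfully
        } catch {
          case _: Exception =>
            attempt += 1
            if (attempt <= maxRetries) {
              Thread.sleep(delayMs) // bounded: 100ms, 200ms, 400ms
              delayMs = math.min(delayMs * 2, maxDelayMs)
            }
        }
      }
      false // retries exhausted: skip the OFFSET file for this commit
    }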