Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
There's an existing – but seemingly unused – implementation of percentile metrics that we attempted to use for end-to-end latency metrics in Streams. Unfortunately a number of limitations became apparent, and we ultimately pulled the metrics from the 2.6 release pending further investigation/improvement.
The problems we encountered were:
- Required a static upper/lower limit to be set for the values
- Not well suited to a distribution with a long tail, i.e. setting the max value too high caused the accuracy to plummet
- Required a lot of memory per metric for reasonable accuracy and caused us to hit OOM (it was unclear whether there was actually a memory leak or whether it was just consuming unnecessarily large amounts of memory in general)
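To make the first two problems concrete, here is a deliberately simplified model of a histogram with a static [min, max] range split into equal-width buckets. It is an illustration under that assumption, not Kafka's actual `Percentiles` code; the class name `FixedRangeHistogram` is hypothetical. It records the same long-tailed toy latencies into a narrow-range and a wide-range instance.

```java
/**
 * Hypothetical, simplified model (NOT Kafka's Percentiles implementation):
 * a static [min, max] range divided into equal-width buckets.
 */
public class FixedRangeHistogram {
    private final double min;
    private final double max;
    private final long[] counts;
    private long total;

    public FixedRangeHistogram(double min, double max, int numBuckets) {
        if (!(max > min) || numBuckets <= 0) {
            throw new IllegalArgumentException("need max > min and numBuckets > 0");
        }
        this.min = min;
        this.max = max;
        this.counts = new long[numBuckets];
    }

    private double bucketWidth() {
        return (max - min) / counts.length;
    }

    public void record(double value) {
        // Values outside the static range are clamped into the first/last bucket,
        // so anything above max becomes indistinguishable from max.
        int idx = (int) ((value - min) / bucketWidth());
        idx = Math.max(0, Math.min(counts.length - 1, idx));
        counts[idx]++;
        total++;
    }

    /** Returns the upper edge of the bucket holding the q-th quantile (0 < q <= 1). */
    public double quantile(double q) {
        if (total == 0) return Double.NaN;
        long target = Math.max(1, (long) Math.ceil(q * total));
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= target) return min + (i + 1) * bucketWidth();
        }
        return max;
    }

    public static void main(String[] args) {
        // Toy latency data in ms: 990 samples spread over 1..99, plus a 10 s tail.
        FixedRangeHistogram narrow = new FixedRangeHistogram(0, 100, 100);
        FixedRangeHistogram wide = new FixedRangeHistogram(0, 10_000, 100);
        for (int i = 0; i < 990; i++) {
            narrow.record(1 + (i % 99));
            wide.record(1 + (i % 99));
        }
        for (int i = 0; i < 10; i++) {
            narrow.record(10_000);
            wide.record(10_000);
        }
        System.out.printf("narrow: p50=%.1f p99.9=%.1f%n", narrow.quantile(0.5), narrow.quantile(0.999));
        System.out.printf("wide:   p50=%.1f p99.9=%.1f%n", wide.quantile(0.5), wide.quantile(0.999));
    }
}
```

With these toy values, the narrow histogram reports p50 as 51 (true value 50) but clamps the 10,000 ms tail to 100, while the wide one captures the tail but reports p50 as 100. In this model, keeping 1 ms resolution across 0 to 10,000 ms would take 10,000 buckets, i.e. about 80 KB of `long` counters per metric, which illustrates how accuracy and memory trade off once a fixed max is required.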
Since the Percentiles class is part of the public API, we may need to create a new class altogether and possibly deprecate or remove the old one. Alternatively, we could re-implement the existing class from scratch and deprecate only the current constructors and their associated implementation (e.g. the constructor that accepts a max).
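One possible direction for a re-implementation, sketched here purely as an illustration and not something this issue prescribes, is a log-bucketed, relative-error histogram in the style of DDSketch. Its constructor takes only a target relative accuracy, so no static max is needed; bucket widths grow with the value, so the long tail does not degrade accuracy for small values, and only non-empty buckets are stored. The class name `LogBucketPercentiles` and its methods are hypothetical and are not part of Kafka's API.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch (not Kafka API): geometric buckets with ratio
 * gamma = (1 + a) / (1 - a). Bucket i covers (gamma^(i-1), gamma^i], so every
 * positive value fits without a configured max, and the estimate returned for a
 * bucket is within relative error a of any value in it (up to floating-point
 * rounding at bucket boundaries).
 */
public class LogBucketPercentiles {
    private final double gamma;
    private final double logGamma;
    // Sparse storage: only buckets that have received a value use memory.
    private final TreeMap<Integer, Long> counts = new TreeMap<>();
    private long total;

    public LogBucketPercentiles(double relativeAccuracy) {
        if (relativeAccuracy <= 0 || relativeAccuracy >= 1) {
            throw new IllegalArgumentException("relativeAccuracy must be in (0, 1)");
        }
        this.gamma = (1 + relativeAccuracy) / (1 - relativeAccuracy);
        this.logGamma = Math.log(gamma);
    }

    public void record(double value) {
        if (!(value > 0)) {
            throw new IllegalArgumentException("this sketch only handles positive values");
        }
        int index = (int) Math.ceil(Math.log(value) / logGamma);
        counts.merge(index, 1L, Long::sum);
        total++;
    }

    private double estimate(int index) {
        // 2 * gamma^i / (gamma + 1) is within relativeAccuracy of both bucket edges.
        return 2 * Math.pow(gamma, index) / (gamma + 1);
    }

    public double quantile(double q) {
        if (total == 0) return Double.NaN;
        long target = Math.max(1, (long) Math.ceil(q * total));
        long seen = 0;
        for (Map.Entry<Integer, Long> e : counts.entrySet()) {
            seen += e.getValue();
            if (seen >= target) return estimate(e.getKey());
        }
        return estimate(counts.lastKey());
    }

    public int bucketCount() {
        return counts.size();
    }

    public static void main(String[] args) {
        // Same toy latency data (ms) as a fixed-range histogram would see.
        LogBucketPercentiles p = new LogBucketPercentiles(0.01);
        for (int i = 0; i < 990; i++) p.record(1 + (i % 99));
        for (int i = 0; i < 10; i++) p.record(10_000);
        System.out.printf("p50=%.2f p99.9=%.1f buckets=%d%n",
                p.quantile(0.5), p.quantile(0.999), p.bucketCount());
    }
}
```

The design trade-off is that accuracy becomes relative (here 1%) rather than absolute. Under this scheme, covering anything from 1 ms to one hour at 1% accuracy needs on the order of 750 buckets, and a sparse map only pays for the ones actually hit.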
Issue Links
- is related to KAFKA-10165 Percentiles metric leaking memory (Resolved)