Description
We've hit several OOMs in our soak cluster lately. We were finally able to get a heap dump right after an OOM and found over 3.5 GB of memory being retained by the percentiles (specifically, by the 1 MB float[] used by the Percentiles metric).
The leak does seem specific to the Percentiles class: we see ~3000 instances of the Percentiles object vs. only ~500 instances of the Max object, which is also used in the same sensor as the Percentiles.
We did recently lower the buffer size from 1 MB to 100 kB, but there is clearly a leak of some kind, and a "smaller leak" is not an acceptable solution. If the cause of the leak is not immediately obvious, we should just revert the percentiles in 2.6 and work on stabilizing them for 2.7.
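To make the sizing argument concrete, here is a rough back-of-envelope estimate of the memory retained by the leaked buffers. It is a sketch under stated assumptions: the class and method names (PercentilesLeakEstimate, retainedBytes, instancesToFill) are hypothetical and not part of Kafka, each leaked Percentiles instance is assumed to retain exactly one float[] of the configured size, 1 MB and 100 kB are taken as 1 MiB and 100 KiB, and the 4 GiB heap is an illustrative figure, not our actual broker configuration.

```java
import java.util.Locale;

// Back-of-envelope estimate of heap retained by leaked Percentiles buffers.
// Hypothetical helper for illustration only; not part of Kafka.
final class PercentilesLeakEstimate {

    static final long MIB = 1024L * 1024L;
    static final long GIB = 1024L * MIB;

    // Assumption: each leaked Percentiles instance retains one float[] buffer of
    // the configured size. The real per-instance footprint may be larger.
    static long retainedBytes(long leakedInstances, long bufferBytes) {
        return leakedInstances * bufferBytes;
    }

    // Number of leaked instances whose buffers alone would fill the given heap.
    static long instancesToFill(long heapBytes, long bufferBytes) {
        return heapBytes / bufferBytes;
    }

    public static void main(String[] args) {
        long leaked = 3000;            // ~3000 Percentiles instances seen in the heap dump
        long oldSize = MIB;            // original buffer size (1 MB, taken as 1 MiB)
        long newSize = 100L * 1024L;   // reduced buffer size (100 kB, taken as 100 KiB)
        long heap = 4L * GIB;          // hypothetical heap size, for illustration only

        // Prints "retained at 1 MiB:   2.93 GiB"
        System.out.println(String.format(Locale.ROOT, "retained at 1 MiB:   %.2f GiB",
                (double) retainedBytes(leaked, oldSize) / GIB));
        // Prints "retained at 100 KiB: 0.29 GiB"
        System.out.println(String.format(Locale.ROOT, "retained at 100 KiB: %.2f GiB",
                (double) retainedBytes(leaked, newSize) / GIB));

        // Smaller buffers only raise the number of leaked instances needed to
        // exhaust the heap (4096 vs. 41943 here); they do not bound the leak.
        System.out.println("instances to fill heap at 1 MiB:   " + instancesToFill(heap, oldSize));
        System.out.println("instances to fill heap at 100 KiB: " + instancesToFill(heap, newSize));
    }
}
```

Under these assumptions, ~3000 leaked instances at 1 MiB account for roughly 2.9 GiB, the same order of magnitude as the 3.5 GB seen in the heap dump; the difference may come from per-instance overhead or additional buffers this model ignores. Retained memory still grows linearly with the number of leaked instances, so the 100 kB setting only delays the OOM by about 10x.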
Issue Links
- relates to: KAFKA-10177 Replace/improve Percentiles metrics (Open)