Details
- Type: New Feature
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
Feature request: add a Grafana graph of the last value (not an average, please) of the LastGcInfo duration for each of the major garbage collectors:
- G1GC Young Gen
- G1GC Old Gen
- CMS
- ParNew
CMS and ParNew example taken from NameNode JMX metrics:
}, {
  "name" : "java.lang:type=GarbageCollector,name=ConcurrentMarkSweep",
  "modelerType" : "sun.management.GarbageCollectorImpl",
  "LastGcInfo" : {
    "GcThreadCount" : 11,
    "duration" : 5206,
    ...
}, {
  "name" : "java.lang:type=GarbageCollector,name=ParNew",
  "modelerType" : "sun.management.GarbageCollectorImpl",
  "LastGcInfo" : {
    "GcThreadCount" : 11,
    "duration" : 6,
G1GC Young and Old Gen example taken from RegionServer JMX metrics:
}, {
  "name" : "java.lang:type=GarbageCollector,name=G1 Young Generation",
  "modelerType" : "sun.management.GarbageCollectorImpl",
  "LastGcInfo" : {
    "GcThreadCount" : 24,
    "duration" : 120,
}, {
  "name" : "java.lang:type=GarbageCollector,name=G1 Old Generation",
  "modelerType" : "sun.management.GarbageCollectorImpl",
  "LastGcInfo" : {
    "GcThreadCount" : 24,
    "duration" : 19641,
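To illustrate what the requested graph would read, here is a minimal Python sketch that pulls the last-GC duration per collector out of a JMX JSON payload. It assumes the payload wraps the beans in a top-level "beans" array, which the excerpts above do not show, and the function name last_gc_durations and the SAMPLE payload are hypothetical. The sample reuses only the RegionServer values quoted above. This is not how Ambari collects metrics, just a sketch of the extraction.

```python
import json

# Prefix shared by the GarbageCollector beans shown in the excerpts.
GC_BEAN_PREFIX = "java.lang:type=GarbageCollector,name="


def last_gc_durations(jmx_json: str) -> dict:
    """Map each collector name to the duration (ms) of its most recent GC."""
    # Assumption: beans live under a top-level "beans" key.
    beans = json.loads(jmx_json).get("beans", [])
    durations = {}
    for bean in beans:
        name = bean.get("name", "")
        if not name.startswith(GC_BEAN_PREFIX):
            continue
        info = bean.get("LastGcInfo")
        # LastGcInfo may be missing or null if the collector has not run yet.
        if isinstance(info, dict) and "duration" in info:
            durations[name[len(GC_BEAN_PREFIX):]] = info["duration"]
    return durations


# Hypothetical payload built from the RegionServer values in this ticket.
SAMPLE = json.dumps({
    "beans": [
        {
            "name": "java.lang:type=GarbageCollector,name=G1 Young Generation",
            "modelerType": "sun.management.GarbageCollectorImpl",
            "LastGcInfo": {"GcThreadCount": 24, "duration": 120},
        },
        {
            "name": "java.lang:type=GarbageCollector,name=G1 Old Generation",
            "modelerType": "sun.management.GarbageCollectorImpl",
            "LastGcInfo": {"GcThreadCount": 24, "duration": 19641},
        },
        # A non-GC bean, which the function should skip.
        {"name": "java.lang:type=Runtime"},
    ]
})

if __name__ == "__main__":
    for collector, millis in last_gc_durations(SAMPLE).items():
        print(f"{collector}: last GC took {millis} ms")
```

Graphing these values as-is, rather than a rate derived from a cumulative counter, is what would make a single 19641 ms old gen pause visible.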
Yes, this old gen GC is atrocious, which is why I'm here to tune it. But it helps if this is monitored properly in the first place, so that the problem is visible before RegionServers start dying at random from long GC pauses.
Right now Ambari's Grafana has GCTimeMillis, which would make one think there is no problem: it only shows an averaged-out 40ms of GC time per second, which doesn't help in spotting these long GC pauses.
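The arithmetic behind that complaint can be sketched in a few lines of Python. The 480-second averaging window is a hypothetical figure chosen for illustration, not Ambari's actual setting, and the function names are made up. The only real number is the 19641 ms old gen pause quoted above.

```python
def average_gc_ms_per_sec(pauses_ms, window_sec):
    """GC time averaged over the window, in ms of GC per second of wall time."""
    return sum(pauses_ms) / window_sec


def longest_pause_ms(pauses_ms):
    """Longest single pause in the window (0 if there were none)."""
    return max(pauses_ms, default=0)


# One old gen pause from this ticket, in a hypothetical 480 s window.
pauses = [19641]
window_sec = 480  # assumption, for illustration only

if __name__ == "__main__":
    avg = average_gc_ms_per_sec(pauses, window_sec)
    print(f"averaged: {avg:.1f} ms/s")
    print(f"longest pause: {longest_pause_ms(pauses)} ms")
```

Under that assumed window, a pause of almost 20 seconds averages out to roughly 41 ms of GC per second, which looks harmless on a graph, while the longest-pause figure still shows the full 19641 ms.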
Issue Links
- is related to AMBARI-24244 Grafana HBase GC Time graph wrong / misleading - hiding large GC pauses ~ 2 dozen secs! (Open)