Description
To debug/monitor production clusters, there are some more metrics I wish I had available.
In particular:
- The average FS latencies are useful, but a 'histogram' of recent latencies (e.g. 90% of reads completed in under 100ms, 99% in under 200ms) would be more useful
- Similar latency histograms for common operations (GET, PUT, DELETE)
- A count of accesses to each region, to detect hotspotting
- The current number of HLog files
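
To make the first three requests concrete, here is a minimal sketch of what such metrics could look like. It is not an existing HBase API: the class names `LatencyWindow` and `RegionAccessCounter`, the fixed-size sample window, and the nearest-rank percentile method are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical helper (not part of HBase): keeps the most recent N latency
 * samples in a ring buffer and answers percentile queries over them.
 */
final class LatencyWindow {
    private final long[] samples;
    private int next; // slot that the next sample overwrites
    private int size; // number of valid samples, at most samples.length

    LatencyWindow(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        samples = new long[capacity];
    }

    /** Records one latency sample, evicting the oldest once the window is full. */
    synchronized void record(long latencyMs) {
        samples[next] = latencyMs;
        next = (next + 1) % samples.length;
        if (size < samples.length) {
            size++;
        }
    }

    /** Nearest-rank percentile for p in (0, 1]; returns -1 if no samples exist. */
    synchronized long percentile(double p) {
        if (size == 0) {
            return -1;
        }
        long[] sorted = Arrays.copyOf(samples, size);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * size);
        rank = Math.min(Math.max(rank, 1), size);
        return sorted[rank - 1];
    }
}

/** Hypothetical helper (not part of HBase): per-region access counts. */
final class RegionAccessCounter {
    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    /** Adds one access for the given region name. */
    void increment(String regionName) {
        counts.computeIfAbsent(regionName, k -> new LongAdder()).increment();
    }

    /** Returns the access count for a region, or 0 if it was never accessed. */
    long get(String regionName) {
        LongAdder adder = counts.get(regionName);
        return adder == null ? 0L : adder.sum();
    }
}
```

One `LatencyWindow` per operation type (FS read, GET, PUT, DELETE) would cover the first two bullets, with `percentile(0.90)` and `percentile(0.99)` giving the figures described above. Sorting on every query is simple but not cheap, so a real implementation would likely use a bucketed or sampling histogram instead.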