Details
New Feature
Status: Open
Major
Resolution: Unresolved
0.9.0
None
None
Description
It would be useful to have metrics that measure the lag between leader WAL writes and follower WAL writes. Consider a node in a cluster that has a very slow disk or is severely overloaded: it may constantly fall behind the leader, repeatedly remote bootstrap, or both. Such metrics would let administrators monitor for nodes that are consistently far behind the leader (by tens of seconds or minutes), so that they can investigate these slow machines and either fix the underlying issues or remove the machines from the cluster.
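To make the request concrete, here is a minimal sketch of how such a lag metric might be computed and checked. Everything in it is hypothetical: `WalLagTracker`, its methods, and the 60-second threshold are illustrative names and values, not part of any existing API. It also assumes that leader and follower write timestamps come from clocks that are synchronized closely enough to compare.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WalLagTracker:
    """Hypothetical per-follower WAL lag tracker (illustration only)."""

    # Lag above this many seconds counts as "far behind" (assumed value).
    threshold_s: float = 60.0
    # Follower id -> observed lag samples, in seconds.
    samples: Dict[str, List[float]] = field(default_factory=dict)

    def record(self, follower: str, leader_write_ts: float,
               follower_write_ts: float) -> float:
        """Record one sample: the time between the leader's WAL write and
        the follower's WAL write of the same operation."""
        # Clamp at zero so minor clock skew never yields a negative lag.
        lag = max(0.0, follower_write_ts - leader_write_ts)
        self.samples.setdefault(follower, []).append(lag)
        return lag

    def lagging_followers(self, min_fraction: float = 0.5) -> List[str]:
        """Return followers whose lag exceeded the threshold in at least
        min_fraction of their recorded samples."""
        lagging = []
        for follower, lags in self.samples.items():
            over = sum(1 for lag in lags if lag > self.threshold_s)
            if lags and over / len(lags) >= min_fraction:
                lagging.append(follower)
        return sorted(lagging)


tracker = WalLagTracker(threshold_s=60.0)
tracker.record("ts-1", leader_write_ts=100.0, follower_write_ts=100.5)
tracker.record("ts-2", leader_write_ts=100.0, follower_write_ts=250.0)
print(tracker.lagging_followers())  # ['ts-2']
```

Reporting followers that exceed the threshold in a fraction of samples, rather than on any single sample, is meant to single out nodes that are constantly behind instead of ones that hit a brief stall.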