Details
Type: Improvement
Status: In Progress
Priority: Major
Resolution: Unresolved
Description
Problem:
Currently the re-replicator uses only the view of available and read-only bookies taken at the moment it performs bookie checking. As a result, a bookie that temporarily disappears from ZooKeeper (e.g. ZooKeeper session expired, bookie restarted, or registration flapping due to network issues or GC pauses) is accidentally treated as a lost bookie, which triggers unnecessary re-replication.
Solution:
Introduce 'auditorStaleBookieInterval'. If a bookie does not register within the given interval, it is marked as 'stale' and all ledgers belonging to that bookie are re-replicated. The default value is 30 minutes.
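The idea can be illustrated with a minimal sketch. It is not the code from the patch: the class StaleBookieTracker, its method names, and the use of plain strings as bookie identifiers are all assumptions made for illustration. It only shows how a grace interval can separate a flapping registration from a bookie that is really gone.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration; not the actual BookKeeper auditor code. */
final class StaleBookieTracker {
    private final Duration staleInterval;
    // Every bookie seen registered at least once and not yet reported stale.
    private final Set<String> known = new HashSet<>();
    // For each currently missing bookie, the time it was first observed missing.
    private final Map<String, Instant> missingSince = new HashMap<>();

    StaleBookieTracker(Duration staleInterval) {
        this.staleInterval = staleInterval;
    }

    /**
     * Feed the latest set of registered bookies. Returns the bookies that have
     * been continuously absent for at least the stale interval.
     */
    Set<String> onBookiesChanged(Set<String> registered, Instant now) {
        known.addAll(registered);
        Set<String> stale = new HashSet<>();
        for (String bookie : known) {
            if (registered.contains(bookie)) {
                // The bookie came back (or never left): reset its absence timer.
                missingSince.remove(bookie);
            } else {
                Instant since = missingSince.computeIfAbsent(bookie, b -> now);
                if (Duration.between(since, now).compareTo(staleInterval) >= 0) {
                    stale.add(bookie);
                }
            }
        }
        // Stale bookies are handed off for re-replication and then forgotten.
        known.removeAll(stale);
        missingSince.keySet().removeAll(stale);
        return stale;
    }
}
```

With a 30 minute interval, a bookie that drops out of ZooKeeper and re-registers a few minutes later is never returned by onBookiesChanged, so no re-replication would be scheduled for it in this sketch.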
Fixes:
- refactor the bookie watcher so that it can notify bookie list changes through BookiesListener
- introduce 'auditorStaleBookieInterval' to mark bookies as 'stale' if they have not registered themselves with ZooKeeper
- add more info-level logging for the critical steps of the re-replication logic
- misc changes
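
As a rough illustration of the first item, a watcher can push each new registration snapshot to its listeners instead of having the auditor poll for it. The names below (BookieSetListener, SketchBookieWatcher, publish) are hypothetical and deliberately differ from the real BookiesListener, whose signature is not shown in this issue.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical callback; not the real BookiesListener signature. */
interface BookieSetListener {
    void onBookiesChanged(Set<String> writableBookies, Set<String> readOnlyBookies);
}

/** Hypothetical watcher that forwards every registration snapshot to its listeners. */
final class SketchBookieWatcher {
    // Copy-on-write list so listeners can be added while a notification is in flight.
    private final List<BookieSetListener> listeners = new CopyOnWriteArrayList<>();

    void addListener(BookieSetListener listener) {
        listeners.add(listener);
    }

    /** In a real system this would be driven by the ZooKeeper watch callback. */
    void publish(Set<String> writable, Set<String> readOnly) {
        // Hand out immutable copies so a listener cannot mutate shared state.
        Set<String> writableCopy = Set.copyOf(writable);
        Set<String> readOnlyCopy = Set.copyOf(readOnly);
        for (BookieSetListener listener : listeners) {
            listener.onBookiesChanged(writableCopy, readOnlyCopy);
        }
    }
}
```

An auditor-side component could register a listener here and feed each snapshot into its own staleness bookkeeping.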