Uploaded image for project: 'Bookkeeper'
  1. Bookkeeper
  2. BOOKKEEPER-960

Re-replicator: bookie checking should handle flapping bookie registration

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: In Progress
    • Major
    • Resolution: Unresolved
    • None
    • None
    • bookkeeper-server

    Description

      Problem:

      currently re-replicator only uses the view of available/readonly bookies right at the time doing bookie checking. it would accidentally treat a bookie disappeared from zookeeper (e.g. zookeeper session expired, bookie restarted, flapping bookie registration due to network/gc) as lost bookies, which introduce unnecessary re-replication.

      Solution:

      introduce 'auditorStaleBookieInterval', if a bookie never register in the given interval, it would be marked as 'stale' bookies and re-replicate all ledgers belongs to that bookie. the default value is set 30 minutes.

      Fixes:

      • refactor bookie watcher to allow notifying bookie list thru BookiesListener
      • introduce 'auditorStaleBookieInterval' to be able to mark bookies as 'stale' if bookies aren't registered themselves to zookeeper
      • add more info logging about critical steps on re-replication logic
      • misc changes

      Attachments

        Activity

          People

            yzang Yiming Zang
            yzang Yiming Zang
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: