Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Version/s: 6.6.5, 7.5
Description
Under certain circumstances, replication fails between a leader and a follower replica. The follower stops receiving updates from the leader even though the leader has a newer version. If the leader is then restarted, it pulls the older version from the follower.
This was discussed on the mailing list, and risdenk wrote a script that reproduces the error. He also verified that the error occurs when the script is run outside of Docker.
The failure scenario is as follows:
- A collection with 1 shard and 2 replicas
- Stop the non-leader replica (B)
- Index more than 100 documents into the collection
- Start replica B; PeerSync fails, so it falls back to segment replication
- Index the 101st document into the collection
- Leader's tlog: [1, 2, 3, ..., 100, 101]
- Replica's tlog: [101]
- Stop replica B
- Index the 102nd document into the collection
- Start replica B, which attempts PeerSync:
- Leader's tlog: [1, 2, 3, ..., 100, 101, 102]
- Replica's tlog: [101]
- Leader's high (80th): 80
- Replica's low: 101
- Comparison: replica's low (101) > leader's high (80) => PeerSync reports ALREADY_IN_SYNC, so replica B never fetches update 102
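The window comparison in the last steps can be sketched as below. This is a simplified model, not Solr's code: versions are plain integers matching the tlog entries above, and the `percentile` helper, the 0.2/0.8 fractions, and the decision labels (`REPLICATE`, `ALREADY_IN_SYNC`, `SYNC`) are assumptions made for illustration. Because of its rounding, this sketch puts the leader's high threshold at 82 rather than the 80 quoted above, but any leader threshold below 101 leads to the same wrong decision.

```python
# Simplified, hypothetical model of the PeerSync version-window check.
# Assumptions: versions are the small ints from the tlogs above, and the
# 0.2/0.8 percentile fractions only loosely mirror Solr's PeerSync.


def percentile(versions_desc, fraction):
    """Return the entry `fraction` of the way down a newest-first version list."""
    idx = min(int(len(versions_desc) * fraction), len(versions_desc) - 1)
    return versions_desc[idx]


def peer_sync_decision(replica_tlog, leader_tlog):
    """Return a hypothetical decision label for replica B's PeerSync attempt."""
    ours = sorted(replica_tlog, reverse=True)
    theirs = sorted(leader_tlog, reverse=True)

    our_low, our_high = percentile(ours, 0.8), percentile(ours, 0.2)
    other_low, other_high = percentile(theirs, 0.8), percentile(theirs, 0.2)

    if our_high < other_low:
        # Our window is entirely older than the leader's: fall back to replication.
        return "REPLICATE"
    if our_low > other_high:
        # Our single recent update (101) makes our window look newer,
        # so nothing is requested from the leader.
        return "ALREADY_IN_SYNC"
    # Otherwise the windows overlap and missing updates would be requested.
    return "SYNC"


leader_tlog = list(range(1, 103))  # [1, 2, ..., 100, 101, 102]
replica_tlog = [101]               # only the update received after replication

print(peer_sync_decision(replica_tlog, leader_tlog))  # ALREADY_IN_SYNC
# Update 102 exists only on the leader, yet the decision above skips fetching it.
print(102 in leader_tlog and 102 not in replica_tlog)  # True
```

The sketch shows why the check is fragile: a replica whose tlog holds only a handful of recent updates gets a "low" threshold that is close to the newest version, so it can compare as newer than a leader that is actually ahead of it.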