[SOLR-12969] Inconsistency with leader when PeerSync return ALREADY_IN_SYNC - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 6.6.5, 7.5
Fix Version/s: 8.0
Component/s: replication (java)
Labels: None

Description

Under certain circumstances, replication fails between a leader and follower. The follower will not receive updates from the leader, even though the leader has a newer version. If the leader is restarted, it will get the older version from the follower.

This was discussed on the mailing list and risdenk wrote a script that demonstrates this error. He also verified that the error occurs when the script is run outside of docker.

Here is the scenario of the failure:

A collection with 1 shard and 2 replicas
Stop the non-leader replica (B)
Index more than 100 documents to the collection
Start replica B; it fails to do PeerSync and starts segment replication
Index the 101st document to the collection
- Leader's tlog: [1, 2, 3, ..., 100, 101]
- Replica's tlog: [101]
Stop replica B
Index the 102nd document to the collection
Start replica B; on doing PeerSync:
- Leader's tlog: [1, 2, 3, ..., 100, 101, 102]
- Replica's tlog: [101]
- Leader's high (80th): 80
- Replica's low: 101
- By comparison: replica's low > leader's high => ALREADY_IN_SYNC
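
The final comparison can be sketched in a few lines of Java. This is only an illustration of the check as described in the scenario above, not Solr's actual PeerSync code: the class `PeerSyncSketch`, the methods `compare` and `newerOnLeader`, and the `NEEDS_SYNC` value are hypothetical names, and the leader's high value of 80 is taken directly from the scenario rather than computed.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Hypothetical sketch; names are illustrative and not taken from Solr.
class PeerSyncSketch {
    enum Result { ALREADY_IN_SYNC, NEEDS_SYNC }

    // Simplified form of the check in the last step of the scenario:
    // if the replica's low version is above the leader's high version,
    // the replica is assumed to be up to date.
    static Result compare(long replicaLow, long leaderHigh) {
        return replicaLow > leaderHigh ? Result.ALREADY_IN_SYNC : Result.NEEDS_SYNC;
    }

    // Versions in the leader's tlog that are newer than anything in the replica's tlog.
    static List<Long> newerOnLeader(List<Long> leaderTlog, List<Long> replicaTlog) {
        long replicaNewest = Collections.max(replicaTlog);
        return leaderTlog.stream()
                .filter(v -> v > replicaNewest)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Leader's tlog: [1, 2, ..., 102]; replica's tlog: [101]
        List<Long> leaderTlog = LongStream.rangeClosed(1, 102).boxed().collect(Collectors.toList());
        List<Long> replicaTlog = List.of(101L);

        long leaderHigh = 80L;  // value quoted in the scenario, not computed here
        long replicaLow = 101L; // the only version in the replica's tlog

        System.out.println(compare(replicaLow, leaderHigh));        // prints ALREADY_IN_SYNC
        System.out.println(newerOnLeader(leaderTlog, replicaTlog)); // prints [102]
    }
}
```

With the scenario's numbers, the check reports ALREADY_IN_SYNC even though version 102 exists only on the leader, which is the inconsistency this issue describes.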

Attachments

SOLR-12969.patch (12/Nov/18 07:19, 26 kB, Cao Manh Dat)
SOLR-12969.patch (11/Nov/18 15:49, 25 kB, Cao Manh Dat)
SOLR-12969.patch (08/Nov/18 16:19, 20 kB, Cao Manh Dat)

People

Assignee: Cao Manh Dat

Reporter: Jeremy Smith

Votes: 0

Watchers: 5

Dates

Created: 06/Nov/18 20:14

Updated: 02/Oct/19 17:23

Resolved: 16/Dec/18 17:02