Details
Bug
Status: Resolved
Major
Resolution: Fixed
1.2.0
None
Description
As reported by a user on Slack, ksck can produce misleading output such as:
ca199fafca544df2a1b2a01be9d5266d (server1:7250): RUNNING [LEADER]
a077957f627c4758ab5a989aca8a1ca8 (server2:7250): RUNNING
5c09a555c205482b8131f15b2c249ec6 (server3:7250): bad state
    State:       NOT_STARTED
    Data state:  TABLET_DATA_TOMBSTONED
    Last status: Tablet initializing...
The problem is that, according to the logs, server2 had already been evicted from the configuration, but the resulting config change (which contains only server1 and server3) was never committed. In reality only one of the three servers is left, so the new config cannot reach a majority.
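To make the quorum arithmetic concrete, here is a minimal Python sketch, assuming the standard Raft rule that a config change commits only once a majority of the new config has acknowledged it. The helpers `majority` and `can_commit` and the server sets are hypothetical illustrations, not Kudu code; `healthy` counts server1 and server2 as up only because ksck reports them as RUNNING.

```python
def majority(n):
    """Smallest number of voters that forms a majority of n voters."""
    return n // 2 + 1


def can_commit(config, healthy):
    """True if the healthy members of `config` form a majority of `config`."""
    return len(config & healthy) >= majority(len(config))


# Hypothetical sets mirroring the report.
old_config = {"server1", "server2", "server3"}  # what the master still shows
new_config = {"server1", "server3"}             # after evicting server2
healthy = {"server1", "server2"}                # reported RUNNING by ksck

# The old config looks fine: 2 of 3 is a majority.
old_ok = can_commit(old_config, healthy)
# The new config needs both server1 and server3, but server3 is tombstoned.
new_ok = can_commit(new_config, healthy)
```

Judged against the master's config, the tablet appears to be missing just one replica; judged against the config the leader is actually trying to commit, it has no quorum.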
Ideally, ksck should ask each server what it thinks the configuration is and flag any difference from the master's view. As it is, the output makes it look as though the tablet is just missing one replica, when in reality the tablet is broken.
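A minimal Python sketch of the check proposed above: compare each server's self-reported config with the master's and report a mismatch before falling back to counting live replicas. The function names, the status strings (`CONFIG_MISMATCH`, `UNDER_REPLICATED`, and so on), and the assumption that each server reports its config as a set of server names are all hypothetical. This is not ksck's actual implementation or output format.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Config = FrozenSet[str]


def config_mismatches(master_config: Config,
                      reported: Dict[str, Config]) -> List[Tuple[str, Config]]:
    """Servers whose own view of the Raft config differs from the master's."""
    return [(server, cfg) for server, cfg in sorted(reported.items())
            if cfg != master_config]


def classify_tablet(master_config: Config,
                    reported: Dict[str, Config],
                    running: Set[str]) -> str:
    # Check for disagreement first, so a stuck config change is not
    # hidden behind a plain replica count.
    if config_mismatches(master_config, reported):
        return "CONFIG_MISMATCH"
    live = len(master_config & running)
    if live == len(master_config):
        return "HEALTHY"
    if live >= len(master_config) // 2 + 1:
        return "UNDER_REPLICATED"
    return "UNAVAILABLE"


# Hypothetical data mirroring the report; server3 (NOT_STARTED) reports nothing.
master = frozenset({"server1", "server2", "server3"})
reported = {
    "server1": frozenset({"server1", "server3"}),  # leader already evicted server2
    "server2": master,                             # evicted replica's stale view
}
running = {"server1", "server2"}
status = classify_tablet(master, reported, running)
```

Without the mismatch check, the same inputs would be classified as `UNDER_REPLICATED` (two of three replicas running), which is exactly the misleading picture described in this report.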