Details
Bug
Status: Closed
Critical
Resolution: Fixed
0.90.1
None
None
Reviewed
Description
ReplicationZookeeper is sloppy in how it handles the znodes during failover:
- When creating the lock, it doesn't cleanly handle the case where the parent znode has already been deleted.
- When deleting the znodes after a successful move, it doesn't make sure the lock znode is deleted last.
- After the lock is deleted, there's a window in which another region server can create a new lock and delete the znodes, which aborts the first region server (seen on one cluster).
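
To make the three points above concrete, here is a minimal sketch of the intended ordering. It runs against a toy in-memory znode tree, not the real ZooKeeper client or the actual ReplicationZookeeper code; the class FailoverSketch and the methods tryLock and cleanup are hypothetical names made up for illustration. The toy store reports a missing parent or node by returning false instead of throwing, which is an assumption about how a losing region server should react (back off rather than abort).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy in-memory znode tree for illustration; NOT the ZooKeeper client API. */
class FailoverSketch {
    private final TreeSet<String> znodes = new TreeSet<>();

    private static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }

    /** Returns false if the parent is missing or the node already exists. */
    public synchronized boolean create(String path) {
        String parent = parentOf(path);
        if (!parent.equals("/") && !znodes.contains(parent)) return false;
        return znodes.add(path);
    }

    /** Lists the direct children of a znode. */
    public synchronized List<String> children(String path) {
        List<String> out = new ArrayList<>();
        String prefix = path + "/";
        for (String z : znodes) {
            if (z.startsWith(prefix) && z.indexOf('/', prefix.length()) < 0) out.add(z);
        }
        return out;
    }

    /** Returns false if the node is missing or still has children. */
    public synchronized boolean delete(String path) {
        if (!children(path).isEmpty()) return false;
        return znodes.remove(path);
    }

    public synchronized boolean exists(String path) {
        return znodes.contains(path);
    }

    /** Point 1: a missing parent means someone else already finished; do not abort. */
    public boolean tryLock(String deadRs) {
        return create(deadRs + "/lock");
    }

    /** Point 2: remove the copied queues first and the lock znode last. */
    public void cleanup(String deadRs) {
        String lock = deadRs + "/lock";
        for (String child : children(deadRs)) {
            if (!child.equals(lock)) deleteRecursive(child);
        }
        delete(lock);
        // Point 3: if another server re-created the lock in this window,
        // this delete returns false and is treated as a lost race, not a failure.
        delete(deadRs);
    }

    private void deleteRecursive(String path) {
        for (String c : children(path)) deleteRecursive(c);
        delete(path);
    }
}
```

In this sketch a second region server that calls tryLock after cleanup simply gets false and moves on, because the parent znode is gone. Closing the window in the third point completely would need the lock and its parent to be removed atomically, which the toy store does not attempt.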