Details
- Type: Sub-task
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
Description
Yet another stuck-procedure case, similar to HBASE-21364.
The scenario is:
1. A ModifyProcedure spawned a ReopenTableProcedure and, since its holdLock=false, released the lock.
2. The ReopenTableProcedure, which also has holdLock=false, spawned several MoveRegionProcedures. Just after it stored the child procedures to the WAL and began to release the lock, the master was killed.
3. On restart, the ReopenTableProcedure's lock was restored because it had been holding the lock before. This is not right, since it is now in the WAITING state and its holdLock=false.
4. After the restart, a MoveRegionProcedure could execute because its parent held the lock. But the AssignProcedure it spawned could not execute: its parent (the MoveRegionProcedure) did not hold the lock, while its 'grandpa', the ReopenTableProcedure, did.
5. Restarting the master again does not help. The procedure stays stuck, because the lock is restored for the ReopenTableProcedure every time.
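The parent versus 'grandpa' mismatch in step 4 can be illustrated with a minimal sketch. The `StuckLockSketch` class, its `Proc` type, and the `canRun` rule are hypothetical simplifications, not the actual HBase scheduler code. The assumption is that a procedure may proceed only when the table lock is free, or is owned by the procedure itself or by its direct parent.

```java
// Hypothetical, simplified model of step 4; these are not the real HBase classes.
final class StuckLockSketch {
    static final class Proc {
        final String name;
        final Proc parent; // null for a root procedure

        Proc(String name, Proc parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    // Assumed rule: a procedure can run if the lock is free, or if the lock is
    // owned by the procedure itself or by its direct parent. Ancestors further
    // up the tree are NOT considered.
    static boolean canRun(Proc proc, Proc lockOwner) {
        if (lockOwner == null) {
            return true; // nobody holds the lock
        }
        return lockOwner == proc || lockOwner == proc.parent;
    }

    public static void main(String[] args) {
        Proc reopen = new Proc("ReopenTableProcedure", null);
        Proc move = new Proc("MoveRegionProcedure", reopen);
        Proc assign = new Proc("AssignProcedure", move);

        // After the restart the lock was wrongly restored to ReopenTableProcedure.
        Proc lockOwner = reopen;
        System.out.println(move.name + " can run: " + canRun(move, lockOwner));
        System.out.println(assign.name + " can run: " + canRun(assign, lockOwner));
    }
}
```

Under this assumed rule, the MoveRegionProcedure passes the check and the AssignProcedure does not, which matches the stuck state described above.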
Two fixes:
1. Do not restore the lock for a procedure that has holdLock=false and is in the WAITING state.
2. A procedure that does not hold the lock itself, but whose parent does, should also be put at the front of the queue, as an addendum to HBASE-21364.
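The two fixes can be sketched roughly as follows. This is an illustrative model only: `LockFixSketch`, `Proc`, the `lockedWhenPersisted` flag, and the method names are all hypothetical and do not correspond to the actual patch or to HBase's procedure classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of the two fixes; these are not the real HBase classes.
final class LockFixSketch {
    enum State { RUNNABLE, WAITING }

    static final class Proc {
        final String name;
        final Proc parent;           // null for a root procedure
        final boolean holdLock;      // assumed meaning: keep the lock for the whole lifetime
        State state = State.RUNNABLE;
        boolean lockedWhenPersisted; // assumed flag: the store recorded the procedure as locked
        boolean hasLock;             // in-memory lock ownership

        Proc(String name, Proc parent, boolean holdLock) {
            this.name = name;
            this.parent = parent;
            this.holdLock = holdLock;
        }
    }

    // Fix 1: on restart, skip lock restoration for a procedure that has
    // holdLock=false and is in the WAITING state.
    static boolean shouldRestoreLock(Proc p) {
        if (!p.lockedWhenPersisted) {
            return false; // nothing to restore
        }
        return p.holdLock || p.state != State.WAITING;
    }

    // Fix 2: a procedure goes to the front of the queue if it holds the lock
    // itself or if its direct parent holds it.
    static boolean belongsAtFront(Proc p) {
        return p.hasLock || (p.parent != null && p.parent.hasLock);
    }

    static void enqueue(Deque<Proc> queue, Proc p) {
        if (belongsAtFront(p)) {
            queue.addFirst(p);
        } else {
            queue.addLast(p);
        }
    }
}
```

In this model, a WAITING ReopenTableProcedure with holdLock=false is not given its lock back on restart, so its grandchildren are no longer blocked by a lock that nobody in their direct parent chain owns.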
Discussion:
Should we check the locks of all ancestors, not only the parent? As addressed in the comments on the patch, once the issue above is fixed, checking the parent is enough for now.
Attachments
Issue Links
- is broken by: HBASE-20920 Merge the update procedure store on locking with the general persist after a procedure execution (Open)