Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
None
Description
Currently, when a cluster node starts and discovers that it wasn't shut down properly, it first runs the complete LastRevRecovery and only continues startup once that is done.
However, when it fails to acquire the recovery lock, which implies that a different cluster node is already running the recovery on its behalf, it simply skips recovery and continues startup.
So what is it? Is running the recovery before proceeding critical or not? If it is, this code in LastRevRecoveryAgent needs to change:
    //TODO What if recovery is being performed for current clusterNode by some other node
    //should we halt the startup
    if(!lockAcquired){
        log.info("Last revision recovery already being performed by some other node. " +
                "Would not attempt recovery");
        return 0;
    }
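If recovery does turn out to be critical, one possible shape for the change is to wait until the other node reports that recovery has finished, instead of returning right away. The following is only a rough sketch of that idea: `RecoveryWait`, `waitForRecovery` and the `recoveryInProgress` check are hypothetical names, not existing Oak APIs, and how a node would actually detect that recovery is still running is left open here.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Oak: blocks startup while another
// cluster node is still running the recovery on our behalf.
class RecoveryWait {

    /**
     * Polls recoveryInProgress until it returns false or the timeout elapses.
     *
     * @return true if recovery finished in time, false on timeout or interrupt
     */
    static boolean waitForRecovery(BooleanSupplier recoveryInProgress,
                                   long timeoutMillis,
                                   long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (recoveryInProgress.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                return false; // caller decides whether to abort startup
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

In the `!lockAcquired` branch, the caller could then fail startup when `waitForRecovery` returns false rather than silently continuing.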
If it's not critical, we may want to run the recovery always asynchronously.
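For the non-critical case, a minimal sketch of running the recovery in the background could look like the following. `AsyncRecovery` and `recoverAsync` are hypothetical names used for illustration only; the `Supplier<Integer>` stands in for whatever call performs the recovery and returns an integer result, mirroring the `return 0` in the snippet above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

// Hypothetical helper, not part of Oak: kicks off recovery on the given
// executor and returns immediately so startup can continue.
class AsyncRecovery {

    static CompletableFuture<Integer> recoverAsync(Supplier<Integer> recovery,
                                                   Executor executor) {
        // The returned future completes with the recovery's result, or
        // exceptionally if the recovery throws.
        return CompletableFuture.supplyAsync(recovery, executor);
    }
}
```

Startup code would keep the returned future only for logging or for reporting a failed recovery, without blocking on it.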
cc mreutegg and chetanm