Description
The master might receive a duplicate authenticate() request while a previous authentication attempt is in progress. Depending on what the AuthenticatorProcess is executing at the time, there are two possible race conditions, either of which causes the scheduler or slave to retry authentication continuously without ever succeeding.
We have seen both race conditions in a heavily loaded production cluster.
Race 1:
----------
--> An authenticate() event was dispatched to AuthenticatorProcess (Master::authenticate() called Authenticator::authenticate())
--> A terminate() event was then injected into the front of the AuthenticatorProcess queue (duplicate Master::authenticate() did ~Authenticator) before the above authenticate() event was executed.
--> Due to a bug in libprocess, the future returned by Master::authenticate() was never transitioned to discarded (Master::_authenticate() was never called).
--> This caused all the subsequent authentication retries to be enqueued on the master waiting for Master::_authenticate() to be executed.
Fix: Transition the dispatched future to discarded if the libprocess process is terminated (https://reviews.apache.org/r/25945/)
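The idea behind the Race 1 fix can be sketched with a toy event queue. This is a minimal model and not libprocess code: ToyProcess, ToyFuture, State, the discardPending flag, and stateAfterTerminate() are all hypothetical names introduced for illustration. It shows only that an event queued behind terminate() never runs, so its future stays pending unless termination explicitly discards it.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical, simplified future state; not the libprocess API.
enum class State { PENDING, READY, DISCARDED };

// Toy future: the caller and the queued event share one state cell.
struct ToyFuture {
  std::shared_ptr<State> state = std::make_shared<State>(State::PENDING);
};

// Toy single-threaded "process" with an event queue.
class ToyProcess {
public:
  // Enqueue work at the back and hand the caller a future for it.
  ToyFuture dispatch(std::function<void()> work) {
    assert(work != nullptr);
    ToyFuture future;
    queue_.push_back(Event{std::move(work), future.state});
    return future;
  }

  // Normal path: run every queued event and mark its future ready.
  void run() {
    while (!queue_.empty()) {
      Event event = std::move(queue_.front());
      queue_.pop_front();
      event.work();
      *event.state = State::READY;
    }
  }

  // Models terminate() jumping to the front of the queue: nothing that
  // is still queued will ever run. With discardPending == false the
  // futures are simply abandoned (the bug); with true they are
  // transitioned to DISCARDED so that waiters get notified (the fix).
  void terminate(bool discardPending) {
    if (discardPending) {
      for (Event& event : queue_) {
        *event.state = State::DISCARDED;
      }
    }
    queue_.clear();
  }

private:
  struct Event {
    std::function<void()> work;
    std::shared_ptr<State> state;
  };
  std::deque<Event> queue_;
};

// Dispatch one event, terminate before it runs, and report what the
// caller's future looks like afterwards.
State stateAfterTerminate(bool discardPending) {
  ToyProcess process;
  ToyFuture future = process.dispatch([] {});
  process.terminate(discardPending);
  return *future.state;
}
```

In this model, stateAfterTerminate(false) leaves the future in State::PENDING, which corresponds to Master::_authenticate() never being called, while stateAfterTerminate(true) yields State::DISCARDED.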
Race 2:
-----------
--> An authenticate() event was dispatched to AuthenticatorProcess (Master::authenticate() called Authenticator::authenticate())
--> AuthenticatorProcess::authenticate() executed and set promise.onDiscard(defer(self, Self::discarded)). NOTE: The internal promise of AuthenticatorProcess is discarded in AuthenticatorProcess::discarded()
--> A terminate() event was then injected into the front of the AuthenticatorProcess queue (duplicate Master::authenticate() did ~Authenticator) before the above discarded() event was executed.
--> AuthenticatorProcess was destructed without ever discarding the internal promise (Master::_authenticate() was never called).
--> This caused all the subsequent authentication retries to be enqueued on the master waiting for Master::_authenticate() to be executed.
Fix: Discard the internal promise when the AuthenticatorProcess is destructed.
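The Race 2 fix can be illustrated the same way, again as a sketch under stated assumptions rather than the actual patch. ToyPromise, ToyAuthenticatorProcess, the discardOnDestruct flag, and continuationRan() are hypothetical names; the callback registered on the promise stands in for the continuation that would lead to Master::_authenticate().

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified promise state; not the libprocess API.
enum class State { PENDING, DISCARDED };

// Toy promise that notifies registered callbacks once it is discarded.
class ToyPromise {
public:
  void onDiscarded(std::function<void()> callback) {
    callbacks_.push_back(std::move(callback));
  }

  // Idempotent: only the first call changes state and fires callbacks.
  void discard() {
    if (state_ != State::PENDING) return;
    state_ = State::DISCARDED;
    for (auto& callback : callbacks_) callback();
  }

  State state() const { return state_; }

private:
  State state_ = State::PENDING;
  std::vector<std::function<void()>> callbacks_;
};

// Toy stand-in for AuthenticatorProcess. Its queued discarded() handler
// is assumed to have been preempted by terminate(), so the destructor
// is the only remaining place to settle the internal promise.
class ToyAuthenticatorProcess {
public:
  ToyAuthenticatorProcess(std::shared_ptr<ToyPromise> promise,
                          bool discardOnDestruct)
    : promise_(std::move(promise)), discardOnDestruct_(discardOnDestruct) {
    assert(promise_ != nullptr);
  }

  ~ToyAuthenticatorProcess() {
    // The fix: never leave the internal promise pending on destruction.
    if (discardOnDestruct_) promise_->discard();
  }

private:
  std::shared_ptr<ToyPromise> promise_;
  bool discardOnDestruct_;
};

// Returns whether the continuation waiting on the promise was run after
// the process was destroyed.
bool continuationRan(bool discardOnDestruct) {
  bool ran = false;
  auto promise = std::make_shared<ToyPromise>();
  promise->onDiscarded([&ran] { ran = true; });
  {
    ToyAuthenticatorProcess process(promise, discardOnDestruct);
    // Scope exit destroys the process before any discarded() handler ran.
  }
  return ran;
}
```

Here continuationRan(false) returns false, mirroring retries that queue up forever behind a promise that is never settled, and continuationRan(true) returns true because the destructor discards the promise.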
Issue Links
- is related to MESOS-2307: Dispatching to a non-existent Process should not return a pending future. (Accepted)