Details
Sub-task
Status: Resolved
Major
Resolution: Fixed
Description
HDFS-11437 is going to take a non-trivial amount of work to do right. In the meantime it'd be nice to have a way to cancel pending connections (even when the FS claims they have finished).
The proposed workaround is to relax the rules about when FileSystem::CancelPendingConnect can be called, since the FS isn't able to properly determine when it's connected anyway. To determine when the FS has actually connected, you can make some simple RPC call, since that will wait on failover. If CancelPending can be called during that first RPC call, then it will effectively be canceling FileSystem::Connect.
Current cancel rules - asterisk on steps where CancelPending is allowed
1. FileSystem::Connect called
2. FileSystem communicates with first NN *
3. FileSystem::Connect returns - even if it hasn't communicated with the active NN
Proposed relaxation
1. FileSystem::Connect called
2. FileSystem communicates with first NN *
3. FileSystem::Connect returns *
4. FileSystem::GetFileInfo called * - any namenode RPC call will do, ignore permission errors
5. RPC engine blocks until it hits the active NN or runs out of retries *
6. FileSystem::GetFileInfo returns
It'd be up to the user to add in the dummy NN RPC call. Once HDFS-11437 is fixed this workaround can be removed.
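The relaxed sequence above can be sketched roughly as follows. This is only an illustration under stated assumptions: MockFileSystem, RpcResult, and ConnectThenProbe are hypothetical names invented for this sketch, not the libhdfs++ API, and the real signatures differ. To keep it deterministic and single-threaded, the cancel is issued from a per-retry hook; in real use it would come from another thread while the dummy RPC is blocked.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result type for this sketch; not part of libhdfs++.
enum class RpcResult { kOk, kCancelled, kRetriesExhausted };

// Hypothetical stand-in for the FileSystem. None of this is the real API.
class MockFileSystem {
 public:
  // active_after: retry attempt on which the active NN answers
  // (negative means it never answers).
  MockFileSystem(int max_retries, int active_after)
      : max_retries_(max_retries), active_after_(active_after) {
    assert(max_retries > 0);
  }

  // Steps 1-3: returns after talking to the first NN, which may be a standby.
  bool Connect() { return !cancelled_; }

  // Steps 4-6: dummy NN RPC that blocks until the active NN answers, retries
  // run out, or the pending connect is cancelled. on_attempt simulates
  // activity elsewhere (e.g. another thread) while the RPC is blocked.
  RpcResult GetFileInfo(const std::function<void(int)>& on_attempt) {
    for (int attempt = 0; attempt < max_retries_; ++attempt) {
      on_attempt(attempt);
      if (cancelled_) return RpcResult::kCancelled;
      if (attempt == active_after_) return RpcResult::kOk;
    }
    return RpcResult::kRetriesExhausted;
  }

  // Under the relaxed rules this stays legal until the dummy RPC returns.
  void CancelPendingConnect() { cancelled_ = true; }

 private:
  int max_retries_;
  int active_after_;
  bool cancelled_ = false;
};

// Connect, then probe with the dummy RPC. cancel_at is the retry attempt at
// which the caller gives up (negative means never cancel).
RpcResult ConnectThenProbe(int max_retries, int active_after, int cancel_at) {
  MockFileSystem fs(max_retries, active_after);
  if (!fs.Connect()) return RpcResult::kCancelled;
  return fs.GetFileInfo([&](int attempt) {
    if (attempt == cancel_at) fs.CancelPendingConnect();
  });
}
```

In this mock, ConnectThenProbe(5, -1, 1) models a cluster whose active NN never answers and a caller who cancels on the second retry, so the dummy RPC ends as cancelled instead of running through all five retries.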