Details
Type: Improvement
Status: Patch Available
Priority: Minor
Resolution: Unresolved
Description
We have observed an issue in the YARN client around this piece of code:
services.add(SecurityUtil.buildTokenService(yarnConf.getSocketAddr(address, defaultAddr, defaultPort)).toString());
When this line runs while one of the RM hosts has a networking issue and its IP address cannot be resolved, buildTokenService() fails and throws a runtime exception (caused by an UnknownHostException) from here: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L466
This runtime exception then propagates all the way up to our application and causes MR job submission to fail.
In my opinion, since HA is enabled here, the other RMs are still alive and available. We should catch this exception in getTokenService() and handle it properly (for example, by skipping the unresolvable RM) instead of failing the whole operation.
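To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It is not the actual patch and does not use the Hadoop classes: buildTokenService() below is a simplified stand-in that only mimics the failure on an unresolved address, and the class name TokenServiceSketch, the method joinTokenServices(), and the host rm2.invalid are hypothetical names chosen for illustration. It also assumes that skipping an unresolvable RM and joining the remaining services with commas is acceptable behavior.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class TokenServiceSketch {

    // Simplified stand-in (assumption) for SecurityUtil.buildTokenService():
    // it rejects an unresolved address with a runtime exception whose cause
    // is an UnknownHostException.
    static String buildTokenService(InetSocketAddress addr) {
        if (addr.isUnresolved()) {
            throw new IllegalArgumentException(
                new UnknownHostException(addr.getHostName()));
        }
        return addr.getAddress().getHostAddress() + ":" + addr.getPort();
    }

    // Hypothetical HA-aware variant: one unresolvable RM is skipped instead
    // of failing the whole call; it fails only if no RM can be resolved.
    static String joinTokenServices(List<InetSocketAddress> rmAddrs) {
        List<String> services = new ArrayList<>();
        for (InetSocketAddress addr : rmAddrs) {
            try {
                services.add(buildTokenService(addr));
            } catch (IllegalArgumentException e) {
                if (!(e.getCause() instanceof UnknownHostException)) {
                    throw e; // unrelated failure: do not swallow it
                }
                System.err.println("Skipping unresolvable RM address: " + addr);
            }
        }
        if (services.isEmpty()) {
            throw new IllegalStateException("No RM address could be resolved");
        }
        return String.join(",", services);
    }

    public static void main(String[] args) {
        List<InetSocketAddress> rms = List.of(
            new InetSocketAddress("127.0.0.1", 8032),                  // literal IP, resolves without DNS
            InetSocketAddress.createUnresolved("rm2.invalid", 8032));  // simulates the broken RM host
        System.out.println(joinTokenServices(rms));
    }
}
```

One open design question with this approach is whether a token whose service list omits the unreachable RM remains usable after a failover to that RM; that would need to be checked against the real client code before adopting it.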
I would like to hear your opinion on this. If you agree, I will provide a patch. Thank you.