Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.3.1
None
None
Description
HA is enabled, and we have two NameNodes: nn1 and nn2.
When starting the cluster, nn1 fails at the very beginning, and nn2 transitions to the active state. The cluster can still provide services normally.
However, when we try to get the safe mode status or wait for safe mode to exit, the dfsadmin command fails with an IOException: cannot connect to nn1.
The root cause seems to be located here:
// DFSAdmin.class
public void setSafeMode(String[] argv, int idx) throws IOException {
  …
  if (isHaEnabled) {
    String nsId = dfsUri.getHost();
    List<ProxyAndInfo<ClientProtocol>> proxies =
        HAUtil.getProxiesForAllNameNodesInNameservice(
            dfsConf, nsId, ClientProtocol.class);
    for (ProxyAndInfo<ClientProtocol> proxy : proxies) {
      ClientProtocol haNn = proxy.getProxy();
      // The code always queries from the first nn, i.e., nn1, and returns with IOException when nn1 fails.
      boolean inSafeMode = haNn.setSafeMode(action, false);
      if (waitExitSafe) {
        inSafeMode = waitExitSafeMode(haNn, inSafeMode);
      }
      System.out.println("Safe mode is " + (inSafeMode ? "ON" : "OFF")
          + " in " + proxy.getAddress());
    }
  }
  …
}
Actually, I'm curious: do we need to get/wait on every NameNode here when HA is enabled?
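
One possible direction, shown only as a rough sketch and not as a patch against DFSAdmin: catch the IOException per NameNode inside the loop, report that node as unreachable, and fail the command only if no NameNode could be reached at all. The sketch below does not use Hadoop classes. SafeModeSketch, SafeModeClient, and querySafeMode are hypothetical names invented for illustration; SafeModeClient stands in for the single ClientProtocol call made in the loop above, and the "unavailable" message wording is also an assumption.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SafeModeSketch {

    /** Hypothetical stand-in for the one ClientProtocol call used in the loop. */
    public interface SafeModeClient {
        boolean inSafeMode() throws IOException;
    }

    /**
     * Queries every NameNode, tolerating individual connection failures.
     * Rethrows the last IOException only if no NameNode answered.
     */
    public static List<String> querySafeMode(Map<String, SafeModeClient> nameNodes)
            throws IOException {
        List<String> report = new ArrayList<>();
        IOException lastFailure = null;
        int reached = 0;
        for (Map.Entry<String, SafeModeClient> entry : nameNodes.entrySet()) {
            try {
                boolean on = entry.getValue().inSafeMode();
                report.add("Safe mode is " + (on ? "ON" : "OFF") + " in " + entry.getKey());
                reached++;
            } catch (IOException e) {
                // Record the failure and keep going instead of aborting the command.
                report.add("Safe mode status unavailable in " + entry.getKey()
                        + ": " + e.getMessage());
                lastFailure = e;
            }
        }
        if (reached == 0 && lastFailure != null) {
            throw lastFailure;
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        // Simulates the scenario in this report: nn1 is down, nn2 is active.
        Map<String, SafeModeClient> nameNodes = new LinkedHashMap<>();
        nameNodes.put("nn1", () -> { throw new IOException("cannot connect to nn1"); });
        nameNodes.put("nn2", () -> false);
        querySafeMode(nameNodes).forEach(System.out::println);
    }
}
```

Whether a failed standby should be skipped silently, reported as above, or still fail the command is a behavior choice for the maintainers. The sketch only shows that the loop does not have to stop at the first unreachable NameNode.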