Details
- Improvement
- Status: Closed
- Major
- Resolution: Fixed
- 0.23.3, 0.24.0
- None
Description
While working on HADOOP-8077, HDFS-3084, and HDFS-3072, I ran into various difficulties that are artifacts of the current design. A few of these:
- the service name is "resolved" from the logical name (e.g. ns1.nn1) to an IP address at the outer layer of DFSHAAdmin
  - this means it's difficult to provide the logical name "ns1.nn1" to fence scripts (HDFS-3084)
  - this means it's difficult to configure a fencing method per-namespace (since the FailoverController doesn't know what the namespace is) (HADOOP-8077)
- the configuration for HA HDFS is weirdly split between core-site and hdfs-site, even though most users see this as an HDFS feature. For example, users expect to configure NN fencing configurations in hdfs-site, and expect the keys to have a dfs.* prefix
- proxies are constructed at the outer layer of the admin commands. This means it's impossible for the inner layers (e.g. FailoverController.failover) to re-construct proxies with different timeouts (HDFS-3072)
The proposed refactor is to add a new interface (tentatively named HAServiceTarget) which refers to the target of one of the admin commands. An instance of this interface is responsible for creating proxies, creating fencers, mapping back to a logical name, etc. The HDFS implementation can then provide different results based on the particular nameservice, can use HDFS-specific configuration prefixes, etc. Using this interface as the argument for fencing methods also makes the API more evolvable in the future, since we can add new getters to HAServiceTarget (whereas the current InetSocketAddress is quite limiting).
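To make the proposal concrete, here is a rough sketch of what such a target abstraction could look like. Only the name HAServiceTarget comes from the description above; the method names, the NameNodeTarget class, the plain Map standing in for a Hadoop Configuration, and the per-nameservice key suffix are all assumptions made for illustration, not the committed API.

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

// Hypothetical shape of the proposed interface; every method name is an assumption.
interface HAServiceTarget {
    /** Address used to reach the service. */
    InetSocketAddress getAddress();

    /** Logical name such as "ns1.nn1", which could be handed to fence scripts. */
    String getLogicalName();

    /** Fencing methods that apply to this particular target. */
    String getFencingMethods();

    /** Stand-in for proxy creation; a real implementation would build an RPC proxy. */
    String describeProxy(int timeoutMs);
}

// Hypothetical HDFS-side implementation that remembers nameservice and NameNode IDs.
final class NameNodeTarget implements HAServiceTarget {
    private final Map<String, String> conf; // stand-in for a Hadoop Configuration
    private final String nsId;
    private final String nnId;

    NameNodeTarget(Map<String, String> conf, String nsId, String nnId) {
        this.conf = conf;
        this.nsId = nsId;
        this.nnId = nnId;
    }

    @Override
    public InetSocketAddress getAddress() {
        String key = "dfs.namenode.rpc-address." + nsId + "." + nnId;
        String hostPort = conf.get(key);
        if (hostPort == null) {
            throw new IllegalArgumentException("Missing config key: " + key);
        }
        int colon = hostPort.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Expected host:port but got: " + hostPort);
        }
        String host = hostPort.substring(0, colon);
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        // Unresolved on purpose: the sketch should not trigger a DNS lookup.
        return InetSocketAddress.createUnresolved(host, port);
    }

    @Override
    public String getLogicalName() {
        return nsId + "." + nnId;
    }

    @Override
    public String getFencingMethods() {
        // Assumed lookup order: a nameservice-specific key wins over the generic one.
        String perNs = conf.get("dfs.ha.fencing.methods." + nsId);
        return perNs != null ? perNs : conf.get("dfs.ha.fencing.methods");
    }

    @Override
    public String describeProxy(int timeoutMs) {
        // Inner layers can ask the target again with a different timeout.
        InetSocketAddress addr = getAddress();
        return "proxy(" + addr.getHostString() + ":" + addr.getPort()
                + ", timeout=" + timeoutMs + "ms)";
    }
}

final class HAServiceTargetDemo {
    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("dfs.namenode.rpc-address.ns1.nn1", "nn1.example.com:8020");
        conf.put("dfs.ha.fencing.methods", "sshfence");
        conf.put("dfs.ha.fencing.methods.ns1", "shell(/bin/true)");

        HAServiceTarget target = new NameNodeTarget(conf, "ns1", "nn1");
        System.out.println(target.getLogicalName());
        System.out.println(target.getFencingMethods());
        System.out.println(target.describeProxy(5000));
    }
}
```

Because the target carries its own nameservice and NameNode IDs, a fencer that receives it can recover the logical name and pick namespace-specific settings, and code such as FailoverController.failover can request a fresh proxy with a different timeout instead of reusing one built by the outer admin layer.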
Attachments
Issue Links
- blocks
  - HADOOP-8204 TestHealthMonitor fails occasionally (Closed)
- is depended upon by
  - HADOOP-8077 HA: fencing method should be able to be configured on a per-NN or per-NS basis (Closed)
  - HADOOP-8236 haadmin should have configurable timeouts for failover commands (Closed)
  - HDFS-3084 FenceMethod.tryFence() and ShellCommandFencer should pass namenodeId as well as host:port (Closed)