Description
Currently, the ChangeSecret utility does no check to verify that the user running it can write to /accumulo/instance_id.
If an admin knows the instance secret but runs the command as a user who cannot write to the instance_id, the result is an unhelpful error message and a disconnect between HDFS and ZooKeeper.
Example for a cluster with an instance named "foobar":
[busbey@edge ~]$ hdfs dfs -ls /accumulo/instance_id
Found 1 items
-rw-r--r-- 3 accumulo accumulo 0 2014-07-02 09:05 /accumulo/instance_id/cb977c77-3e13-4522-b718-2b487d722fd4
[busbey@edge ~]$ accumulo org.apache.accumulo.server.util.ChangeSecret
old zookeeper password:
new zookeeper password:
Thread "org.apache.accumulo.server.util.ChangeSecret" died Permission denied: user=busbey, access=WRITE, inode="/accumulo":accumulo:accumulo:drwxr-x--x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4846)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:2911)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:2872)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2859)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:642)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:408)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44968)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)
org.apache.hadoop.security.AccessControlException: Permission denied: user=busbey, access=WRITE, inode="/accumulo":accumulo:accumulo:drwxr-x--x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4846)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:2911)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:2872)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2859)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:642)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:408)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44968)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
    at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1489)
    at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:355)
    at org.apache.accumulo.server.util.ChangeSecret.updateHdfs(ChangeSecret.java:150)
    at org.apache.accumulo.server.util.ChangeSecret.main(ChangeSecret.java:66)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.accumulo.start.Main$1.run(Main.java:141)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=busbey, access=WRITE, inode="/accumulo":accumulo:accumulo:drwxr-x--x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4846)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:2911)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:2872)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2859)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:642)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:408)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44968)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)
    at org.apache.hadoop.ipc.Client.call(Client.java:1238)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at $Proxy16.delete(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:408)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    at $Proxy17.delete(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1487)
    ... 9 more
[busbey@edge ~]$ hdfs dfs -ls /accumulo/instance_id
Found 1 items
-rw-r--r-- 3 accumulo accumulo 0 2014-07-02 09:05 /accumulo/instance_id/cb977c77-3e13-4522-b718-2b487d722fd4
[busbey@edge ~]$ zookeeper-client
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] get /accumulo/instances/foobar
1528cc95-2600-4649-a50e-1645404e9d6c
cZxid = 0xe00034f45
ctime = Wed Jul 02 09:27:58 PDT 2014
mZxid = 0xe00034f45
mtime = Wed Jul 02 09:27:58 PDT 2014
pZxid = 0xe00034f45
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 36
numChildren = 0
[zk: localhost:2181(CONNECTED) 1] ls /accumulo/1528cc95-2600-4649-a50e-1645404e9d6c
[users, monitor, problems, root_tablet, gc, hdfs_reservations, table_locks, namespaces, recovery, fate, tservers, tables, next_file, tracers, config, dead, bulk_failed_copyq, masters]
[zk: localhost:2181(CONNECTED) 2] ls /accumulo/cb977c77-3e13-4522-b718-2b487d722fd4
[users, problems, monitor, root_tablet, hdfs_reservations, gc, table_locks, namespaces, recovery, fate, tservers, tables, next_file, tracers, config, masters, bulk_failed_copyq, dead]
What's worse, in this condition the cluster will come up cleanly and appear perfectly healthy if the old instance secret is used.
However, clients and servers will now end up looking at different ZooKeeper nodes, depending on whether they read the instance_id from HDFS or resolve it via a ZooKeeper instance name lookup, so long as each uses the corresponding instance secret.
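To spot this divergence on a live cluster, compare the two answers directly. A minimal sketch, assuming the default /accumulo root, the example instance name "foobar", and a local ZooKeeper (zookeeper-client here stands in for zkCli.sh):

    # the id HDFS-based processes use is the file name under instance_id
    hdfs dfs -ls /accumulo/instance_id
    # the id name-based clients use is the value of the instance name pointer
    echo 'get /accumulo/instances/foobar' | zookeeper-client -server localhost:2181
    # if the two UUIDs differ, the cluster is in the split state described above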
Furthermore, if an admin runs the CleanZooKeeper utility after this failure, it will delete the ZooKeeper nodes that the running server processes are using.
The utility should sanity-check that /accumulo/instance_id is writable before changing anything in ZooKeeper. It should also wait to update the instance name to instance_id pointer in ZooKeeper until after HDFS has been updated.
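Until the utility does this itself, the same sanity check can be run by hand before invoking ChangeSecret, as the same user who will run the tool. A minimal sketch (the .writable_check file name is made up for illustration):

    # try to create and then remove a scratch file under instance_id; a
    # permission error on either step means ChangeSecret will fail partway
    hdfs dfs -touchz /accumulo/instance_id/.writable_check \
      && hdfs dfs -rm /accumulo/instance_id/.writable_check
    # note the failure in the transcript above was on the parent /accumulo
    # inode, so write access there matters as well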
Workaround: manually edit the HDFS instance_id to match the new instance id found in ZooKeeper for the instance name, then proceed as though the secret change had succeeded.
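In terms of the transcript above, that amounts to the following, run as a user who can write to /accumulo (the UUIDs are the ones from the example; substitute your own):

    # remove the stale id file HDFS still holds...
    hdfs dfs -rm /accumulo/instance_id/cb977c77-3e13-4522-b718-2b487d722fd4
    # ...and create an empty file named after the id ZooKeeper now maps "foobar" to
    hdfs dfs -touchz /accumulo/instance_id/1528cc95-2600-4649-a50e-1645404e9d6c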
Issue Links
- is related to: ACCUMULO-4415 Tracer requires instance.secret (Resolved)