
getDAGStatus seems to fork out the entire JVM on non-secure clusters

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.5.1
    • Component/s: None
    • Labels: None

    Description

      Tracked down a consistent fork() call to

      	at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
      	at org.apache.hadoop.util.Shell.run(Shell.java:418)
      	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
      	at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
      	at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
      	at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:83)
      	at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:52)
      	at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.getGroups(JniBasedUnixGroupsMappingWithFallback.java:50)
      	at org.apache.hadoop.security.Groups.getGroups(Groups.java:139)
      	at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1409)
      	at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getRPCUserGroups(DAGClientAMProtocolBlockingPBServerImpl.java:75)
      	at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:102)
      	at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
      

      hitesh - would it make sense to cache this at all?

      Attachments

        1. TEZ-1524.1.patch
          8 kB
          Gopal Vijayaraghavan
        2. TEZ-1524.2.patch
          38 kB
          Gopal Vijayaraghavan
        3. TEZ-1524.3.patch
          38 kB
          Gopal Vijayaraghavan

        Activity

          gopalv Gopal Vijayaraghavan added a comment -

          commit edb841c08de123ff2c5ace0662ae78bf3c58f2c0
          Author: Gopal V <gopalv@apache.org>
          Date: Fri Sep 12 15:04:32 2014 -0700

          TEZ-1524. Resolve user group information only if ACLs are enabled (gopalv)

          Thanks hitesh!

          gopalv Gopal Vijayaraghavan added a comment -

          Removed the stray println.

          hitesh Hitesh Shah added a comment -

          Looks good apart from:

          +      System.err.println(set + " -- " + userGroups);
          

          Feel free to commit after removing the above.

          hitesh Hitesh Shah added a comment -

          Comment on patch:

          +    ACLManager aclManager = real.getACLManager();
          +    if (aclManager.isEnabled()) {
          +      String user = getRPCUserName();
          +      if (!real.getACLManager().checkAMViewAccess(user, getRPCUserGroups())) {
          +        throw new AccessControlException("User " + user
          +            + " cannot perform AM view operation");
          +      }
          

          Instead of the above, it might be better to pass the UGI into the ACL manager call, i.e. real.getACLManager().checkAMViewAccess(currentUserUGI).
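
          A minimal sketch of the idea behind this suggestion, under stated assumptions: `SketchAclManager` and `checkViewAccess` are hypothetical names, not the Tez `ACLManager` API, and a `Supplier` stands in for the UGI. The point it illustrates is that when the ACL check receives something it can ask for groups, rather than an already resolved group list, the lookup runs only if the check actually reaches it.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical stand-in for an ACL manager; NOT the Tez ACLManager API. */
final class SketchAclManager {
  private final boolean enabled;
  private final Set<String> viewUsers;
  private final Set<String> viewGroups;

  SketchAclManager(boolean enabled, List<String> viewUsers, List<String> viewGroups) {
    this.enabled = enabled;
    this.viewUsers = new HashSet<>(viewUsers);
    this.viewGroups = new HashSet<>(viewGroups);
  }

  boolean isEnabled() {
    return enabled;
  }

  /**
   * The caller passes a supplier instead of an already resolved group list,
   * so the (possibly expensive) lookup runs only if the check reaches it.
   */
  boolean checkViewAccess(String user, Supplier<List<String>> groupLookup) {
    if (!enabled) {
      return true; // ACLs disabled: groups are never resolved
    }
    if (viewUsers.contains(user)) {
      return true; // matched by user name: groups are never resolved
    }
    for (String group : groupLookup.get()) {
      if (viewGroups.contains(group)) {
        return true;
      }
    }
    return false;
  }
}
```

          With a real UGI the supplier could be something like `() -> Arrays.asList(ugi.getGroupNames())`, which would keep the shell-based lookup off the RPC path whenever ACLs are disabled or the user name alone is enough to decide.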

          hitesh Hitesh Shah added a comment -

          Is this in a non-secure cluster with ACLs disabled? We might be able to work around this by invoking getGroups only when needed. It won't prevent calls for non-existent users but should help a bit.

          gopalv Gopal Vijayaraghavan added a comment -

          2014-08-29 09:27:12,602 WARN [IPC Server handler 0 on 59734] org.apache.hadoop.security.ShellBasedUnixGroupsMapping: got exception trying to get groups for user foo
          org.apache.hadoop.util.Shell$ExitCodeException: id: foo: No such user
          ...
          2014-08-29 09:27:12,602 WARN [IPC Server handler 0 on 59734] org.apache.hadoop.security.UserGroupInformation: No groups available for user

          I will submit a patch.

          gopalv Gopal Vijayaraghavan added a comment -

          The cache does not cache misses.

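
          A rough sketch of why that matters, assuming a hypothetical `SketchGroupCache` (not the Hadoop `Groups` class) whose resolver returns an empty list for an unknown user. When empty results are not stored, every request for a user such as `foo` reaches the resolver again, which in the shell-based mapping means another fork. Storing the miss makes repeat requests cheap. A real cache would also expire entries; that is left out here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical group cache; NOT the Hadoop Groups class. */
final class SketchGroupCache {
  // Stand-in for the expensive lookup; returns an empty list for unknown users.
  private final Function<String, List<String>> resolver;
  // When true, empty results (misses) are stored as well.
  private final boolean cacheMisses;
  private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

  SketchGroupCache(Function<String, List<String>> resolver, boolean cacheMisses) {
    this.resolver = resolver;
    this.cacheMisses = cacheMisses;
  }

  List<String> getGroups(String user) {
    List<String> cached = cache.get(user);
    if (cached != null) {
      return cached; // hit, possibly a remembered miss (empty list)
    }
    List<String> groups =
        Collections.unmodifiableList(new ArrayList<>(resolver.apply(user)));
    if (!groups.isEmpty() || cacheMisses) {
      cache.put(user, groups);
    }
    return groups;
  }
}
```

          With `cacheMisses` set to false, three requests for an unknown user invoke the resolver three times; with it set to true, the resolver runs once.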
          hitesh Hitesh Shah added a comment -

          gopalv, could you add more details? This points to a bug in UGI if it is indeed forking on each request for group names.

          hitesh Hitesh Shah added a comment -

          Never mind, I ran a simple job on a single-node cluster. The AM logs show "fetched groups" logged once, followed by "cached groups".

          hitesh Hitesh Shah added a comment -

          UGI should be using the Groups class that does caching internally.

          Can you look for these logs in debug mode (I am looking at 2.6.0-SNAPSHOT code, though):

          • "Returning cached groups for" (implies the groups were obtained from the cache)
          • "Returning fetched groups"

          People

            Assignee: gopalv Gopal Vijayaraghavan
            Reporter: gopalv Gopal Vijayaraghavan
            Votes: 0
            Watchers: 2
