Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
2.0.0
Ambari 2.1.1 Build #107
HDP 2.3 GA
ZK + AMS + Kafka
SLES 11 SP3
MIT KDC, all single node
Register hosts / bootstrap agents via SSH
Description
When executing the Kerberos service check, the following error occurs:
stderr: /var/lib/ambari-agent/data/errors-24.txt
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/KERBEROS/1.10.3-10/package/scripts/service_check.py", line 81, in <module>
    KerberosServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 218, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/KERBEROS/1.10.3-10/package/scripts/service_check.py", line 64, in service_check
    user=params.smoke_user
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 157, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 258, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/bin/kinit -c /var/lib/ambari-agent/data/tmp/kerberos_service_check_cc_dd529fe1e15538ddfe9ce0347604d64c -kt /etc/security/keytabs/kerberos.service_check.080315.keytab MyCluster-080315@EXAMPLE.COM' returned 1.
kinit(v5): Credentials cache permissions incorrect when initializing cache /var/lib/ambari-agent/data/tmp/kerberos_service_check_cc_dd529fe1e15538ddfe9ce0347604d64c
stdout: /var/lib/ambari-agent/data/output-24.txt
Performing kinit using MyCluster-080315@EXAMPLE.COM
2015-08-03 19:11:57,085 - Execute['/usr/bin/kinit -c /var/lib/ambari-agent/data/tmp/kerberos_service_check_cc_dd529fe1e15538ddfe9ce0347604d64c -kt /etc/security/keytabs/kerberos.service_check.080315.keytab MyCluster-080315@EXAMPLE.COM'] {'user': 'jambari-qa'}
2015-08-03 19:11:57,179 - File['/var/lib/ambari-agent/data/tmp/kerberos_service_check_cc_dd529fe1e15538ddfe9ce0347604d64c'] {'action': ['delete']}
This error occurs only on SLES, although the underlying cause exists on all platforms. On the other platforms the condition is silently ignored and has no bearing on the result of the kinit test.
Cause
The "Credentials cache permissions incorrect when initializing cache" error is caused by the inability to write the Kerberos ticket cache file to the specified location. In this case, that location is /var/lib/ambari-agent/data/tmp/kerberos_service_check_cc_dd529fe1e15538ddfe9ce0347604d64c. The write fails because /var/lib/ambari-agent/data/tmp is not writable by the user executing the kinit call, which is the Ambari smoke test user (typically ambari-qa). The directory's permissions are:
drwxr-xr-x. 4 root root 4096 Aug 3 22:20 /var/lib/ambari-agent/data/tmp/
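The condition described above can be checked on a host with a few lines of Python. This is a minimal sketch, not part of Ambari: the helper name describe_directory is hypothetical, and it only inspects the mode bits of the directory rather than attempting a write as the smoke test user.

```python
import os
import stat


def describe_directory(path):
    """Return (ls-style mode string, other_writable) for the given path.

    Hypothetical helper for illustration; it reads the mode bits only and
    does not try to create a file in the directory.
    """
    st = os.stat(path)
    mode_string = stat.filemode(st.st_mode)
    # S_IWOTH is the write bit for users who are neither owner nor in the group.
    other_writable = bool(st.st_mode & stat.S_IWOTH)
    return mode_string, other_writable
```

Called on a directory with the permissions shown above, the second element of the result is False, which matches the failure seen for a user who is neither root nor a member of the root group.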
Solution
For the Ambari smoke test user to be able to write to the relevant directory (/var/lib/ambari-agent/data/tmp), the permissions must be set at least as follows:
drwxrwxr-x. 4 root hadoop 4096 Aug 3 22:20 /var/lib/ambari-agent/data/tmp/
However, the name of the hadoop group is not yet known when this directory is created, so the next best solution is to set the permissions as follows:
drwxrwxrwx. 4 root root 4096 Aug 3 22:20 /var/lib/ambari-agent/data/tmp/
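The three permission sets discussed so far can be compared with a small model of the standard POSIX owner/group/other check. This is an illustrative sketch only: the function names and the numeric uid/gid values (1001 for the smoke test user, 1002 for the hadoop group) are assumptions, and ACLs or other access controls are ignored.

```python
import stat


def ls_style(mode):
    """Render directory permission bits the way `ls -l` shows them."""
    return stat.filemode(stat.S_IFDIR | mode)


def can_create_files(mode, dir_uid, dir_gid, uid, gids):
    """Return True if the user may create files in a directory.

    Creating a file needs both the write and the search (execute) bit
    from exactly one class: owner, else group, else other.
    """
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == dir_uid:
        bits = (mode >> 6) & 0o7
    elif dir_gid in gids:
        bits = (mode >> 3) & 0o7
    else:
        bits = mode & 0o7
    return (bits & 0o3) == 0o3


# Assumed ids for illustration: smoke test user 1001, member of group 1002.
SMOKE_UID, SMOKE_GIDS, HADOOP_GID = 1001, {1002}, 1002
```

Under these assumptions, can_create_files(0o755, 0, 0, SMOKE_UID, SMOKE_GIDS) is False, while both the 775 root:hadoop variant and the 777 root:root variant return True.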
If the ambari-agent is installed manually via the relevant package manager, the directory is created with the open permissions (777, drwxrwxrwx) by the package's install_helper.sh post-install script. However, if Ambari installs the agent via SSH, the directory is created with the more restrictive permissions (755, drwxr-xr-x) by the agent bootstrap.py script.
To make these consistent, the following command in bootstrap.py needs to be changed from
command = "sudo mkdir -p {0} ; sudo chown -R {1} {0} ; sudo chmod 755 {3} ; sudo chmod 755 {2} ; sudo chmod 755 {0}".format(
self.TEMP_FOLDER, quote_bash_args(params.user), DEFAULT_AGENT_DATA_FOLDER, DEFAULT_AGENT_LIB_FOLDER)
to
command = "sudo mkdir -p {0} ; sudo chown -R {1} {0} ; sudo chmod 755 {3} ; sudo chmod 755 {2} ; sudo chmod 777 {0}".format(
self.TEMP_FOLDER, quote_bash_args(params.user), DEFAULT_AGENT_DATA_FOLDER, DEFAULT_AGENT_LIB_FOLDER)
Note: self.TEMP_FOLDER contains the path to the Ambari agent temp folder (typically, /var/lib/ambari-agent/data/tmp).
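To see which path each placeholder in the changed command receives, the format string can be expanded outside of bootstrap.py. The sketch below is a standalone approximation: shlex.quote stands in for Ambari's quote_bash_args, and the data and lib folder paths passed in the usage example are assumed values, not taken from this issue.

```python
import shlex


def build_temp_folder_command(temp_folder, user, data_folder, lib_folder):
    """Expand the fixed command string with explicit arguments.

    {0} is the temp folder (now chmod 777), {1} the quoted user,
    {2} the agent data folder and {3} the agent lib folder (both 755).
    """
    template = (
        "sudo mkdir -p {0} ; sudo chown -R {1} {0} ; sudo chmod 755 {3} ; "
        "sudo chmod 755 {2} ; sudo chmod 777 {0}"
    )
    # shlex.quote is a stand-in for quote_bash_args (an assumption).
    return template.format(temp_folder, shlex.quote(user), data_folder, lib_folder)


# Usage with assumed paths; only the temp folder ends up world-writable.
example = build_temp_folder_command(
    "/var/lib/ambari-agent/data/tmp",
    "root",
    "/var/lib/ambari-agent/data",
    "/var/lib/ambari-agent",
)
```

Only the final chmod differs from the original command, so the parent folders keep their 755 permissions.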