Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
2.4.0
-
None
Description
Caused by: AMBARI-18240
In the Enable NameNode HA wizard, a failure occurs at the "Start Additional NameNode" step.
The first NameNode starts...
"href" : "https://172.22.115.113:8443/api/v1/clusters/cl1/requests/46/tasks/368", "Tasks" : { "attempt_cnt" : 1, "cluster_name" : "cl1", "command" : "START", "command_detail" : "NAMENODE START", "end_time" : 1472080011602, "error_log" : "/var/lib/ambari-agent/data/errors-368.txt", "exit_code" : 0, "host_name" : "nat-sp12-rnqs-amb-views-ha-6-5.openstacklocal", "id" : 368, "output_log" : "/var/lib/ambari-agent/data/output-368.txt", "request_id" : 46, "role" : "NAMENODE", "stage_id" : 0, "start_time" : 1472079963470, "status" : "COMPLETED", "stderr" : "2016-08-24 23:06:11,102 - Getting jmx metrics from NN failed. URL: http://nat-sp12-rnqs-amb-views-ha-6-5.openstacklocal:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem\nTraceback (most recent call last):\n File \"/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py\", line 42, in get_value_from_jmx\n return data_dict[\"beans\"][0][property]\nIndexError: list index out of range\n2016-08-24 23:06:14,332 - Getting jmx metrics from NN failed. URL: http://nat-sp12-rnqs-amb-views-ha-6-1.openstacklocal:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem\nTraceback (most recent call last):\n File \"/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py\", line 38, in get_value_from_jmx\n _, data, _ = get_user_call_output(cmd, user=run_user, quiet=False)\n File \"/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py\", line 61, in get_user_call_output\n raise Fail(err_msg)\nFail: Execution of 'curl --negotiate -u : -s 'http://nat-sp12-rnqs-amb-views-ha-6-1.openstacklocal:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' 1>/tmp/tmprdewEy 2>/tmp/tmpAmLket' returned 7. \n\n2016-08-24 23:06:22,280 - Getting jmx metrics from NN failed. 
URL: http://nat-sp12-rnqs-amb-views-ha-6-1.openstacklocal:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem\nTraceback (most recent call last):\n File \"/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py\", line 38, in get_value_from_jmx\n _, data, _ = get_user_call_output(cmd, user=run_user, quiet=False)\n File \"/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py\", line 61, in get_user_call_output\n raise Fail(err_msg)\nFail: Execution of 'curl --negotiate -u : -s 'http://nat-sp12-rnqs-amb-views-ha-6-1.openstacklocal:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' 1>/tmp/tmpHKH50b 2>/tmp/tmp6yyuWH' returned 7. \n\n2016-08-24 23:06:30,637 - Getting jmx metrics from NN failed. URL: http://nat-sp12-rnqs-amb-views-ha-6-1.openstacklocal:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem\nTraceback (most recent call last):\n File \"/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py\", line 38, in get_value_from_jmx\n _, data, _ = get_user_call_output(cmd, user=run_user, quiet=False)\n File \"/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py\", line 61, in get_user_call_output\n raise Fail(err_msg)\nFail: Execution of 'curl --negotiate -u : -s 'http://nat-sp12-rnqs-amb-views-ha-6-1.openstacklocal:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' 1>/tmp/tmpCXMjfH 2>/tmp/tmpq103ei' returned 7. \n\n2016-08-24 23:06:39,495 - Getting jmx metrics from NN failed. 
URL: http://nat-sp12-rnqs-amb-views-ha-6-1.openstacklocal:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem\nTraceback (most recent call last):\n File \"/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py\", line 38, in get_value_from_jmx\n _, data, _ = get_user_call_output(cmd, user=run_user, quiet=False)\n File \"/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py\", line 61, in get_user_call_output\n raise Fail(err_msg)\nFail: Execution of 'curl --negotiate -u : -s 'http://nat-sp12-rnqs-amb-views-ha-6-1.openstacklocal:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' 1>/tmp/tmpvdE9iJ 2>/tmp/tmpy9eAby' returned 7. \n\n2016-08-24 23:06:47,584 - Getting jmx metrics from NN failed. URL: http://nat-sp12-rnqs-amb-views-ha-6-1.openstacklocal:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem\nTraceback (most recent call last):\n File \"/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py\", line 38, in get_value_from_jmx\n _, data, _ = get_user_call_output(cmd, user=run_user, quiet=False)\n File \"/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py\", line 61, in get_user_call_output\n raise Fail(err_msg)\nFail: Execution of 'curl --negotiate -u : -s 'http://nat-sp12-rnqs-amb-views-ha-6-1.openstacklocal:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' 1>/tmp/tmp0Jx91E 2>/tmp/tmp6qu0gW' returned 7.",
The second does not:
{ "href" : "https://172.22.115.113:8443/api/v1/clusters/cl1/requests/47/tasks/369", "Tasks" : { "attempt_cnt" : 1, "cluster_name" : "cl1", "command" : "START", "command_detail" : "NAMENODE START", "end_time" : 1472080160611, "error_log" : "/var/lib/ambari-agent/data/errors-369.txt", "exit_code" : 1, "host_name" : "nat-sp12-rnqs-amb-views-ha-6-1.openstacklocal", "id" : 369, "output_log" : "/var/lib/ambari-agent/data/output-369.txt", "request_id" : 47, "role" : "NAMENODE", "stage_id" : 0, "start_time" : 1472080026015, "status" : "FAILED", "stderr" : "2016-08-24 23:07:13,642 - Getting jmx metrics from NN failed. URL: http://nat-sp12-rnqs-amb-views-ha-6-1.openstacklocal:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem\nTraceback (most recent call last):\n File \"/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py\", line 42, in get_value_from_jmx\n return data_dict[\"beans\"][0][property]\nIndexError: list index out of range\nTraceback (most recent call last):\n File \"/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py\", line 420, in <module>\n NameNode().execute()\n File \"/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py\", line 280, in execute\n method(env)\n File \"/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py\", line 101, in start\n upgrade_suspended=params.upgrade_suspended, env=env)\n File \"/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py\", line 89, in thunk\n return fn(*args, **kwargs)\n File \"/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py\", line 184, in namenode\n if is_this_namenode_active() is False:\n File \"/usr/lib/python2.6/site-packages/resource_management/libraries/functions/decorator.py\", line 55, in wrapper\n return function(*args, **kwargs)\n File \"/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py\", 
line 549, in is_this_namenode_active\n raise Fail(format(\"The NameNode {namenode_id} is not listed as Active or Standby, waiting...\"))\nresource_management.core.exceptions.Fail: The NameNode nn2 is not listed as Active or Standby, waiting...",
When the UI enables NN HA, it first starts NN1 and then NN2. At this stage both NNs are in 'standby' mode. The active node is elected only later (I believe when ZKFC is installed and started), so I think the second NN start should not fail when no active NameNode is found:
1st NN start:
nat-sp12-rnqs-amb-views-ha-7-5.openstacklocal
2016-08-24 23:08:20,037 - NameNode HA states: active_namenodes = [], standby_namenodes = [(u'nn1', 'nat-sp12-rnqs-amb-views-ha-7-5.openstacklocal:50070')], unknown_namenodes = [(u'nn2', 'nat-sp12-rnqs-amb-views-ha-7-3.openstacklocal:50070')]
2016-08-24 23:08:20,037 - No active NameNode was found after 5 retries. Will return current NameNode HA states
2016-08-24 23:08:20,037 - Skipping Safemode check due to the following conditions: HA: True, isActive: False, upgradeType: None
2016-08-24 23:08:20,037 - Skipping creation of HDFS directories since this is either not the Active NameNode or we did not wait for Safemode to finish.
Command completed successfully!
2nd NN start:
nat-sp12-rnqs-amb-views-ha-7-3.openstacklocal
2016-08-24 23:10:51,011 - NameNode HA states: active_namenodes = [], standby_namenodes = [(u'nn1', 'nat-sp12-rnqs-amb-views-ha-7-5.openstacklocal:50070'), (u'nn2', 'nat-sp12-rnqs-amb-views-ha-7-3.openstacklocal:50070')], unknown_namenodes = []
2016-08-24 23:10:51,012 - No active NameNode was found after 5 retries. Will return current NameNode HA states
Command failed after 1 tries
Since the 2nd NN start failed, the wizard does not continue with installing ZKFC and the rest of the steps.
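The expected behaviour can be illustrated with a minimal sketch. It is not the Ambari implementation: `Fail` here is a local stand-in for `resource_management.core.exceptions.Fail`, and `is_namenode_active` is a hypothetical helper written for this report. It assumes the HA states are lists of `(namenode_id, address)` tuples, as printed in the "NameNode HA states" log lines. A NameNode listed as standby is reported as "not active" without raising, even when the active list is empty; only a NameNode in neither list raises so that a caller could retry.

```python
class Fail(Exception):
    """Local stand-in for resource_management.core.exceptions.Fail (assumption)."""


def is_namenode_active(namenode_id, active_namenodes, standby_namenodes):
    """Hypothetical helper: True if active, False if standby, Fail if in neither list."""
    # Each entry is assumed to be a (namenode_id, "host:port") tuple.
    if any(nn_id == namenode_id for nn_id, _ in active_namenodes):
        return True
    # A standby NameNode is a valid state; an empty active list must not cause a failure.
    if any(nn_id == namenode_id for nn_id, _ in standby_namenodes):
        return False
    # Neither active nor standby yet: signal the caller to wait and retry.
    raise Fail("The NameNode %s is not listed as Active or Standby, waiting..." % namenode_id)


# HA states taken from the "2nd NN start" log in this report.
active = []
standby = [
    ("nn1", "nat-sp12-rnqs-amb-views-ha-7-5.openstacklocal:50070"),
    ("nn2", "nat-sp12-rnqs-amb-views-ha-7-3.openstacklocal:50070"),
]

# nn2 is standby and no active NameNode exists; the start should still succeed.
nn2_is_active = is_namenode_active("nn2", active, standby)
```

With these inputs `nn2_is_active` is `False` and no exception is raised, which is the outcome the wizard needs in order to proceed to the ZKFC steps.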