Created on 12-24-2016 03:05 PM
SYMPTOMS: During HDP upgrades, namenode restart would fail leading to upgrade failure. Following errors are usually seen:-
Traceback (most recent call last):File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 42, in get_value_from_jmxreturn data_dict["beans"][0][property] IndexError: list index out of range Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 420, in <module>NameNode().execute() File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in executemethod(env) File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 720, in restartself.start(env, upgrade_type=upgrade_type) File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 101, in startupgrade_suspended=params.upgrade_suspended, env=env) File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunkreturn fn(*args, **kwargs) File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 184, in namenodeif is_this_namenode_active() is False: File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/decorator.py", line 55, in wrapperreturn function(*args, **kwargs) File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 554, in is_this_namenode_active raise Fail(format("The NameNode {namenode_id} is not listed as Active or Standby, waiting...")) resource_management.core.exceptions.Fail: The NameNode nn1 is not listed as Active or Standby, waiting...
ROOT CAUSE: Starting from Ambari 2.4, when the cluster is large, HDP upgrade fails during namenode restart.This is because, restart command waits for namenode to come out of safemode and if the cluster size is large, namenode takes more time to leave safemode but Ambari marks this action as failure as the namenode didn't leave safemode within the configured timeout in Ambari scripts.The issue has been reported in AMBARI-18786
SOLUTION: Upgrade to Ambari 2.5
WORKAROUND: Increase the timeout for ambari as follows:-
1. Increase the timeout in
/var/lib/ambari-server/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py
From this:
@retry(times=5, sleep_time=5, backoff_factor=2, err_class=Fail)
To this:
@retry(times=25, sleep_time=25, backoff_factor=2, err_class=Fail)
2. Restart Ambari server
Created on 05-11-2018 09:07 AM
hi @Gaurav I have similar issue on amabri 2.6.1 post upgrade. My python version is still 2.6 can you help me please? 2018-05-11 17:06:21,660 - Getting jmx metrics from NN failed. URL: http://x.y.z:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 42, in get_value_from_jmx return data_dict["beans"][0][property] IndexError: list index out of range Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 361, in <module> NameNode().execute() File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 375, in execute method(env) File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 978, in restart self.start(env, upgrade_type=upgrade_type) File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 99, in start upgrade_suspended=params.upgrade_suspended, env=env) File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk return fn(*args, **kwargs) File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 203, in namenode if is_this_namenode_active() is False: File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/decorator.py", line 62, in wrapper return function(*args, **kwargs) File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 611, in is_this_namenode_active raise Fail(format("The NameNode {namenode_id} is not listed as Active or Standby, waiting...")) resource_management.core.exceptions.Fail: The NameNode nn1 is not listed as Active or Standby, waiting...
Created on 05-13-2018 06:46 PM
resolved. For me it was problem with one of the JN