Community Articles

Find and share helpful community-sourced technical articles.
Labels (2)
avatar

SYMPTOMS: During HDP upgrades, namenode restart would fail leading to upgrade failure. Following errors are usually seen:-

Traceback (most recent call last):File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 42, in get_value_from_jmxreturn data_dict["beans"][0][property]
IndexError: list index out of range
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 420, in <module>NameNode().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in executemethod(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 720, in restartself.start(env, upgrade_type=upgrade_type)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 101, in startupgrade_suspended=params.upgrade_suspended, env=env)
File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunkreturn fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 184, in namenodeif is_this_namenode_active() is False:
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/decorator.py", line 55, in wrapperreturn function(*args, **kwargs)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 554, in is_this_namenode_active
raise Fail(format("The NameNode {namenode_id} is not listed as Active or Standby, waiting..."))
resource_management.core.exceptions.Fail: The NameNode nn1 is not listed as Active or Standby, waiting...

ROOT CAUSE: Starting from Ambari 2.4, when the cluster is large, HDP upgrade fails during namenode restart.This is because, restart command waits for namenode to come out of safemode and if the cluster size is large, namenode takes more time to leave safemode but Ambari marks this action as failure as the namenode didn't leave safemode within the configured timeout in Ambari scripts.The issue has been reported in AMBARI-18786

SOLUTION: Upgrade to Ambari 2.5

WORKAROUND: Increase the timeout for ambari as follows:-

1. Increase the timeout in

/var/lib/ambari-server/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py

From this:

@retry(times=5, sleep_time=5, backoff_factor=2, err_class=Fail)

To this:

@retry(times=25, sleep_time=25, backoff_factor=2, err_class=Fail)

2. Restart Ambari server

4,325 Views
Comments
avatar
Contributor
hi @Gaurav 
I have similar issue on amabri 2.6.1 post upgrade.

My python version is still 2.6

can you help me please?
2018-05-11 17:06:21,660 - Getting jmx metrics from NN failed. URL: http://x.y.z:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 42, in get_value_from_jmx
    return data_dict["beans"][0][property]
IndexError: list index out of range
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 361, in <module>
    NameNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 375, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 978, in restart
    self.start(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 99, in start
    upgrade_suspended=params.upgrade_suspended, env=env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 203, in namenode
    if is_this_namenode_active() is False:
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/decorator.py", line 62, in wrapper
    return function(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 611, in is_this_namenode_active
    raise Fail(format("The NameNode {namenode_id} is not listed as Active or Standby, waiting..."))
resource_management.core.exceptions.Fail: The NameNode nn1 is not listed as Active or Standby, waiting...
avatar
Contributor

resolved. For me it was problem with one of the JN