Problem Statement: An HDP downgrade failed while restarting the NameNode service and got stuck on the following error:

resource_management.core.exceptions.Fail: The NameNode None is not listed as Active or Standby, waiting...

ERROR:

===
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 420, in <module>
    NameNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 720, in restart
    self.start(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 101, in start
    upgrade_suspended=params.upgrade_suspended, env=env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 185, in namenode
    if is_this_namenode_active() is False:
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/decorator.py", line 55, in wrapper
    return function(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 555, in is_this_namenode_active
    raise Fail(format("The NameNode {namenode_id} is not listed as Active or Standby, waiting..."))
resource_management.core.exceptions.Fail: The NameNode None is not listed as Active or Standby, waiting...
===

[Screenshot: 9520-downgrade-nn-error.png]

Root Cause: On restart, the NameNode script could not populate namenode_id, which it needs to determine whether the NameNode is active or standby.

Resolution:

1. The property dfs.namenode.rpc-address.<cluster-name>.<nn-id> was set to an IP address instead of a hostname, so namenode_id remained None. The script matches only on the hostname; it could also be made to check the IP address when retrieving namenode_id. The relevant code is below:

# Values for the current Host
namenode_id = None
namenode_rpc = None

dfs_ha_namemodes_ids_list = []
other_namenode_id = None

if dfs_ha_namenode_ids:
  dfs_ha_namemodes_ids_list = dfs_ha_namenode_ids.split(",")
  dfs_ha_namenode_ids_array_len = len(dfs_ha_namemodes_ids_list)
  if dfs_ha_namenode_ids_array_len > 1:
    dfs_ha_enabled = True
if dfs_ha_enabled:
  for nn_id in dfs_ha_namemodes_ids_list:
    nn_host = config['configurations']['hdfs-site'][format('dfs.namenode.rpc-address.{dfs_ha_nameservices}.{nn_id}')]
    if hostname in nn_host:
      namenode_id = nn_id
      namenode_rpc = nn_host
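
The idea of also checking the IP address could look roughly like the sketch below. This is not the Ambari implementation: find_namenode_id, its parameters, and the local_ips list are hypothetical names introduced here for illustration, and the sketch assumes the rpc-address value has the form host:port.

```python
def find_namenode_id(hdfs_site, nameservice, nn_ids, hostname, local_ips=()):
    """Return (namenode_id, rpc_address) for this host, or (None, None).

    Hypothetical helper: matches the host part of each rpc-address
    against the local hostname and, additionally, the local IP addresses.
    """
    # Everything that may identify the current host.
    candidates = set([hostname.lower()])
    candidates.update(local_ips)

    for nn_id in nn_ids:
        key = "dfs.namenode.rpc-address.{0}.{1}".format(nameservice, nn_id)
        rpc_address = hdfs_site.get(key)
        if not rpc_address:
            continue
        # Assumes "host:port"; strip the port before comparing.
        host_part = rpc_address.rsplit(":", 1)[0].lower()
        if host_part in candidates:
            return nn_id, rpc_address
    return None, None
```

With an rpc-address of 10.0.0.1:8020, a hostname-only lookup returns (None, None), which corresponds to the "NameNode None" in the error above, while passing the host's IP in local_ips resolves the ID.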

2. We then repeatedly failed over between the NameNodes using "hdfs haadmin -failover", which resolved the HDFS issue and allowed the upgrade to proceed.

[Keep failing over with "hdfs haadmin -failover" from nn1 to nn2 and back while the Ambari restart is in progress, until it detects the NameNode status.]
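
The alternating failover can be scripted instead of typed by hand. The following is a minimal sketch, not a tested procedure: alternate_failover, rounds, and pause_seconds are hypothetical names, the IDs nn1 and nn2 are taken from the note above, and it assumes the first ID passed in is the currently active NameNode and that the hdfs command is on the PATH.

```python
import subprocess
import time


def alternate_failover(nn_a, nn_b, rounds, run=subprocess.check_call,
                       pause_seconds=30):
    """Fail over nn_a -> nn_b, then nn_b -> nn_a, and so on, `rounds` times.

    `run` is injectable so the command list can be inspected without
    touching a real cluster.
    """
    source, target = nn_a, nn_b
    for i in range(rounds):
        # Same command as in the article: hdfs haadmin -failover <from> <to>
        run(["hdfs", "haadmin", "-failover", source, target])
        # Swap roles for the next round.
        source, target = target, source
        if i < rounds - 1:
            time.sleep(pause_seconds)
```

For example, alternate_failover("nn1", "nn2", rounds=4) would issue four failovers, swapping direction each time. The pause and the number of rounds are guesses and would need adjusting to how long the Ambari restart takes.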
