Problem Statement: An HDP downgrade failed while restarting the NameNode service and got stuck on the error below:
resource_management.core.exceptions.Fail: The NameNode None is not listed as Active or Standby, waiting...
ERROR:
===
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 420, in <module>
    NameNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 720, in restart
    self.start(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 101, in start
    upgrade_suspended=params.upgrade_suspended, env=env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 185, in namenode
    if is_this_namenode_active() is False:
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/decorator.py", line 55, in wrapper
    return function(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 555, in is_this_namenode_active
    raise Fail(format("The NameNode {namenode_id} is not listed as Active or Standby, waiting..."))
resource_management.core.exceptions.Fail: The NameNode None is not listed as Active or Standby, waiting...
===
Root Cause: During the restart, the Ambari scripts could not populate namenode_id, which is required to determine whether the NameNode is Active or Standby.
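As a quick check you can list the dfs.namenode.rpc-address.* values and flag any that use a raw IP address instead of a hostname. This is a minimal diagnostic sketch, not part of the Ambari scripts; the hdfs-site.xml path is the usual HDP client location and may differ on your cluster.

# Diagnostic sketch: flag rpc-address values that hold an IP instead of a hostname.
import re
import xml.etree.ElementTree as ET

HDFS_SITE = "/etc/hadoop/conf/hdfs-site.xml"   # assumed location
IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

tree = ET.parse(HDFS_SITE)
for prop in tree.getroot().findall("property"):
    name = prop.findtext("name", default="")
    value = prop.findtext("value", default="")
    if name.startswith("dfs.namenode.rpc-address."):
        host = value.split(":")[0]
        kind = "IP address (will not match the agent hostname)" if IP_RE.match(host) else "hostname"
        print("%s = %s  -> %s" % (name, value, kind))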
Resolution:
1. The dfs.namenode.rpc-address.<nameservice>.<nn-id> property was set to an IP address instead of a hostname, so the hostname match in the Ambari scripts never succeeded and namenode_id stayed None. A possible improvement would be to also match on the host's IP address, not just its hostname, when retrieving namenode_id (see the sketch after the snippet below). The relevant code:
# Values for the current Host
namenode_id = None
namenode_rpc = None

dfs_ha_namemodes_ids_list = []
other_namenode_id = None

if dfs_ha_namenode_ids:
  dfs_ha_namemodes_ids_list = dfs_ha_namenode_ids.split(",")
  dfs_ha_namenode_ids_array_len = len(dfs_ha_namemodes_ids_list)
  if dfs_ha_namenode_ids_array_len > 1:
    dfs_ha_enabled = True
if dfs_ha_enabled:
  for nn_id in dfs_ha_namemodes_ids_list:
    nn_host = config['configurations']['hdfs-site'][format('dfs.namenode.rpc-address.{dfs_ha_nameservices}.{nn_id}')]
    if hostname in nn_host:
      namenode_id = nn_id
      namenode_rpc = nn_host
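To illustrate why the match above fails, and what checking the IP address in addition to the hostname (as suggested in point 1) could look like, here is a minimal sketch. The hostname and IP values are hypothetical, and this is not the shipped Ambari code.

# Sketch only: the substring match fails when the rpc-address value holds an IP.
import socket

hostname = "nn1.example.com"                  # hypothetical agent hostname
nn_host = "10.0.0.11:8020"                    # rpc-address value set to an IP

print(hostname in nn_host)                    # False -> namenode_id stays None

# One possible workaround: also compare the resolved IP of the agent host.
try:
    host_ip = socket.gethostbyname(hostname)
except socket.error:
    host_ip = None

matches = (hostname in nn_host) or (host_ip is not None and host_ip in nn_host)
print(matches)                                # True only if the resolved IP matches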
2. As a workaround, the NameNodes were repeatedly failed over with "hdfs haadmin -failover". This allowed the restart to detect an Active/Standby state, HDFS came back up, and the downgrade proceeded.
[While the Ambari restart is in progress, keep failing over from nn1 to nn2 and back with "hdfs haadmin -failover" until the NameNode status is detected as Active or Standby; a small sketch of this loop follows.]
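The following sketch automates that manual shuffling. The NameNode IDs nn1 and nn2 are assumptions (check yours with "hdfs getconf -confKey dfs.ha.namenodes.<nameservice>"), and the loop simply shells out to the standard haadmin command; stop it with Ctrl-C once the Ambari restart succeeds.

# Sketch only: alternate the Active NameNode between nn1 and nn2 while the restart runs.
import itertools
import subprocess
import time

NN_IDS = ("nn1", "nn2")                       # assumed NameNode IDs

for src, dst in itertools.cycle([NN_IDS, NN_IDS[::-1]]):
    # hdfs haadmin -failover <from-nn-id> <to-nn-id>
    rc = subprocess.call(["hdfs", "haadmin", "-failover", src, dst])
    print("failover %s -> %s exited with %s" % (src, dst, rc))
    time.sleep(30)                            # give the restart time to re-check the NameNode state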