NameNode failing after HA NameNode Rollback

Hi there,

I have a fresh installation of HDP 2.3.4 on a 5-node cluster. All of my services were running successfully, with statistics displayed in the widgets. I had not had any NameNode issues until today.

Earlier today I started the "Enable NameNode HA" wizard. It failed at the first step of the installation phase (I think it was the NameNode), and retrying didn't help. Since I couldn't move forward or back in the wizard, I left it and followed the rollback guide at https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_Ambari_Users_Guide/content/_how_to_roll_....

After completing the entire guide (I have since gone through the whole thing a second time in case I missed something), I started HDFS (step 1.2.13) and the operation failed for the NameNode. I have no idea what to do! Does anyone recognize this error?

Here is the output:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 408, in <module>
    NameNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 530, in restart
    self.start(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 103, in start
    upgrade_suspended=params.upgrade_suspended, env=env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 212, in namenode
    create_hdfs_directories(is_active_namenode_cmd)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 278, in create_hdfs_directories
    only_if=check
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 463, in action_create_on_execute
    self.action_delayed("create")
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 460, in action_delayed
    self.get_hdfs_resource_executor().action_delayed(action_name, self)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 246, in action_delayed
    main_resource.resource.security_enabled, main_resource.resource.logoutput)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 133, in __init__
    security_enabled, run_user)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/namenode_ha_utils.py", line 167, in get_property_for_active_namenode
    if INADDR_ANY in value and rpc_key in hdfs_site:
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/config_dictionary.py", line 81, in __getattr__
    raise Fail("Configuration parameter '" + self.name + "' was not found in configurations dictionary!")
resource_management.core.exceptions.Fail: Configuration parameter 'dfs.namenode.https-address' was not found in configurations dictionary!

Contributor

@Savanna Endicott You can use the command below to push the missing property to the cluster, and then try restarting the NameNode:

/var/lib/ambari-server/resources/scripts/configs.sh -u AMBARI_USER -p AMBARI_PASS set AMBARI_HOST_NAME CLUSTER_NAME PROPERTY_FILE PROPERTY_NAME "VALUE"

In your case this would be:

/var/lib/ambari-server/resources/scripts/configs.sh -u AMBARI_USER -p AMBARI_PASS set AMBARI_HOST_NAME CLUSTER_NAME hdfs-site dfs.namenode.https-address "abc.xyz.com:50470"

Replace the values according to your cluster, where:

	AMBARI_USER - Your Ambari UI login user (default: admin)
	AMBARI_PASS - The login user's password (default: admin)
	AMBARI_HOST_NAME - Your Ambari server host
	CLUSTER_NAME - Your cluster name (case sensitive)
	abc.xyz.com - The NameNode hostname
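
To avoid retyping the placeholders, the call can be wrapped in a small function. This is only a sketch: the script path is the one quoted in this thread, but the function name `ambari_set_prop`, the default values, and the `DRY_RUN` switch are illustrative assumptions, not part of Ambari. With `DRY_RUN=1` (the default here) it only prints the command so it can be checked before anything is changed.

```shell
#!/bin/sh
# Hypothetical wrapper around the configs.sh call shown in this thread.
# All defaults below are placeholders; override them for your cluster.
CONFIGS_SH=/var/lib/ambari-server/resources/scripts/configs.sh
AMBARI_USER=${AMBARI_USER:-admin}
AMBARI_PASS=${AMBARI_PASS:-admin}
AMBARI_HOST_NAME=${AMBARI_HOST_NAME:-ambari.example.com}
CLUSTER_NAME=${CLUSTER_NAME:-MyCluster}
DRY_RUN=${DRY_RUN:-1}

ambari_set_prop() {
    # $1 = config type (e.g. hdfs-site), $2 = property name, $3 = value
    if [ "$DRY_RUN" = "1" ]; then
        # Print the command instead of running it.
        echo "$CONFIGS_SH -u $AMBARI_USER -p $AMBARI_PASS set $AMBARI_HOST_NAME $CLUSTER_NAME $1 $2 \"$3\""
    else
        "$CONFIGS_SH" -u "$AMBARI_USER" -p "$AMBARI_PASS" set \
            "$AMBARI_HOST_NAME" "$CLUSTER_NAME" "$1" "$2" "$3"
    fi
}

ambari_set_prop hdfs-site dfs.namenode.https-address "abc.xyz.com:50470"
```

Set `DRY_RUN=0` once the printed command looks right for your cluster.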

Hi @lraheja, thanks for your response.

I ran the command you suggested, and the result is the same error. I had restarted all of my instances and stopped all services again. Is there something else I could do in addition to this? The stdout for the operation also shows the following attempt ten times; maybe that's the issue? I turned safe mode off using the same command I had used to turn it on, but with "leave".

2016-09-06 07:35:08,376 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://abc.xyz.com -safemode get | grep 'Safe mode is OFF'' returned 1. 
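
The retry line above is a poll: the exit status 1 comes from `grep` not finding "Safe mode is OFF" in the command's output, which happens both when safe mode is still on and when `hdfs dfsadmin` cannot reach the NameNode at all. A rough sketch of the same check done by hand follows. The function name `safemode_is_off` is made up for illustration, and the real commands are left as comments because they assume `hdfs` is on the PATH and is run as the HDFS user.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the text on stdin reports safe mode as off.
safemode_is_off() {
    grep -q 'Safe mode is OFF'
}

# Real usage (not run here), on the NameNode host:
#   hdfs dfsadmin -safemode get | safemode_is_off && echo "safe mode is off"
#   hdfs dfsadmin -safemode leave    # ask the NameNode to leave safe mode

# Self-contained demonstration with canned output:
if echo 'Safe mode is ON' | safemode_is_off; then
    echo "safe mode is off"
else
    echo "still in safe mode (or the NameNode did not answer)"
fi
```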

And here is the error again:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 408, in <module>
    NameNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 103, in start
    upgrade_suspended=params.upgrade_suspended, env=env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 212, in namenode
    create_hdfs_directories(is_active_namenode_cmd)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 278, in create_hdfs_directories
    only_if=check
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 463, in action_create_on_execute
    self.action_delayed("create")
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 460, in action_delayed
    self.get_hdfs_resource_executor().action_delayed(action_name, self)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 246, in action_delayed
    main_resource.resource.security_enabled, main_resource.resource.logoutput)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 135, in __init__
    security_enabled, run_user)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/namenode_ha_utils.py", line 167, in get_property_for_active_namenode
    if INADDR_ANY in value and rpc_key in hdfs_site:
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/config_dictionary.py", line 81, in __getattr__
    raise Fail("Configuration parameter '" + self.name + "' was not found in configurations dictionary!")
resource_management.core.exceptions.Fail: Configuration parameter 'dfs.namenode.http-address' was not found in configurations dictionary!

Contributor
@Savanna Endicott

This time you are getting the same error, but for a different property. Previously it was for 'dfs.namenode.https-address', and now it is for 'dfs.namenode.http-address'. Please repeat the same step, this time with the http property:

/var/lib/ambari-server/resources/scripts/configs.sh -u AMBARI_USER -p AMBARI_PASS set AMBARI_HOST_NAME CLUSTER_NAME hdfs-site dfs.namenode.http-address "abc.xyz.com:50070"

Remember that the port is 50070 this time, for the http address.
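
Since the two properties went missing together, it may save a restart cycle to check for both before starting the NameNode again. The sketch below assumes that `configs.sh` also accepts a `get` action and that its dump shows each property name in double quotes; please verify both against your Ambari version. The function name `missing_nn_props` is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: reads an hdfs-site dump on stdin and prints each
# NameNode web address property that does not appear in it.
missing_nn_props() {
    dump=$(cat)
    for key in dfs.namenode.http-address dfs.namenode.https-address; do
        # -F treats the key as a fixed string, so its dots are not wildcards.
        # The surrounding quotes stop "http-address" matching "https-address".
        if ! printf '%s\n' "$dump" | grep -qF "\"$key\""; then
            echo "$key"
        fi
    done
}

# Real usage (not run here; the get action is an assumption to verify):
#   /var/lib/ambari-server/resources/scripts/configs.sh -u AMBARI_USER -p AMBARI_PASS \
#       get AMBARI_HOST_NAME CLUSTER_NAME hdfs-site | missing_nn_props

# Demonstration with a canned dump that only contains the https property:
printf '"dfs.namenode.https-address" : "abc.xyz.com:50470",\n' | missing_nn_props
```

Any property name it prints can then be pushed with the `set` command shown above.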

@lraheja

Oh, silly me! That fixed everything and now my namenode is working!!!! Thank you sooo much for your help.
