NameNode failing after HA NameNode Rollback

Hi there,

I have a fresh installation of HDP 2.3.4 on a 5-node cluster. All of my services were running successfully, with statistics displayed in the widgets. I had not had any NameNode issues until today.

Earlier today I started the "Enable NameNode HA" wizard. It failed at the first step of the installation phase (I think the failing component was the NameNode), and retrying didn't help. Since I couldn't move forward or back in the wizard, I exited it and followed https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_Ambari_Users_Guide/content/_how_to_roll_....

After completing the entire guide (and I've since gone back and done the whole thing again in case I missed something), I started HDFS (step 1.2.13), and the operation failed for the NameNode. I have no idea what to do! Does anyone recognize this error?

Here is the output:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 408, in <module>
    NameNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 530, in restart
    self.start(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 103, in start
    upgrade_suspended=params.upgrade_suspended, env=env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 212, in namenode
    create_hdfs_directories(is_active_namenode_cmd)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 278, in create_hdfs_directories
    only_if=check
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 463, in action_create_on_execute
    self.action_delayed("create")
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 460, in action_delayed
    self.get_hdfs_resource_executor().action_delayed(action_name, self)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 246, in action_delayed
    main_resource.resource.security_enabled, main_resource.resource.logoutput)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 133, in __init__
    security_enabled, run_user)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/namenode_ha_utils.py", line 167, in get_property_for_active_namenode
    if INADDR_ANY in value and rpc_key in hdfs_site:
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/config_dictionary.py", line 81, in __getattr__
    raise Fail("Configuration parameter '" + self.name + "' was not found in configurations dictionary!")
resource_management.core.exceptions.Fail: Configuration parameter 'dfs.namenode.https-address' was not found in configurations dictionary!
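
The last line of the traceback says the hdfs-site configuration handed to the Ambari agent has no dfs.namenode.https-address entry, and the agent's configuration lookup raises an exception on a missing key instead of falling back to a default. The following is a minimal, hypothetical Python model of that behavior. The StrictConfig class and the sample values are illustrations only, not Ambari's actual config_dictionary.py code (which uses attribute access rather than indexing):

```python
class Fail(Exception):
    """Simplified stand-in for resource_management.core.exceptions.Fail."""


class StrictConfig(object):
    """Hypothetical model of a strict configuration lookup:
    a missing property raises Fail instead of returning None."""

    def __init__(self, properties):
        self._properties = dict(properties)

    def __contains__(self, name):
        # Membership tests never raise; only value lookups do.
        return name in self._properties

    def __getitem__(self, name):
        if name not in self._properties:
            raise Fail("Configuration parameter '" + name +
                       "' was not found in configurations dictionary!")
        return self._properties[name]


# Hypothetical hdfs-site after the rollback: the HTTPS address is absent.
hdfs_site = StrictConfig({"dfs.namenode.rpc-address": "abc.xyz.com:8020"})
```

With this model, looking up hdfs_site["dfs.namenode.https-address"] raises Fail with the same message as the traceback, while the property that is present resolves normally.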

Expert Contributor

@Savanna Endicott You can use the command below to push the property to the cluster, then try restarting the NameNode:

/var/lib/ambari-server/resources/scripts/configs.sh -u AMBARI_USER -p AMBARI_PASS set AMBARI_HOST_NAME CLUSTER_NAME PROPERTY_FILE PROPERTY_NAME "VALUE"

In your case this would be:

/var/lib/ambari-server/resources/scripts/configs.sh -u AMBARI_USER -p AMBARI_PASS set AMBARI_HOST_NAME CLUSTER_NAME hdfs-site dfs.namenode.https-address "abc.xyz.com:50470"

Replace the values according to your cluster's configuration, where:

	AMBARI_USER - Your Ambari UI login user (default: admin)
	AMBARI_PASS - The login user's password (default: admin)
	AMBARI_HOST_NAME - Your Ambari server host
	CLUSTER_NAME - Your cluster name (case-sensitive)
	abc.xyz.com - The NameNode hostname
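
If you script this fix, building the configs.sh call as an argument list avoids shell-quoting mistakes with the host:port value. A minimal sketch, assuming Python 3 on the Ambari server host; the function name configs_set_argv and the placeholder host, cluster, and credentials are hypothetical:

```python
import shlex

CONFIGS_SH = "/var/lib/ambari-server/resources/scripts/configs.sh"


def configs_set_argv(user, password, ambari_host, cluster,
                     config_type, key, value):
    """Build the argument list for a 'configs.sh ... set' call, using the
    same positional order as the command in the reply above."""
    return [CONFIGS_SH, "-u", user, "-p", password, "set",
            ambari_host, cluster, config_type, key, value]


# Hypothetical placeholder values; substitute your own.
argv = configs_set_argv("admin", "admin", "ambari.xyz.com", "MyCluster",
                        "hdfs-site", "dfs.namenode.https-address",
                        "abc.xyz.com:50470")

# Show the equivalent shell command line.
print(" ".join(shlex.quote(arg) for arg in argv))

# To apply it for real on the Ambari server host:
#   import subprocess; subprocess.check_call(argv)
```

Passing the list straight to subprocess.check_call, rather than a single string through a shell, means no argument is ever re-split or re-interpreted.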

Hi @lraheja, thanks for your response.

I ran the command you suggested, but I still get the same error. I restarted all of my instances and stopped all services again. Is there anything else I could do? The operation's stdout also shows the following retry message ten times; maybe that's the issue? I turned safe mode off using the same command shown below, but with "leave" instead of "get".

2016-09-06 07:35:08,376 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://abc.xyz.com -safemode get | grep 'Safe mode is OFF'' returned 1. 
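
Judging by that log line, the start operation polls "hdfs dfsadmin -safemode get" and retries every 10 seconds until the output contains "Safe mode is OFF". The sketch below reproduces that polling pattern so the same check can be run by hand. It is an approximation, not Ambari's code: the function names, the ten-attempt default (taken from the ten retries observed above), and the use of Python 3.7+'s subprocess.run are assumptions:

```python
import subprocess
import time

# Binary path as it appears in the retry message above.
HDFS_BIN = "/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs"


def safemode_get(fs_uri):
    """Run 'hdfs dfsadmin -fs <uri> -safemode get' and return its stdout."""
    result = subprocess.run(
        [HDFS_BIN, "dfsadmin", "-fs", fs_uri, "-safemode", "get"],
        capture_output=True, text=True)
    return result.stdout


def wait_for_safemode_off(fs_uri, tries=10, delay=10.0,
                          query=safemode_get, sleep=time.sleep):
    """Poll until the NameNode reports safe mode OFF.

    Returns True as soon as the check passes, or False after 'tries'
    failed attempts. 'query' and 'sleep' are parameters so the loop
    can be exercised without a cluster.
    """
    for attempt in range(1, tries + 1):
        if "Safe mode is OFF" in query(fs_uri):
            return True
        if attempt < tries:
            sleep(delay)  # no pointless wait after the final attempt
    return False
```

Usage on a cluster host: wait_for_safemode_off("hdfs://abc.xyz.com"). A False result after all attempts suggests the NameNode is either still in safe mode or not answering dfsadmin at all.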

And here is the error again:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 408, in <module>
    NameNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 103, in start
    upgrade_suspended=params.upgrade_suspended, env=env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 212, in namenode
    create_hdfs_directories(is_active_namenode_cmd)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 278, in create_hdfs_directories
    only_if=check
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 463, in action_create_on_execute
    self.action_delayed("create")
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 460, in action_delayed
    self.get_hdfs_resource_executor().action_delayed(action_name, self)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 246, in action_delayed
    main_resource.resource.security_enabled, main_resource.resource.logoutput)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 135, in __init__
    security_enabled, run_user)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/namenode_ha_utils.py", line 167, in get_property_for_active_namenode
    if INADDR_ANY in value and rpc_key in hdfs_site:
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/config_dictionary.py", line 81, in __getattr__
    raise Fail("Configuration parameter '" + self.name + "' was not found in configurations dictionary!")
resource_management.core.exceptions.Fail: Configuration parameter 'dfs.namenode.http-address' was not found in configurations dictionary!

Expert Contributor (accepted solution)
@Savanna Endicott

This time you are getting the same error, but for a different property. Previously it was 'dfs.namenode.https-address'; now it is 'dfs.namenode.http-address'. Please repeat the same step, this time using the HTTP property:

/var/lib/ambari-server/resources/scripts/configs.sh -u AMBARI_USER -p AMBARI_PASS set AMBARI_HOST_NAME CLUSTER_NAME hdfs-site dfs.namenode.http-address "abc.xyz.com:50070"

Note that the port for the HTTP address is 50070, not 50470.
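
The two failures came from two different missing properties, each with its own default port (50470 for HTTPS, 50070 for HTTP). A minimal sketch that reports every missing NameNode web address at once, so both can be set before the next restart instead of being discovered one failure at a time. It assumes the hdfs-site properties are available as a plain Python dict and uses the Hadoop 2.x default ports; the function name is hypothetical:

```python
# Default NameNode web UI ports in Hadoop 2.x.
DEFAULT_WEB_PORTS = {
    "dfs.namenode.http-address": 50070,
    "dfs.namenode.https-address": 50470,
}


def missing_web_addresses(hdfs_site, namenode_host):
    """Return {property: "host:port"} for each NameNode web address
    that is absent from hdfs_site (a dict of hdfs-site properties)."""
    return {
        key: "%s:%d" % (namenode_host, port)
        for key, port in sorted(DEFAULT_WEB_PORTS.items())
        if key not in hdfs_site
    }
```

Each returned pair can be passed to the configs.sh set command as the property name and value.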

@lraheja

Oh, silly me! That fixed everything, and now my NameNode is working! Thank you so much for your help.