NameNode failing to start from Ambari 2.5, but start at command line


I'm trying to start all HDP 2.6 services via Ambari Server 2.5.

The following services fail to start from Ambari Server. I can start them manually from the command line first, and then go back to Ambari and start the remaining services from the UI. How do I fix the error so that I can start the NameNode from Ambari?

1. NameNode

2. SNameNode

3. NodeManager

I get the following error at NameNode startup:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 367, in <module>
    NameNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 100, in start
    upgrade_suspended=params.upgrade_suspended, env=env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 167, in namenode
    create_log_dir=True
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py", line 274, in service
    Execute(daemon_cmd, not_if=process_id_exists_command, environment=hadoop_env_exports)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode'' returned 1. -bash: line 0: ulimit: core file size: cannot modify limit: Operation not permitted
starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-vlmazgrpmaster.fisdev.local.out

Re: NameNode failing to start from Ambari 2.5, but start at command line

Super Mentor

@Winnie Philip

Can you please try adding the following entries to "/etc/security/limits.conf" on the problematic hosts and then try again:

* soft core unlimited
* hard core unlimited
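
One way to confirm that the new limits took effect is to inspect the core file size limit from a fresh login session of the hdfs user. The sketch below is only an illustration: it assumes Python 3 is available on the host, uses the standard `resource` module, and the function names are made up for this example. It relies on the fact that an unprivileged user can raise the soft limit only up to the hard limit, which is why `ulimit -c unlimited` reports "Operation not permitted" when the hard limit is lower.

```python
import resource


def describe(limit):
    """Render an rlimit value the way `ulimit` does."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def core_limit_status():
    """Return (soft, hard, can_set_unlimited) for the core file size limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit only up to the hard
    # limit, so `ulimit -c unlimited` can succeed only when the hard limit
    # is itself unlimited.
    return soft, hard, hard == resource.RLIM_INFINITY


if __name__ == "__main__":
    soft, hard, ok = core_limit_status()
    print("core soft limit:", describe(soft))
    print("core hard limit:", describe(hard))
    print("'ulimit -c unlimited' should succeed:", ok)
```

Run it from a new login shell of the hdfs user (the failing command uses `su hdfs -l`), since the entries in limits.conf are applied when a session is opened, not to shells that are already running.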



Re: NameNode failing to start from Ambari 2.5, but start at command line

I added the two lines above and restarted all services.

I now get the error below at NameNode start:

2017-08-30 12:31:21,040 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://vlmazgrpmaster.fisdev.local:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. safemode: Call From vlmazgrpmaster.fisdev.local/10.7.192.112 to vlmazgrpmaster.fisdev.local:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
safemode: Call From vlmazgrpmaster.fisdev.local/10.7.192.112 to vlmazgrpmaster.fisdev.local:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
2017-08-30 12:31:35,415 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://vlmazgrpmaster.fisdev.local:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. safemode: Call From vlmazgrpmaster.fisdev.local/10.7.192.112 to vlmazgrpmaster.fisdev.local:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

Re: NameNode failing to start from Ambari 2.5, but start at command line

Super Mentor

@Winnie Philip

If this is a fresh cluster and there is little or no data in HDFS, you can refer to the following HCC thread to see whether formatting the NameNode helps as a quick fix. Be aware that formatting the NameNode will cause data loss.

https://community.hortonworks.com/questions/37539/namenode-is-not-leaving-safemode-and-is-not-gettin...


Also, it would be good to first check that the hostname is correct and that port 8020 is reachable, i.e. that there is no network issue:

# nc -v vlmazgrpmaster.fisdev.local 8020
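
If `nc` is not installed on a host, the same reachability check can be approximated with a few lines of Python. This is a rough sketch, assuming Python 3; `port_reachable` is a hypothetical helper name, and the host and port are the ones from the log above.

```python
import socket


def port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    host, port = "vlmazgrpmaster.fisdev.local", 8020
    state = "reachable" if port_reachable(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

A "NOT reachable" result on the NameNode host itself usually means nothing is listening on the port yet, which matches the "Connection refused" in the safemode check; the NameNode log would be the next place to look.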


Also make sure that the /etc/hosts file has the correct entry on all hosts and that the following command returns the correct FQDN on every host:

# hostname -f
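
To run the hostname check on many hosts and also see what the name resolves to, something like the following could be used. It is a minimal sketch, assuming Python 3; `hostname_report` is a hypothetical helper, and `socket.getfqdn()` may not return exactly the same string as `hostname -f` on every system.

```python
import socket


def hostname_report():
    """Collect the local FQDN and the address it resolves to (None if unresolvable)."""
    fqdn = socket.getfqdn()
    try:
        address = socket.gethostbyname(fqdn)
    except OSError:
        address = None
    return {"fqdn": fqdn, "address": address}


if __name__ == "__main__":
    report = hostname_report()
    print("FQDN:", report["fqdn"])
    print("Resolves to:", report["address"] or "unresolvable")
    if report["address"] and report["address"].startswith("127."):
        # A loopback mapping in /etc/hosts can make the service unreachable
        # from other hosts.
        print("Warning: FQDN resolves to a loopback address; check /etc/hosts")
```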

