Can't start Standby NameNode

We are trying to start the Standby NameNode on master03, but without success.

The error log shows the following, but we can't tell what the problem is. Please advise: based on the log below, what could be the reason the NameNode does not start?

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 424, in <module>
    NameNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 314, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 100, in start
    upgrade_suspended=params.upgrade_suspended, env=env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 167, in namenode
    create_log_dir=True
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py", line 271, in service
    Execute(daemon_cmd, not_if=process_id_exists_command, environment=hadoop_env_exports)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode'' returned 1. starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master03.sys57.com.out
Michael-Bronson
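The traceback itself only reports that hadoop-daemon.sh exited with status 1. The `.out` file it mentions mostly holds the daemon's stdout/stderr and `ulimit -a` output; in a default hadoop-daemon.sh setup the NameNode's log4j output goes to a sibling `.log` file with the same base name, which is normally where the real startup exception is written. A minimal Python sketch, assuming that default layout (the helper names are hypothetical), that prints the last ERROR/FATAL lines from that `.log` file:

```python
import re
import sys
from pathlib import Path
from typing import List

# Assumption: default hadoop-daemon.sh layout, where the log4j .log file sits
# next to the .out file and shares its base name.
OUT_FILE = "/var/log/hadoop/hdfs/hadoop-hdfs-namenode-master03.sys57.com.out"

# Whole-word, upper-case log levels, so a word such as "ERRORS" does not match.
LEVEL_RE = re.compile(r"\b(ERROR|FATAL)\b")


def log_path_for(out_file: str) -> Path:
    """Return the .log file that accompanies a hadoop-daemon.sh .out file."""
    return Path(out_file).with_suffix(".log")


def error_lines(log_file: Path, limit: int = 20) -> List[str]:
    """Return the last `limit` lines that carry an ERROR or FATAL level."""
    hits = []
    with open(log_file, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if LEVEL_RE.search(line):
                hits.append(line.rstrip("\n"))
    return hits[-limit:]


if __name__ == "__main__":
    target = log_path_for(sys.argv[1] if len(sys.argv) > 1 else OUT_FILE)
    if target.is_file():
        print("\n".join(error_lines(target)))
    else:
        print(f"{target} not found; check HADOOP_LOG_DIR", file=sys.stderr)
```

The lines immediately after the first ERROR entry in the `.log` file usually contain the full Java stack trace, which tends to name the actual cause.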
1 ACCEPTED SOLUTION

Master Mentor

@Michael Bronson

If this error is related to the other thread you posted recently: https://community.hortonworks.com/questions/168750/ambari-cluster-no-valid-image-files-found.html?ch...

then please apply the same solution and close one of the threads.

Pasting the steps here:

Please check whether the dfs.namenode.name.dir directory (default path: /hadoop/hdfs/namenode) is empty; a disk issue may have caused the files there to go missing.
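A populated NameNode metadata directory normally has a `current/` subdirectory containing a `VERSION` file, a `seen_txid` file and at least one `fsimage_<txid>` image (plus `edits_*` segments). A minimal Python sketch of that check, assuming the default path above (replace it with the real dfs.namenode.name.dir value from hdfs-site.xml, which may be a comma-separated list); the helper name is hypothetical:

```python
import re
import sys
from pathlib import Path
from typing import List

# Assumption: HDP default; replace with the dfs.namenode.name.dir value
# from hdfs-site.xml (it may list several directories separated by commas).
NAME_DIRS = "/hadoop/hdfs/namenode"

# Image files are named fsimage_<txid>; the fsimage_<txid>.md5 checksums do not count.
FSIMAGE_RE = re.compile(r"^fsimage_\d+$")


def missing_metadata(name_dir: str) -> List[str]:
    """List problems found in <name_dir>/current; an empty list means it looks populated."""
    current = Path(name_dir) / "current"
    if not current.is_dir():
        return [f"{current} is missing"]
    names = {p.name for p in current.iterdir()}
    problems = []
    if "VERSION" not in names:
        problems.append("VERSION file is missing")
    if "seen_txid" not in names:
        problems.append("seen_txid file is missing")
    if not any(FSIMAGE_RE.match(n) for n in names):
        problems.append("no fsimage_<txid> file found")
    return problems


if __name__ == "__main__":
    for raw in (sys.argv[1] if len(sys.argv) > 1 else NAME_DIRS).split(","):
        name_dir = raw.strip()
        problems = missing_metadata(name_dir)
        status = "looks OK" if not problems else "; ".join(problems)
        print(f"{name_dir}: {status}")
```

This only inspects file names; it does not validate the image contents.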

If this is the case, and the Active NameNode is already running (it must be), you can try the following:

Try running the following command:

# su - hdfs
# hdfs namenode -bootstrapStandby

NOTE: Run this command ONLY on the Standby NameNode. DO NOT run it on the Active NameNode. The command rebuilds the Standby NameNode's metadata by copying the latest namespace image from the Active NameNode.

- Now try to start the Standby NameNode from Ambari.
- Also restart the ZKFailoverController from Ambari.
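To confirm which NameNode is active before running -bootstrapStandby, `hdfs haadmin -getServiceState <serviceId>` reports `active` or `standby` for each reachable NameNode. A minimal Python sketch, meant to be run as the hdfs user; the service IDs `nn1`/`nn2` and all helper names are hypothetical (the real IDs are listed under dfs.ha.namenodes.<nameservice> in hdfs-site.xml):

```python
import subprocess
from typing import Callable, Dict, Iterable

# Hypothetical service IDs; use the values of dfs.ha.namenodes.<nameservice>.
SERVICE_IDS = ("nn1", "nn2")


def run_haadmin(service_id: str) -> str:
    """Return the state printed by 'hdfs haadmin -getServiceState', or 'unreachable'."""
    try:
        result = subprocess.run(
            ["hdfs", "haadmin", "-getServiceState", service_id],
            capture_output=True, text=True, check=False,
        )
    except OSError:  # e.g. the hdfs client is not on PATH
        return "unreachable"
    # Assumption: when the NameNode is down, the error goes to stderr, so stdout is empty.
    return result.stdout.strip() or "unreachable"


def service_states(ids: Iterable[str],
                   query: Callable[[str], str] = run_haadmin) -> Dict[str, str]:
    """Map each NameNode service ID to the state reported by `query`."""
    return {sid: query(sid) for sid in ids}


def safe_to_bootstrap(states: Dict[str, str], local_id: str) -> bool:
    """True only if the local NameNode is not active and another NameNode is."""
    other_active = any(state == "active"
                       for sid, state in states.items() if sid != local_id)
    return states.get(local_id) != "active" and other_active
```

For example, on master03 with `local_id` set to that host's service ID, proceed with -bootstrapStandby only if `safe_to_bootstrap(service_states(SERVICE_IDS), local_id)` returns True.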

When I run the command on its own, I get:

 su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode'
starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master03.sys57.com.out
echo $?
1
Michael-Bronson

@Jay I ran hdfs namenode -bootstrapStandby on the standby, but I get:

Retrying connect to server: master01.sys57.com/100.4.3.21:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)

That is because both NameNodes are down; I can't start the NameNode on either machine.

Michael-Bronson
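The `Retrying connect to server` message means the client could not open a TCP connection to master01's NameNode RPC port (8020 here), which is consistent with no NameNode running there. A minimal Python sketch that extracts the host, IP and port from such a line and tests whether anything is listening; the regex is written against the exact message quoted above and is an assumption for other Hadoop versions, and the helper names are hypothetical:

```python
import re
import socket
from typing import Optional, Tuple

# Matches e.g. "Retrying connect to server: master01.sys57.com/100.4.3.21:8020."
RETRY_RE = re.compile(r"Retrying connect to server: ([^/\s]+)/([\d.]+):(\d+)")


def parse_retry_line(line: str) -> Optional[Tuple[str, str, int]]:
    """Return (hostname, ip, port) from a Hadoop IPC retry message, or None."""
    m = RETRY_RE.search(line)
    if m is None:
        return None
    return m.group(1), m.group(2), int(m.group(3))


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the line quoted above, `parse_retry_line` returns `('master01.sys57.com', '100.4.3.21', 8020)`; while both NameNodes are down, `port_is_open('100.4.3.21', 8020)` is expected to return False.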

@Jay So how do I continue from this step?

Michael-Bronson

One NameNode must be active to run the bootstrap command. You need to bring the healthy NameNode up and running first.
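The required ordering can be sketched as: start the healthy NameNode (master01 in this thread), wait until its RPC port accepts connections, and only then run -bootstrapStandby on the standby host (master03). A minimal Python sketch of that ordering, to be run as root on the standby; the host and port come from this thread, while the timeouts and helper names are hypothetical. An open port only shows that the NameNode process is listening, not that it has finished starting or become active:

```python
import socket
import subprocess
import time

# Values taken from this thread; adjust for your cluster.
ACTIVE_NN_HOST = "master01.sys57.com"
ACTIVE_NN_RPC_PORT = 8020


def wait_for_port(host: str, port: int, timeout_s: float = 300.0,
                  interval_s: float = 5.0) -> bool:
    """Poll host:port until a TCP connect succeeds (True) or timeout_s passes (False)."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


def bootstrap_standby() -> int:
    """Run 'hdfs namenode -bootstrapStandby' as hdfs via su; return its exit code.

    Must be run as root, and ONLY on the Standby NameNode host.
    """
    return subprocess.call(["su", "-", "hdfs", "-c", "hdfs namenode -bootstrapStandby"])
```

Usage on master03, once master01's NameNode has been started: `if wait_for_port(ACTIVE_NN_HOST, ACTIVE_NN_RPC_PORT): bootstrap_standby()`. Because `subprocess.call` keeps the terminal attached, any confirmation prompt bootstrapStandby may show (for example about re-formatting an existing name directory) can still be answered.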

Can I run something like hdfs namenode -bootstrap.... on the active node? If so, what is the complete syntax?

Michael-Bronson