Created 01-24-2018 02:04 PM
We are trying to start the Standby NameNode on the master03 machine, but without success.
From the error log we can see the following, but we can't determine what the problem is. Please advise what could be the reason the NameNode does not start, according to the following log:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 424, in <module>
    NameNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 314, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 100, in start
    upgrade_suspended=params.upgrade_suspended, env=env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 167, in namenode
    create_log_dir=True
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py", line 271, in service
    Execute(daemon_cmd, not_if=process_id_exists_command, environment=hadoop_env_exports)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode'' returned 1. starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master03.sys57.com.out
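The `.out` file named at the end of the error usually contains only stdout and ulimit information; the actual reason the NameNode process exited is normally in the matching `.log` file in the same directory. A quick way to check, using the host and path taken from the error output above (adjust if your log directory differs):

```shell
# The .out file from the error rarely shows the root cause:
cat /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master03.sys57.com.out

# The detailed startup failure (stack trace, FATAL lines) is usually here:
tail -n 200 /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master03.sys57.com.log
```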
Created 01-24-2018 09:09 PM
If this error is somewhat related to the other thread that you posted recently: https://community.hortonworks.com/questions/168750/ambari-cluster-no-valid-image-files-found.html?ch...
Then please apply the same solution and close one of the threads.
Pasting the steps here:
Please check whether the dfs.namenode.name.dir directory (default path: /hadoop/hdfs/namenode) is empty; a disk issue may have left no files there.
If that is the case and the Active NameNode is already running (this must be true), then you can try the following:
Run these commands:
# su - hdfs
# hdfs namenode -bootstrapStandby

NOTE: Please run this command ONLY on the Standby NameNode. DO NOT run this command on the Active NameNode. This command will try to recover all metadata on the Standby NameNode.

- Now try to start the Standby NameNode from Ambari
- Also please restart the ZKFailoverController from Ambari
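The check-then-bootstrap sequence above can be sketched as a shell session. The metadata path is the default mentioned in the answer; verify the real value of dfs.namenode.name.dir in your hdfs-site.xml before relying on it:

```shell
# Run ONLY on the Standby NameNode host -- never on the Active one.
# /hadoop/hdfs/namenode is the default dfs.namenode.name.dir; adjust
# if your hdfs-site.xml points elsewhere.
ls -l /hadoop/hdfs/namenode/current   # an empty directory suggests lost metadata

# Re-sync the metadata from the running Active NameNode:
su - hdfs -c 'hdfs namenode -bootstrapStandby'
```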
Created 01-24-2018 02:38 PM
When I run it alone, I get:
# su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode'
starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master03.sys57.com.out
# echo $?
1
Created 01-24-2018 09:29 PM
@Jay I ran hdfs namenode -bootstrapStandby on the standby, but I get:
Retrying connect to server: master01.sys57.com/100.4.3.21:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
and that is because both NameNodes are down - I can't start the NameNode on either machine.
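The "Retrying connect" message means nothing is answering on the Active NameNode's RPC port (8020 on master01, per the log line). A quick way to confirm this, using the hostname taken from that message:

```shell
# From the standby host: is anything listening on the Active NN's RPC port?
nc -zv master01.sys57.com 8020

# On master01 itself: is a NameNode JVM running at all?
jps | grep -i namenode
ss -ltn | grep 8020
```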
Created 01-24-2018 09:39 PM
@Jay so how do I continue from this step?
Created 01-24-2018 10:32 PM
One NN should be active to run the bootstrap command. You need to bring the healthy NN up and running first.
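You can verify which NameNode (if any) is active before bootstrapping. A sketch, assuming the usual HA setup; the service IDs nn1/nn2 are placeholders, the real names are listed under dfs.ha.namenodes.&lt;nameservice&gt; in hdfs-site.xml:

```shell
# Check HA state of each NameNode (nn1/nn2 are assumed service IDs;
# look them up in dfs.ha.namenodes.<nameservice> in hdfs-site.xml):
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Start the healthy NameNode first (run on its own host), using the same
# daemon script that appears in the error output above:
su - hdfs -c '/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh \
  --config /usr/hdp/current/hadoop-client/conf start namenode'
```

Once one NameNode reports "active", re-run -bootstrapStandby on the other host.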
Created 01-24-2018 10:42 PM
Can I run something like hdfs namenode -bootstrap.... on the active node? If yes, what is the complete syntax?