
pacosoplas · 02-03-2016

Hi,

Why do I see these retries after restarting the NameNode?

2016-02-03 17:43:17,121 - Must wait to leave safemode since High Availability is not enabled.
2016-02-03 17:43:17,121 - Checking the NameNode safemode status since may need to transition from ON to OFF.
2016-02-03 17:43:17,122 - Execute['hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF''] {'logoutput': True, 'tries': 180, 'user': 'hdfs', 'try_sleep': 10}
2016-02-03 17:43:20,088 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:43:32,396 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:43:44,761 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:43:57,370 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:44:09,734 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:44:22,049 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:44:34,350 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:44:46,675 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:44:59,021 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
Safe mode is OFF
2016-02-03 17:45:11,442 - HdfsResource['/tmp'] {'security_enabled': False, 'only_if': None, 'keytab': [EMPTY], 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'default_fs': 'hdfs://lnxbig05.cajarural.gcr:8020', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': None, 'user': 'hdfs', 'owner': 'hdfs', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'directory', 'action': ['create_on_execute'], 'mode': 0777}
2016-02-03 17:45:11,445 - checked_call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET '"'"'http://lnxbig05.cajarural.gcr:50070/webhdfs/v1/tmp?op=GETFILESTATUS&user.name=hdfs'"'"' 1>/tmp/tmpP7WEHk 2>/tmp/tmphm0eeD''] {'logoutput': None, 'quiet': False}
2016-02-03 17:45:13,273 - checked_call returned (0, '')
2016-02-03 17:45:13,274 - HdfsResource['/user/ambari-qa'] {'security_enabled': False, 'only_if': None, 'keytab': [EMPTY], 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'default_fs': 'hdfs://lnxbig05.cajarural.gcr:8020', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': None, 'user': 'hdfs', 'owner': 'ambari-qa', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'directory', 'action': ['create_on_execute'], 'mode': 0770}
2016-02-03 17:45:13,276 - checked_call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET '"'"'http://lnxbig05.cajarural.gcr:50070/webhdfs/v1/user/ambari-qa?op=GETFILESTATUS&user.name=hdfs'"'"' 1>/tmp/tmpF1IxAv 2>/tmp/tmpkMhn6U''] {'logoutput': None, 'quiet': False}
2016-02-03 17:45:13,391 - checked_call returned (0, '')
2016-02-03 17:45:13,392 - HdfsResource[None] {'security_enabled': False, 'only_if': None, 'keytab': [EMPTY], 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'default_fs': 'hdfs://lnxbig05.cajarural.gcr:8020', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': None, 'user': 'hdfs', 'action': ['execute'], 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf'}

nsabharwal · 02-03-2016

@Roberto Sancho

This is normal. During a restart, the NameNode performs checkpoints as part of its metadata sanity check.

nsabharwal · 02-03-2016

@Roberto Sancho Good read: http://hortonworks.com/blog/understanding-namenode-startup-operations-in-hdfs/

aervits · 02-03-2016

@Roberto Sancho It's completely normal to see this. In production, just let it finish; in a sandbox, you can force the NameNode out of safe mode by issuing the command below:

hdfs dfsadmin -safemode leave
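For scripting, the same `-safemode` subcommand can be wrapped in Python. This is a minimal sketch that assumes the `hdfs` client is on the PATH and runs as a user allowed to administer HDFS (the Ambari log in the question uses `hdfs`); `run_safemode`, `parse_safemode_state` and `SAFEMODE_ACTIONS` are hypothetical helper names. The parser only handles the single-NameNode output seen in the log (`Safe mode is OFF`), since High Availability is not enabled there.

```python
import subprocess
from typing import Optional

# Actions accepted by `hdfs dfsadmin -safemode` (hypothetical constant name).
SAFEMODE_ACTIONS = {"get", "enter", "leave", "wait"}


def run_safemode(action: str, fs_uri: Optional[str] = None) -> str:
    """Run `hdfs dfsadmin [-fs URI] -safemode ACTION` and return its stdout.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    if action not in SAFEMODE_ACTIONS:
        raise ValueError(f"unsupported safemode action: {action!r}")
    cmd = ["hdfs", "dfsadmin"]
    if fs_uri:
        cmd += ["-fs", fs_uri]  # target a specific NameNode, as the Ambari log does
    cmd += ["-safemode", action]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def parse_safemode_state(output: str) -> Optional[str]:
    """Return "ON" or "OFF" from `-safemode get` output, or None if not found."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Safe mode is ON"):
            return "ON"
        if line.startswith("Safe mode is OFF"):
            return "OFF"
    return None
```

In a sandbox, `run_safemode("leave", "hdfs://lnxbig05.cajarural.gcr:8020")` is the scripted equivalent of the command above, and `parse_safemode_state(run_safemode("get"))` tells you whether it took effect. The `wait` action instead blocks until the NameNode leaves safe mode on its own, which fits the "just let it finish" advice for production.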

ashneesharma88 · 02-29-2016

The NameNode enters safe mode automatically after the NameNode service is restarted, because it must first:

1) Load the file system namespace from the last saved fsimage, plus the edit logs, into its main memory.

2) Apply the edit logs to the fsimage, resulting in a new file system namespace.

3) Receive block reports containing block location information from all DataNodes.

This is a normal process.
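The three steps above can be illustrated with a deliberately simplified, in-memory model. Everything in it (the `Namespace` class, the tuple-based edit format and the function names) is hypothetical; a real NameNode reads fsimage and edit-log files from disk and receives block reports over RPC. What the sketch does show is why step 3 exists: block locations are not stored in the fsimage, so the NameNode only learns them from DataNode block reports, and it stays in safe mode while those arrive.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class Namespace:
    # path -> ordered block IDs (toy stand-in for the inodes in an fsimage)
    files: Dict[str, List[int]] = field(default_factory=dict)
    # block ID -> DataNodes holding a replica; filled only from block reports
    block_locations: Dict[int, Set[str]] = field(default_factory=dict)


def load_fsimage(fsimage: Dict[str, List[int]]) -> Namespace:
    """Step 1: load the last saved namespace snapshot into memory."""
    return Namespace(files={path: list(blocks) for path, blocks in fsimage.items()})


def apply_edits(ns: Namespace, edits: Iterable[Tuple]) -> Namespace:
    """Step 2: replay logged operations, in order, on top of the snapshot."""
    for op, path, *args in edits:
        if op == "create":
            ns.files[path] = []
        elif op == "add_block":
            ns.files[path].append(args[0])
        elif op == "delete":
            ns.files.pop(path, None)
        else:
            raise ValueError(f"unknown edit op: {op!r}")
    return ns


def receive_block_report(ns: Namespace, datanode: str, block_ids: Iterable[int]) -> None:
    """Step 3: record which DataNode holds which block replicas."""
    known = {b for blocks in ns.files.values() for b in blocks}
    for block_id in block_ids:
        if block_id in known:  # replicas of deleted files are ignored here
            ns.block_locations.setdefault(block_id, set()).add(datanode)
```

For example, loading `{"/tmp/a": [1, 2]}`, replaying `("create", "/user/b")`, `("add_block", "/user/b", 3)` and `("delete", "/tmp/a")`, then receiving a report of blocks `[1, 3]` from `dn1` leaves only block 3 located on `dn1`.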

lyang2017 · 05-12-2019

I ran into this too. Did you solve it? I tried the suggestions I found online, but they didn't work, so I don't know how to solve it. It has happened twice in production.

gul_shad · 05-13-2019

@Yan Liu This isn't an issue to worry about. Ambari keeps checking the NameNode (NN) status until it detects that the NN is out of safe mode. This usually takes some time, for the reasons mentioned above.

gul_shad · 05-12-2019

This happens because Ambari keeps polling the NameNode's safe mode status. When it detects that safe mode is OFF (grep 'Safe mode is OFF'), it reports the NameNode as started.
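The retry lines in the question come from this polling. Below is a minimal Python sketch of the same idea, assuming it only needs to mirror the `'tries': 180` and `'try_sleep': 10` parameters visible in the log; it is not Ambari's actual `Execute` implementation, and `wait_for_safemode_off` and `hdfs_safemode_get` are hypothetical names. Like the `grep` in the log, a failed or empty check simply counts as "not OFF yet" and triggers another retry.

```python
import subprocess
import time
from typing import Callable


def wait_for_safemode_off(
    run_check: Callable[[], str],
    tries: int = 180,
    try_sleep: float = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll run_check() until its output contains 'Safe mode is OFF'.

    Returns the 1-based attempt number that succeeded; raises TimeoutError
    once all tries are used up.
    """
    for attempt in range(1, tries + 1):
        # Equivalent of `| grep 'Safe mode is OFF'` succeeding.
        if "Safe mode is OFF" in run_check():
            return attempt
        if attempt < tries:
            print(f"Retrying after {try_sleep} seconds.")
            sleep(try_sleep)
    raise TimeoutError(f"NameNode still in safe mode after {tries} attempts")


def hdfs_safemode_get(fs_uri: str = "hdfs://lnxbig05.cajarural.gcr:8020") -> str:
    """Run `hdfs dfsadmin -fs URI -safemode get` and return its stdout.

    No exception on a non-zero exit: a failed call just yields output that
    does not contain 'Safe mode is OFF', so the caller retries.
    """
    result = subprocess.run(
        ["hdfs", "dfsadmin", "-fs", fs_uri, "-safemode", "get"],
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Called as `wait_for_safemode_off(hdfs_safemode_get)` with the logged settings, the sketch sleeps at most 179 × 10 s, roughly 30 minutes, before giving up. In the question's log the attempts are about 12 seconds apart because each `hdfs dfsadmin` call takes a couple of seconds on top of the sleep, and safe mode turned OFF just under two minutes after polling began.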

The NameNode usually takes some time to come out of safe mode because, during startup, it must complete certain actions before it can serve client requests:

Read file system metadata from the fsimage file.
Read edit logs and apply logged operations to the file system metadata.
Write a new checkpoint (a new fsimage consisting of the prior fsimage plus the application of all operations from the edit logs).
Remain in safe mode until a sufficient number of blocks have been reported by datanodes.

In some situations, these actions can take a long time to complete.
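The last action in the list ("a sufficient number of blocks") is configurable. Below is a simplified sketch of the block-threshold check, assuming the stock defaults from `hdfs-default.xml`: `dfs.namenode.safemode.threshold-pct` (0.999) and `dfs.namenode.replication.min` (1). The function name and the integer truncation are assumptions of this sketch, not the NameNode's code; the real NameNode also honours `dfs.namenode.safemode.min.datanodes` and, once the threshold is met, stays in safe mode for a further `dfs.namenode.safemode.extension` (30000 ms by default) before leaving.

```python
from typing import Mapping

# Stock defaults from hdfs-default.xml; a cluster's hdfs-site.xml may override them.
THRESHOLD_PCT = 0.999   # dfs.namenode.safemode.threshold-pct
MIN_REPLICATION = 1     # dfs.namenode.replication.min


def threshold_met(
    replica_counts: Mapping[int, int],
    threshold_pct: float = THRESHOLD_PCT,
    min_replication: int = MIN_REPLICATION,
) -> bool:
    """replica_counts maps block ID -> replicas reported by DataNodes so far."""
    if threshold_pct <= 0:
        return True  # values <= 0 mean: do not wait for any percentage of blocks
    # Truncating to an integer is an assumption of this sketch.
    needed = int(len(replica_counts) * threshold_pct)
    # A block is "safe" once it has at least min_replication reported replicas.
    safe = sum(1 for n in replica_counts.values() if n >= min_replication)
    return safe >= needed
```

For example, with the defaults and 10 blocks, `int(10 * 0.999)` is 9, so 9 blocks with at least one reported replica are enough. On a cluster with millions of blocks, waiting for 99.9% of them to be reported by the DataNodes is one reason startup can take a while.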