
Safe mode is OFF

Master Collaborator

Hi,

Why do I see these retries after restarting the NameNode?

2016-02-03 17:43:17,121 - Must wait to leave safemode since High Availability is not enabled.
2016-02-03 17:43:17,121 - Checking the NameNode safemode status since may need to transition from ON to OFF.
2016-02-03 17:43:17,122 - Execute['hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF''] {'logoutput': True, 'tries': 180, 'user': 'hdfs', 'try_sleep': 10}
2016-02-03 17:43:20,088 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:43:32,396 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:43:44,761 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:43:57,370 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:44:09,734 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:44:22,049 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:44:34,350 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:44:46,675 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:44:59,021 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
Safe mode is OFF
2016-02-03 17:45:11,442 - HdfsResource['/tmp'] {'security_enabled': False, 'only_if': None, 'keytab': [EMPTY], 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'default_fs': 'hdfs://lnxbig05.cajarural.gcr:8020', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': None, 'user': 'hdfs', 'owner': 'hdfs', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'directory', 'action': ['create_on_execute'], 'mode': 0777}
2016-02-03 17:45:11,445 - checked_call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET '"'"'http://lnxbig05.cajarural.gcr:50070/webhdfs/v1/tmp?op=GETFILESTATUS&user.name=hdfs'"'"' 1>/tmp/tmpP7WEHk 2>/tmp/tmphm0eeD''] {'logoutput': None, 'quiet': False}
2016-02-03 17:45:13,273 - checked_call returned (0, '')
2016-02-03 17:45:13,274 - HdfsResource['/user/ambari-qa'] {'security_enabled': False, 'only_if': None, 'keytab': [EMPTY], 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'default_fs': 'hdfs://lnxbig05.cajarural.gcr:8020', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': None, 'user': 'hdfs', 'owner': 'ambari-qa', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'directory', 'action': ['create_on_execute'], 'mode': 0770}
2016-02-03 17:45:13,276 - checked_call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET '"'"'http://lnxbig05.cajarural.gcr:50070/webhdfs/v1/user/ambari-qa?op=GETFILESTATUS&user.name=hdfs'"'"' 1>/tmp/tmpF1IxAv 2>/tmp/tmpkMhn6U''] {'logoutput': None, 'quiet': False}
2016-02-03 17:45:13,391 - checked_call returned (0, '')
2016-02-03 17:45:13,392 - HdfsResource[None] {'security_enabled': False, 'only_if': None, 'keytab': [EMPTY], 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'default_fs': 'hdfs://lnxbig05.cajarural.gcr:8020', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': None, 'user': 'hdfs', 'action': ['execute'], 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf'}
1 ACCEPTED SOLUTION

Master Mentor
@Roberto Sancho

This is normal. During restart, the NameNode stays in safe mode while it checkpoints and sanity-checks its metadata.
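If you want to watch this process yourself rather than rely on Ambari's retries, here is a minimal sketch (assuming you run it as the hdfs user on a host with the Hadoop client installed):

# Print the current safe mode state (ON/OFF)
hdfs dfsadmin -safemode get

# Block until the NameNode leaves safe mode on its own
hdfs dfsadmin -safemode wait

# See how many DataNodes have reported in and how many blocks are available
hdfs dfsadmin -report | head -n 20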


7 REPLIES


Master Mentor

Master Mentor

It's completely normal to see this, @Roberto Sancho. In production, let it finish; in a sandbox, you can force the NameNode out of safe mode with the command below:

hdfs dfsadmin -safemode leave
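As a follow-up sketch: safe mode can only be toggled by an HDFS superuser, so on a typical Ambari-managed cluster (assuming the service user is hdfs) you would run it as that user and then verify the state:

sudo -u hdfs hdfs dfsadmin -safemode leave

# Confirm it took effect; should print: Safe mode is OFF
sudo -u hdfs hdfs dfsadmin -safemode get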


The NameNode enters safe mode automatically after the NameNode service restarts, because:

1) It loads the file system namespace from the last saved fsimage into its main memory, along with the edit logs.

2) It applies the edit logs to the fsimage, producing a new file system namespace.

3) It receives block reports containing block location information from all DataNodes.

This is a normal process; see the sketch below for checking how far the block reports have progressed.
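A small sketch for inspecting the settings that govern when the NameNode leaves safe mode, and the block-report progress (the property names are standard HDFS ones; the defaults noted in the comments are the usual ones and may differ on your cluster):

# Minimum fraction of blocks that must be reported before leaving safe mode (default 0.999)
hdfs getconf -confKey dfs.namenode.safemode.threshold-pct

# Extra time, in ms, the NameNode stays in safe mode after the threshold is reached (default 30000)
hdfs getconf -confKey dfs.namenode.safemode.extension

# Summary of DataNodes that have checked in and blocks available so far
hdfs dfsadmin -report | head -n 20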

New Contributor

I'm also running into this... Have you solved it? I tried the approach I found online, but it didn't work, so I don't know how to fix it. It has happened twice in production.

Expert Contributor

@Yan Liu this isn't an issue to worry about. Ambari keeps checking the NN status until it detects that the NN is out of safe mode. It usually takes some time because of the reasons mentioned above.

Expert Contributor

This happens because Ambari keeps polling the NameNode for its safe mode status. When it detects that safe mode is OFF (the grep for 'Safe mode is OFF' succeeds), it reports the NameNode as started.


The NameNode usually takes some time to come out of safe mode because, during startup, it must complete certain actions before it can serve client requests:

  1. Read file system metadata from the fsimage file.
  2. Read edit logs and apply logged operations to the file system metadata.
  3. Write a new checkpoint (a new fsimage consisting of the prior fsimage plus the application of all operations from the edit logs).
  4. Remain in safe mode until a sufficient number of blocks have been reported by datanodes.

In some situations, these actions can take a long time to complete.
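For context, a rough sketch of what the retry loop in the log from the question is doing (the actual check is implemented in Ambari's agent scripts; the host name is the one from the log, and the 180 tries with a 10-second sleep match the 'tries' and 'try_sleep' values shown there):

for i in $(seq 1 180); do
  # Same command Ambari runs: query safe mode and look for the OFF marker
  if hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep -q 'Safe mode is OFF'; then
    echo "NameNode is out of safe mode"
    break
  fi
  echo "Retrying after 10 seconds..."
  sleep 10
done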