
Safe mode is OFF

Super Collaborator

Hi:

Why do I see these retries after restarting the NameNode?

2016-02-03 17:43:17,121 - Must wait to leave safemode since High Availability is not enabled.
2016-02-03 17:43:17,121 - Checking the NameNode safemode status since may need to transition from ON to OFF.
2016-02-03 17:43:17,122 - Execute['hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF''] {'logoutput': True, 'tries': 180, 'user': 'hdfs', 'try_sleep': 10}
2016-02-03 17:43:20,088 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:43:32,396 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:43:44,761 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:43:57,370 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:44:09,734 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:44:22,049 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:44:34,350 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:44:46,675 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2016-02-03 17:44:59,021 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://lnxbig05.cajarural.gcr:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
Safe mode is OFF
2016-02-03 17:45:11,442 - HdfsResource['/tmp'] {'security_enabled': False, 'only_if': None, 'keytab': [EMPTY], 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'default_fs': 'hdfs://lnxbig05.cajarural.gcr:8020', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': None, 'user': 'hdfs', 'owner': 'hdfs', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'directory', 'action': ['create_on_execute'], 'mode': 0777}
2016-02-03 17:45:11,445 - checked_call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET '"'"'http://lnxbig05.cajarural.gcr:50070/webhdfs/v1/tmp?op=GETFILESTATUS&user.name=hdfs'"'"' 1>/tmp/tmpP7WEHk 2>/tmp/tmphm0eeD''] {'logoutput': None, 'quiet': False}
2016-02-03 17:45:13,273 - checked_call returned (0, '')
2016-02-03 17:45:13,274 - HdfsResource['/user/ambari-qa'] {'security_enabled': False, 'only_if': None, 'keytab': [EMPTY], 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'default_fs': 'hdfs://lnxbig05.cajarural.gcr:8020', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': None, 'user': 'hdfs', 'owner': 'ambari-qa', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'directory', 'action': ['create_on_execute'], 'mode': 0770}
2016-02-03 17:45:13,276 - checked_call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET '"'"'http://lnxbig05.cajarural.gcr:50070/webhdfs/v1/user/ambari-qa?op=GETFILESTATUS&user.name=hdfs'"'"' 1>/tmp/tmpF1IxAv 2>/tmp/tmpkMhn6U''] {'logoutput': None, 'quiet': False}
2016-02-03 17:45:13,391 - checked_call returned (0, '')
2016-02-03 17:45:13,392 - HdfsResource[None] {'security_enabled': False, 'only_if': None, 'keytab': [EMPTY], 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'default_fs': 'hdfs://lnxbig05.cajarural.gcr:8020', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': None, 'user': 'hdfs', 'action': ['execute'], 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf'}
Accepted Solutions

Re: Safe mode is OFF

@Roberto Sancho

This is normal. During a restart, the NameNode checkpoints its metadata as a sanity check before it leaves safe mode.

Re: Safe mode is OFF

Mentor

It's completely normal to see this, @Roberto Sancho. In production, let it finish. In a sandbox, you can force the NameNode out of safe mode with the command below.

hdfs dfsadmin -safemode leave

Re: Safe mode is OFF

The NameNode enters safe mode automatically after the NameNode service restarts, because:

1) It loads the file system namespace into main memory from the last saved fsimage and the edit logs.

2) It applies the edit log files to the fsimage, producing a new file system namespace.

3) It receives block reports, containing block location information, from all DataNodes.

This is a normal process.
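
To make step 2 concrete, here is a toy sketch of replaying an edit log on top of a saved image. It is only an illustration: the `replay_edits` function, the set-of-paths "fsimage", and the `mkdir`/`delete` operation names are all hypothetical and are not Hadoop's real on-disk formats or APIs.

```python
from typing import Iterable, Set, Tuple


def replay_edits(fsimage: Set[str], edits: Iterable[Tuple[str, str]]) -> Set[str]:
    """Apply logged operations, in order, to a copy of the saved namespace."""
    namespace = set(fsimage)  # start from the last saved image
    for op, path in edits:
        if op == "mkdir":
            namespace.add(path)
        elif op == "delete":
            namespace.discard(path)
        else:
            # Hypothetical log format: only two operations are modelled here.
            raise ValueError(f"unknown operation: {op}")
    return namespace


if __name__ == "__main__":
    image = {"/tmp"}
    log = [("mkdir", "/user"), ("mkdir", "/user/ambari-qa"), ("delete", "/tmp")]
    print(sorted(replay_edits(image, log)))
```

The longer the edit log since the last checkpoint, the more operations have to be replayed, which is one reason a restart can take a while.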

Re: Safe mode is OFF

New Contributor

I ran into this too. Did you solve it? I tried the approaches I found online, but they didn't work, so I don't know how to fix it. It has happened twice in production.

Re: Safe mode is OFF

Contributor

@Yan Liu this isn't an issue to worry about. Ambari keeps checking the NN status until it detects that the NN is out of safe mode. This usually takes some time, for the reasons mentioned above.

Re: Safe mode is OFF

Contributor

This happens because Ambari keeps polling the NameNode's safe mode status. When it detects that safe mode is off (grep 'Safe mode is OFF'), it reports the NameNode as started.
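
A rough sketch of that polling loop, in Python, might look like the following. It is not Ambari's actual code: the function names are made up for illustration, and the `tries=180` and `try_sleep=10` defaults are simply copied from the `Execute[...]` line in the log above. The only real command used is `hdfs dfsadmin -safemode get`.

```python
import subprocess
import time
from typing import Callable


def dfsadmin_safemode_status() -> str:
    """Return the output of 'hdfs dfsadmin -safemode get' (needs an HDFS client)."""
    result = subprocess.run(
        ["hdfs", "dfsadmin", "-safemode", "get"],
        capture_output=True,
        text=True,
    )
    return result.stdout


def wait_for_safemode_off(
    get_status: Callable[[], str] = dfsadmin_safemode_status,
    tries: int = 180,
    try_sleep: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the status contains 'Safe mode is OFF'.

    Returns the number of attempts used; raises TimeoutError if every
    attempt still reports safe mode as on.
    """
    for attempt in range(1, tries + 1):
        if "Safe mode is OFF" in get_status():
            return attempt
        if attempt < tries:
            sleep(try_sleep)  # corresponds to "Retrying after 10 seconds"
    raise TimeoutError(f"NameNode still in safe mode after {tries} tries")
```

Each "Retrying after 10 seconds" line in the log corresponds to one pass through this kind of loop where the check did not yet find 'Safe mode is OFF'.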


The NameNode usually takes some time to come out of safe mode because, during startup, it must complete certain actions before it can serve client requests:

  1. Read file system metadata from the fsimage file.
  2. Read edit logs and apply logged operations to the file system metadata.
  3. Write a new checkpoint (a new fsimage consisting of the prior fsimage plus the application of all operations from the edit logs).
  4. Remain in safe mode until a sufficient number of blocks have been reported by datanodes.

In some situations, these actions can take a long time to complete.
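
As a simplified illustration of step 4, the check below compares the fraction of reported blocks against a threshold. The function name is hypothetical, and the default of 0.999 is assumed from the `dfs.namenode.safemode.threshold-pct` setting. The real NameNode logic has more conditions (for example, an extension period after the threshold is reached), so treat this as a sketch only.

```python
def enough_blocks_reported(
    reported_blocks: int,
    total_blocks: int,
    threshold_pct: float = 0.999,
) -> bool:
    """Return True once the reported fraction of blocks meets the threshold."""
    if total_blocks == 0:
        # An empty file system has nothing to wait for.
        return True
    return reported_blocks / total_blocks >= threshold_pct


if __name__ == "__main__":
    # With 1,000,000 blocks, 990,000 reported is still below the threshold.
    print(enough_blocks_reported(990_000, 1_000_000))
    print(enough_blocks_reported(1_000_000, 1_000_000))
```

On a cluster with many blocks or slow DataNodes, reaching this fraction is often the longest part of the wait.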