Created 09-05-2018 08:24 AM
Hi All
I have installed HDP 2.6 on RHEL 7.4 in Azure.
The installation completed successfully, but when I start the cluster the NameNode goes into safe mode, and because of that the other services are not able to come up.
I tried manually exiting safe mode and then restarting the services, and they all start fine.
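For reference, the manual exit is the standard dfsadmin command, run as the hdfs user:

# force the NameNode out of safe mode
sudo -u hdfs hdfs dfsadmin -safemode leave
# confirm the state afterwards
sudo -u hdfs hdfs dfsadmin -safemode get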
Any suggestions?
Thanks in advance
Muthu
Error log:
2018-09-05 12:04:03,445 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://bdm.localdomain:8020 -safemode get | grep 'Safe mode is OFF'' returned 1.
2018-09-05 12:04:15,700 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://bdm.localdomain:8020 -safemode get | grep 'Safe mode is OFF'' returned 1.
2018-09-05 12:04:27,959 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://bdm.localdomain:8020 -safemode get | grep 'Safe mode is OFF'' returned 1.
2018-09-05 12:04:40,239 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://bdm.localdomain:8020 -safemode get | grep 'Safe mode is OFF'' returned 1.
2018-09-05 12:04:52,457 - The NameNode is still in Safemode. Please be careful with commands that need Safemode OFF.
2018-09-05 12:04:52,458 - HdfsResource['/tmp'] {'security_enabled': False, 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab': [EMPTY], 'dfs_type': '', 'default_fs': 'hdfs://bdm.localdomain:8020', 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': None, 'user': 'hdfs', 'owner': 'hdfs', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'directory', 'action': ['create_on_execute'], 'immutable_paths': [u'/apps/hive/warehouse', u'/mr-history/done', u'/app-logs', u'/tmp'], 'mode': 0777}
2018-09-05 12:04:52,461 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET '"'"'http://bdm.localdomain:50070/webhdfs/v1/tmp?op=GETFILESTATUS&user.name=hdfs'"'"' 1>/tmp/tmpE8fWXX 2>/tmp/tmptJfEmS''] {'logoutput': None, 'quiet': False}
2018-09-05 12:04:53,922 - call returned (0, '')
2018-09-05 12:04:53,923 - Skipping the operation for not managed DFS directory /tmp since immutable_paths contains it.
2018-09-05 12:04:53,924 - HdfsResource['/user/ambari-qa'] {'security_enabled': False, 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab': [EMPTY], 'dfs_type': '', 'default_fs': 'hdfs://bdm.localdomain:8020', 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': None, 'user': 'hdfs', 'owner': 'ambari-qa', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'directory', 'action': ['create_on_execute'], 'immutable_paths': [u'/apps/hive/warehouse', u'/mr-history/done', u'/app-logs', u'/tmp'], 'mode': 0770}
2018-09-05 12:04:53,926 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET '"'"'http://bdm.localdomain:50070/webhdfs/v1/user/ambari-qa?op=GETFILESTATUS&user.name=hdfs'"'"' 1>/tmp/tmpujRH4r 2>/tmp/tmp4594S5''] {'logoutput': None, 'quiet': False}
2018-09-05 12:04:53,998 - call returned (0, '')
2018-09-05 12:04:53,999 - HdfsResource[None] {'security_enabled': False, 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab': [EMPTY], 'dfs_type': '', 'default_fs': 'hdfs://bdm.localdomain:8020', 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': None, 'user': 'hdfs', 'action': ['execute'], 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'immutable_paths': [u'/apps/hive/warehouse', u'/mr-history/done', u'/app-logs', u'/tmp']}
2018-09-05 12:04:53,999 - Ranger Hdfs plugin is not enabled
Command completed successfully!
Created 09-06-2018 05:52 AM
I did some analysis and found that a few blocks are corrupted in HDFS. I deleted the corrupted files, and after that everything worked fine for a few hours.
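For reference, locating and removing the affected files can be done with fsck, roughly like this (run as the hdfs user):

# list the files that have corrupt or missing blocks
sudo -u hdfs hdfs fsck / -list-corruptfileblocks
# delete those files outright (their contents are lost)
sudo -u hdfs hdfs fsck / -delete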
But the moment I restart the server, the NameNode goes back into safe mode, and since the NameNode is in safe mode, the other services do not come up.
Any suggestions?
Thx
Muthu
Created 09-06-2018 07:07 AM
The NameNode stays in safe mode until the DataNodes have reported enough blocks satisfying minimal replication to reach the configured percentage (dfs.namenode.safemode.threshold-pct = 0.999f by default).
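You can check the value configured on your cluster with getconf:

# prints the effective safe-mode threshold (0.999f unless overridden)
sudo -u hdfs hdfs getconf -confKey dfs.namenode.safemode.threshold-pct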
In your case, the NameNode is still waiting for block reports from the DataNodes. Please ensure that all DataNodes are up and running, and check whether each DataNode is sending its block report.
In addition, check how many blocks have been reported to the NameNode so far, i.e. a message like: "The reported blocks 71 needs additional 17 blocks to reach the threshold 1.0000 of total blocks 87."
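A quick way to check both, assuming a standard HDP shell on the NameNode host:

# current safe mode state
sudo -u hdfs hdfs dfsadmin -safemode get
# live/dead DataNodes and per-node block counts
sudo -u hdfs hdfs dfsadmin -report

The detailed "reported blocks ... needs additional ..." message also appears on the NameNode web UI (port 50070) and in the NameNode log.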
Created 10-04-2018 06:17 AM
Hi Karthick,
I resolved this issue by formatting the NameNode, since I found a few corrupted blocks in HDFS. Mine is a single-node cluster.
Formatting resolved the issue.
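In case it helps anyone in the same single-node situation, the format itself is just the following, run with HDFS stopped. Be aware it wipes all HDFS metadata, and with it access to all data:

# WARNING: destroys the NameNode metadata; all HDFS data is effectively lost
sudo -u hdfs hdfs namenode -format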
Thx
Muthu
Created 10-04-2018 07:41 AM
Formatting is not an ideal option to solve this issue; by doing it, you lost all your data.