Created 10-15-2017 07:12 PM
I have set up a single-node HDP 2.6 cluster, deployed using Ambari.
My single node hostname is nhknox-Virtual-Machine.mad.lab
My KDC is the Domain Controller, running Active Directory Domain Services.
I am now enabling Kerberos, and the first steps go through fine. The Ambari wizard then stops all the services, which also works. After that it tries to start all the services on the kerberized cluster.
At that point my NameNode fails to start; the logs are below. Please suggest a solution.
stderr:
stdout:
2017-10-15 13:58:14,561 - Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode''] {'environment': {'HADOOP_LIBEXEC_DIR': '/usr/hdp/current/hadoop-client/libexec'}, 'not_if': 'ambari-sudo.sh -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid && ambari-sudo.sh -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'}
2017-10-15 13:58:18,931 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-testknox@MAD.LAB'] {'user': 'hdfs'}
2017-10-15 13:58:19,157 - Waiting for this NameNode to leave Safemode due to the following conditions: HA: False, isActive: True, upgradeType: None
2017-10-15 13:58:19,157 - Waiting up to 19 minutes for the NameNode to leave Safemode...
2017-10-15 13:58:19,158 - Execute['/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://nhknox-virtual-machine.mad.lab:8020 -safemode get | grep 'Safe mode is OFF''] {'logoutput': True, 'tries': 115, 'user': 'hdfs', 'try_sleep': 10}
safemode: Call From nhknox-virtual-machine.mad.lab/127.0.1.1 to nhknox-virtual-machine.mad.lab:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2017-10-15 13:58:44,683 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://nhknox-virtual-machine.mad.lab:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. safemode: Call From nhknox-virtual-machine.mad.lab/127.0.1.1 to nhknox-virtual-machine.mad.lab:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2017-10-15 13:59:16,062 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://nhknox-virtual-machine.mad.lab:8020 -safemode get | grep 'Safe mode is OFF'' returned 1.
2017-10-15 13:59:45,987 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://nhknox-virtual-machine.mad.lab:8020 -safemode get | grep 'Safe mode is OFF'' returned 1.
2017-10-15 14:00:15,016 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://nhknox-virtual-machine.mad.lab:8020 -safemode get | grep 'Safe mode is OFF'' returned 1.
Created 10-16-2017 02:16 AM
Is your hostname exactly "nhknox-Virtual-Machine.mad.lab" or "nhknox-virtual-machine.mad.lab"? We have seen problems caused by mixed-case hostnames. If that is the case for you, change the hostname to all lowercase letters, restart the Ambari server, and then try restarting the NameNode.
If your hostname is already all lowercase, can you please paste your NameNode logs from /var/log/hadoop-hdfs/hdfs?
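A quick way to check for the mixed-case problem is sketched below. This is only an illustration: the helper names `has_uppercase` and `report_hostname_case` are made up here, and the script only reports on the hostname; it does not change anything.

```shell
#!/bin/sh
# has_uppercase NAME: succeeds if NAME contains at least one uppercase letter.
# (Hypothetical helper; not part of Ambari or Hadoop.)
has_uppercase() {
  case "$1" in
    *[[:upper:]]*) return 0 ;;
    *) return 1 ;;
  esac
}

# report_hostname_case NAME: prints whether NAME is mixed case or lowercase.
report_hostname_case() {
  if has_uppercase "$1"; then
    echo "Hostname '$1' contains uppercase letters"
  else
    echo "Hostname '$1' is all lowercase"
  fi
}

# Check the name this machine reports for itself.
report_hostname_case "$(hostname -f 2>/dev/null || hostname 2>/dev/null || echo unknown)"
```

Running it on the node in question shows at a glance whether the hostname needs to be changed before restarting the Ambari server.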
Thanks,
Aditya
Created 10-15-2017 07:20 PM
The problem is that your HDFS is in safe mode. Do the following to resolve the issue.
As root, switch user to hdfs:
# su - hdfs
Then
# hdfs dfsadmin -safemode leave
Then try restarting the NameNode; it should start successfully.
Created 10-15-2017 07:31 PM
This gives me the error below on the terminal:
safemode: Call From nhknox-virtual-machine.mad.lab/127.0.1.1 to nhknox-virtual-machine.mad.lab:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:
Created 10-15-2017 07:42 PM
Can you copy me the commands you used, just to be sure?
Is it a single-node cluster? Also, always include the full error stack.
Created 10-15-2017 08:03 PM
Since I am running ambari-server in non-root mode, I ran the command below to leave safe mode:
sudo -u hdfs hdfs dfsadmin -safemode leave
And this gave me the "Connection refused" error:
safemode: Call From nhknox-virtual-machine.mad.lab/127.0.1.1 to nhknox-virtual-machine.mad.lab:8020 failed on connection exception: java.net.ConnectException: Connection refused
Yes, this is a single-node cluster, as I mentioned in my first post. That post also contains the error log I see while the NameNode is being restarted.
Please let me know what other details I need to paste.
Note: I also tried "telnet localhost 8020", "telnet server-hostname 8020", and "netstat -atn | grep 8020". It seems that port 8020 is not in listening mode.
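The netstat check can also be scripted so the result is unambiguous. The following is a minimal sketch, assuming `netstat -atn`-style output in which the local address is the fourth column and the socket state is the last one; the function name `is_listening` is made up for illustration.

```shell
#!/bin/sh
# is_listening PORT: reads `netstat -atn`-style lines on stdin and succeeds
# only if some socket is in LISTEN state on that local port.
# (Hypothetical helper name; not part of Hadoop or Ambari.)
is_listening() {
  awk -v port="$1" '
    $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}

# Usage on the NameNode host:
#   netstat -atn | is_listening 8020 && echo "8020 is listening" || echo "8020 is NOT listening"
```

If nothing is listening on 8020, the NameNode process itself is not up, which matches the "Connection refused" message from dfsadmin.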
Created 10-15-2017 08:16 PM
I think I have seen a typo in your command. Can you copy and paste the command below?
sudo su - hdfs hdfs dfsadmin -safemode leave
And let me know
Created 10-15-2017 08:22 PM
I copied and pasted this ->
sudo su - hdfs hdfs dfsadmin -safemode leave
It says "cannot execute afemode"
Created 10-15-2017 09:11 PM
If you have sudo privileges, just do the following. You will need a valid Kerberos ticket; I just realised you have a kerberized cluster.
$ sudo su - hdfs
List the correct principal for hdfs:
$ klist -kt /etc/security/keytabs/hdfs.headless.keytab
Keytab name: FILE:/etc/security/keytabs/hdfs.headless.keytab
KVNO Timestamp Principal
---- ------------------- ------------------------------------------------------
1 08/24/2017 15:42:23 hdfs-cluster@YOUR_REALM
Then grab a valid Kerberos ticket:
$ kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-cluster@YOUR_REALM
Then
$ hdfs dfsadmin -safemode leave
That should work now!
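To avoid copying the principal by hand, the klist output can be parsed. This is only a sketch, assuming the `klist -kt` layout shown above (KVNO, date, time, principal); `first_principal` is a made-up helper name, and the keytab path is the one from this thread.

```shell
#!/bin/sh
# first_principal: reads `klist -kt` output on stdin and prints the principal
# of the first keytab entry (the last field of the first line whose first
# field is a numeric KVNO). Hypothetical helper, not a Kerberos tool.
first_principal() {
  awk '$1 ~ /^[0-9]+$/ { print $NF; exit }'
}

# Usage as the hdfs user:
#   KEYTAB=/etc/security/keytabs/hdfs.headless.keytab
#   kinit -kt "$KEYTAB" "$(klist -kt "$KEYTAB" | first_principal)"
#   hdfs dfsadmin -safemode leave
```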
Created 10-16-2017 03:53 AM
Thanks, @Aditya Sirna, the hostname fix worked for me. All services started, but I am getting a few alerts:
1) HDFS; alert name=NameNode Blocks Health:
Total Block:[11], Missing Blocks[11]
2) YARN; alert name=NodeManager Health:
Connection refused for port 8042
3) YARN; alert name=NodeManager Web UI:
Connection refused for port 8042
4) YARN; alert name=Percent NodeManagers Available
affected:[1], total:[1]
Do you have any suggestions?