NameNode not starting after enabling Kerberos on single node cluster

Contributor

I have set up a single-node HDP 2.6 cluster, deployed using Ambari.

My single node hostname is nhknox-Virtual-Machine.mad.lab

My KDC is Active Directory Domain Services, running on the domain controller.

I am now enabling Kerberos through the Ambari wizard. Enabling it goes through fine, and the wizard then stops all the services, which is fine too. After that, it tries to start all the services on the now-kerberized cluster.

At that point my NameNode does not start. The logs are below. Please suggest a solution.

stderr:

stdout:
2017-10-15 13:58:14,561 - Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode''] {'environment': {'HADOOP_LIBEXEC_DIR': '/usr/hdp/current/hadoop-client/libexec'}, 'not_if': 'ambari-sudo.sh -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid && ambari-sudo.sh -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'}
2017-10-15 13:58:18,931 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-testknox@MAD.LAB'] {'user': 'hdfs'}
2017-10-15 13:58:19,157 - Waiting for this NameNode to leave Safemode due to the following conditions: HA: False, isActive: True, upgradeType: None
2017-10-15 13:58:19,157 - Waiting up to 19 minutes for the NameNode to leave Safemode...
2017-10-15 13:58:19,158 - Execute['/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://nhknox-virtual-machine.mad.lab:8020 -safemode get | grep 'Safe mode is OFF''] {'logoutput': True, 'tries': 115, 'user': 'hdfs', 'try_sleep': 10}
safemode: Call From nhknox-virtual-machine.mad.lab/127.0.1.1 to nhknox-virtual-machine.mad.lab:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2017-10-15 13:58:44,683 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://nhknox-virtual-machine.mad.lab:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. safemode: Call From nhknox-virtual-machine.mad.lab/127.0.1.1 to nhknox-virtual-machine.mad.lab:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2017-10-15 13:59:16,062 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://nhknox-virtual-machine.mad.lab:8020 -safemode get | grep 'Safe mode is OFF'' returned 1.
2017-10-15 13:59:45,987 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://nhknox-virtual-machine.mad.lab:8020 -safemode get | grep 'Safe mode is OFF'' returned 1.
2017-10-15 14:00:15,016 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://nhknox-virtual-machine.mad.lab:8020 -safemode get | grep 'Safe mode is OFF'' returned 1.

1 ACCEPTED SOLUTION

Super Guru

@Neha G,

Is your hostname exactly "nhknox-Virtual-Machine.mad.lab" or "nhknox-virtual-machine.mad.lab"? We have seen problems caused by mixed-case hostnames. If that is the case for you, change the hostname to all lowercase letters, restart the Ambari server, and then try restarting the NameNode.

If your hostname is already all lowercase, can you please paste your NameNode logs from /var/log/hadoop-hdfs/hdfs?

Thanks,

Aditya
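
The mixed-case check suggested above can be sketched in shell. This is only a rough illustration: the function names `has_uppercase` and `to_lowercase` are made up for this example, and the hostname is the one quoted in the question. It only detects the condition; renaming the host itself is a separate, OS-specific step.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the given hostname contains an uppercase letter.
# (Hypothetical helper name, not part of Ambari or Hadoop.)
has_uppercase() {
  case "$1" in
    *[[:upper:]]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the all-lowercase form of a hostname.
to_lowercase() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Hostname taken from the question; on a real host it could come from `hostname -f`.
fqdn="nhknox-Virtual-Machine.mad.lab"
if has_uppercase "$fqdn"; then
  echo "mixed case: $fqdn -> $(to_lowercase "$fqdn")"
fi
```

For the hostname in the question this reports the mixed-case name together with its lowercase form, nhknox-virtual-machine.mad.lab, which is the spelling that already appears in the Ambari log lines.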


10 REPLIES

Master Mentor

@Neha G

The problem is that your HDFS is in safe mode. Do the following to resolve the issue.

As root, switch user to hdfs:

# su - hdfs

Then:

# hdfs dfsadmin -safemode leave

Then try restarting the NameNode; it should start successfully.

Contributor

@Geoffrey Shelton Okot

This gives me the following error on the terminal:

safemode: Call From nhknox-virtual-machine.mad.lab/127.0.1.1 to nhknox-virtual-machine.mad.lab:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:

Master Mentor

@Neha G

Can you copy me the commands you used, just to be sure?

Is it a single-node cluster? And please always include the full error stack.


Contributor

@Geoffrey Shelton Okot

Since I am running ambari-server in non-root mode, I ran the following command to leave safe mode:

sudo -u hdfs hdfs dfsadmin -safemode leave

This gave me a "Connection refused" error:

safemode: Call From nhknox-virtual-machine.mad.lab/127.0.1.1 to nhknox-virtual-machine.mad.lab:8020 failed on connection exception: java.net.ConnectException: Connection refused

Yes, this is a single-node cluster, as I mentioned in my first post. That post also contains the error log I see while the NameNode is being restarted.

Please let me know what other details I should paste.

Note: I also tried "telnet localhost 8020", "telnet server-hostname 8020", and "netstat -atn | grep 8020". It seems that port 8020 is not in the listening state.
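
The port check described in the note could be scripted roughly as follows. This is a sketch under assumptions: `port_is_listening` is a hypothetical helper, and it expects the usual Linux `netstat -atn` column layout, with the local address in the fourth column and the state in the last.

```shell
#!/bin/sh
# Reads `netstat -atn` style output on stdin and succeeds only if some socket
# is in the LISTEN state on the given local port.
# (Hypothetical helper; assumes local address in column 4, state in the last column.)
port_is_listening() {
  awk -v port="$1" '
    $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}

# Example usage on the NameNode host (left as a comment so the sketch has no side effects):
#   if netstat -atn | port_is_listening 8020; then
#     echo "something is listening on 8020"
#   else
#     echo "nothing is listening on 8020"
#   fi
```

If nothing is listening on 8020, the NameNode process is not accepting RPC connections, and any client command sent to that port fails with "Connection refused" as shown above.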

Master Mentor

@Neha G

I think I have seen a typo in your command. Can you copy and paste the command below?

sudo su - hdfs hdfs dfsadmin -safemode leave

And let me know the result.

Contributor

@Geoffrey Shelton Okot

I copied and pasted this:

sudo su - hdfs hdfs dfsadmin -safemode leave

It says "cannot execute afemode".

Master Mentor

@Neha G

I just realised you have a kerberized cluster, so you will need a valid Kerberos ticket. If you have sudo privileges, do this:

$ sudo su - hdfs

List the correct principal for hdfs:

$ klist -kt /etc/security/keytabs/hdfs.headless.keytab
Keytab name: FILE:/etc/security/keytabs/hdfs.headless.keytab
KVNO Timestamp Principal
---- ------------------- ------------------------------------------------------
1 08/24/2017 15:42:23 hdfs-cluster@YOUR_REALM

Then grab a valid Kerberos ticket:

$ kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-cluster@YOUR_REALM

Then:

$ hdfs dfsadmin -safemode leave

That should work now!
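
The klist and kinit steps above could be chained so the principal does not have to be copied by hand. This is a minimal sketch, assuming the `klist -kt` output has the layout shown above (a dashed rule followed by entries whose last field is the principal); `first_principal` is a hypothetical helper name.

```shell
#!/bin/sh
# Reads `klist -kt <keytab>` output on stdin and prints the principal of the
# first keytab entry, i.e. the last field of the first line after the dashed rule.
# (Hypothetical helper; assumes the output layout shown in the post above.)
first_principal() {
  awk '
    seen_rule && NF >= 4 { print $NF; exit }
    /^-+/ { seen_rule = 1 }
  '
}

# Example usage as the hdfs user (left as a comment so the sketch has no side effects):
#   keytab=/etc/security/keytabs/hdfs.headless.keytab
#   principal="$(klist -kt "$keytab" | first_principal)"
#   kinit -kt "$keytab" "$principal" && hdfs dfsadmin -safemode get
```

A valid ticket only helps once the NameNode is actually accepting connections on its RPC port; it does not by itself resolve a "Connection refused" error.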

Contributor

Thanks, @Aditya Sirna, the hostname fix worked for me. All services started, but I am getting a few alerts:

1) HDFS; alert name=NameNode Blocks Health:

Total Block:[11], Missing Blocks[11]

2) YARN; alert name=NodeManager Health:

Connection refused for port 8042

3) YARN; alert name=NodeManager Web UI:

Connection refused for port 8042

4) YARN; alert name=Percent NodeManagers Available

affected:[1], total:[1]

Do you have any suggestions?