NameNode keeps going down

Hi all,

I am having a problem with the NameNode status that Ambari shows. The following points are verifiable in the system:

- The NameNode keeps going down a few seconds after I start it through Ambari (it looks like it never really comes up, although the start process runs successfully);

- Despite being DOWN according to Ambari, if I run jps on the server where the NameNode is hosted, it shows that the service is running:

[hdfs@RHTPINEC008 ~]$ jps
39395 NameNode
4463 Jps

and I can access the NameNode UI properly;

- I already restarted both the NameNode and ambari-agent manually, but the behavior stays the same;

- This problem started after some heavy HBase/Phoenix queries that caused the NameNode to go down (not sure if this is actually related, but the exact same configuration was working well before this episode);

- I've been digging for some hours and have not been able to find error details in the NameNode logs or the ambari-agent logs that would let me understand the problem.
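
One thing worth ruling out for the symptom above (jps shows a live NameNode while Ambari reports it as down) is a PID file that is missing or no longer matches the running process. The following is only a minimal sketch: the function name check_pid_file is hypothetical, and the PID file path is an assumed default that may differ on your cluster.

```shell
# check_pid_file PID_FILE LIVE_PID
# Prints "match" if the PID stored in PID_FILE equals LIVE_PID,
# "stale" if it differs, and "missing" if the file cannot be read.
check_pid_file() {
  cpf_file="$1"
  cpf_live="$2"
  if [ ! -r "$cpf_file" ]; then
    echo "missing"
    return 0
  fi
  # Strip whitespace and newlines from the recorded PID before comparing.
  cpf_recorded="$(tr -d '[:space:]' < "$cpf_file")"
  if [ "$cpf_recorded" = "$cpf_live" ]; then
    echo "match"
  else
    echo "stale"
  fi
}

# Example run on the NameNode host. The path below is an ASSUMPTION;
# override it with PID_FILE=... if your installation uses another location.
if command -v jps >/dev/null 2>&1; then
  PID_FILE="${PID_FILE:-/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid}"
  live_pid="$(jps 2>/dev/null | awk '$2 == "NameNode" {print $1}')"
  check_pid_file "$PID_FILE" "$live_pid"
fi
```

A result of "stale" or "missing" would suggest looking at how the PID file is written and who owns it, rather than at the NameNode itself.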

I am using HDP 2.4.0 without HA.

Can someone help with this?

Thanks in advance

28 REPLIES

@Geoffrey Shelton Okot

Sorry, I work until 2 PM EST, which is why I was slow to answer. I am using AD, and the users were already created in AD before the HDP installation. Yes, a one-way trust was set up.

hostName=node1.test.co

Contents of /etc/krb5.conf :

includedir /etc/krb5.conf.d/

includedir /var/lib/sss/pubconf/krb5.include.d/
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log

[libdefaults]
dns_lookup_realm = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
rdns = false
# default_realm = EXAMPLE.COM
default_ccache_name = KEYRING:persistent:%{uid}

default_realm = TEST.CO
[realms]
# EXAMPLE.COM = {
# kdc = kerberos.example.com
# admin_server = kerberos.example.com
# }

TEST.CO = {
}

[domain_realm]
# .example.com = EXAMPLE.COM
# example.com = EXAMPLE.COM
test.co = TEST.CO
.test.co = TEST.CO

Mentor

@Subramanian Govindasamy

When using an MIT KDC, there are three important files that MUST be set correctly for Kerberos to function. Their locations might vary depending on the OS.

/var/kerberos/krb5kdc/kdc.conf 
/var/kerberos/krb5kdc/kadm5.acl 
/etc/krb5.conf

I have seen a couple of issues in your krb5.conf. I have corrected it below; replace {your_kdc_server} with your KDC's FQDN.

Back up your current krb5.conf:

# cp /etc/krb5.conf  /etc/krb5.conf.bak

Then edit /etc/krb5.conf, delete its contents, and replace them with the following:

# vi /etc/krb5.conf

Paste

[libdefaults]
  renew_lifetime = 7d
  forwardable = true
  default_realm = TEST.CO
  dns_lookup_realm = false
  ticket_lifetime = 24h
  rdns = false
  default_ccache_name = KEYRING:persistent:%{uid}
[domain_realm]
  test.co = TEST.CO
  .test.co = TEST.CO
[logging]
  default = FILE:/var/log/krb5libs.log
  kdc = FILE:/var/log/krb5kdc.log
  admin_server = FILE:/var/log/kadmind.log
[realms]
  TEST.CO = {
    admin_server = {your_kdc_server}
    kdc = {your_kdc_server}
  }

/var/kerberos/krb5kdc/kdc.conf

[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88
[realms]
 TEST.CO = {
  #master_key_type = aes256-cts
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
 }

Your /var/kerberos/krb5kdc/kadm5.acl should look like this; note the spacing before the last *:

*/admin@TEST.CO  *

Restart the KDC daemons:

# service krb5kdc restart
# service kadmin restart

Please correct the above files so we know Kerberos is correctly set up, and report back with the new error, if any.
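
As a quick sanity check on /etc/krb5.conf, the sketch below tests whether the realm named by default_realm actually has a kdc entry under [realms], which is what the posted file was missing. This is only an illustration: check_realm_kdc is a hypothetical helper, not part of any Kerberos package, and it only understands the multi-line "REALM = {" ... "}" layout used in this thread.

```shell
# check_realm_kdc FILE
# Prints "ok" when the realm named by default_realm has at least one
# "kdc =" line inside its [realms] block; otherwise prints what is missing.
check_realm_kdc() {
  awk '
    # Skip commented-out lines such as the EXAMPLE.COM template entries.
    /^[ \t]*#/ { next }
    # Track the current [section] name.
    /^[ \t]*\[.*\]/ {
      section = $0
      gsub(/[^A-Za-z_]/, "", section)
      realm = ""
      next
    }
    section == "libdefaults" && /^[ \t]*default_realm[ \t]*=/ {
      split($0, kv, "=")
      default_realm = kv[2]
      gsub(/[ \t]/, "", default_realm)
      next
    }
    # Opening line of a realm block, e.g. "TEST.CO = {".
    section == "realms" && /=[ \t]*[{]/ {
      split($0, kv, "=")
      realm = kv[1]
      gsub(/[ \t]/, "", realm)
      next
    }
    section == "realms" && /^[ \t]*[}]/ { realm = ""; next }
    section == "realms" && realm != "" && /^[ \t]*kdc[ \t]*=/ { has_kdc[realm] = 1 }
    END {
      if (default_realm == "") { print "no default_realm"; exit }
      if (default_realm in has_kdc) { print "ok" } else { print "no kdc for " default_realm }
    }
  ' "$1"
}

# Usage (as a user that can read the file):
#   check_realm_kdc /etc/krb5.conf
```

If this reports ok, the next step would normally be to request a ticket with kinit for a principal in the realm and inspect it with klist.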

@Geoffrey Shelton Okot

Thank you so much. We don't have a KDC server installed; we are using LDAP. Do I need to put the AD server in place of "{your_kdc_server}"?

admin_server = {your_kdc_server}

Mentor

@Subramanian Govindasamy

If your ticket granter is AD, then YES, replace it accordingly.

@Geoffrey Shelton Okot

Thank you. I will let you know the updates shortly.

Explorer

Make sure you have an odd number of JournalNodes (JNs) and that all the JNs are healthy.

@Geoffrey Shelton Okot

Sorry for the delay. Hortonworks Support said that the '@' symbol is not supported, so we are doing a fresh reinstall with local users and syncing with LDAP.

Thanks