NameNode keeps going down

Hi all,

I am having a problem with the NameNode status that Ambari shows. The following points are verifiable on the system:

- The NameNode goes down a few seconds after I start it through Ambari (it looks like it never really comes up, although the start process runs successfully);

- Despite being DOWN according to Ambari, running jps on the server that hosts the NameNode shows that the service is running:

[hdfs@RHTPINEC008 ~]$ jps
39395 NameNode
4463 Jps

and I can access NameNode UI properly;

- I have already restarted both the NameNode and the ambari-agent manually, but the behavior stays the same;

- This problem started after some HBase/Phoenix heavy queries that caused the namenode to go down (not sure if this is actually related but the exact same configurations were working well before this episode);

- I have been digging for some hours and cannot find any error details in the NameNode logs or the ambari-agent logs that would help me understand the problem;

I am using HDP 2.4.0 with no HA options.

Can someone help with this?

Thanks in advance

28 REPLIES

@Geoffrey Shelton Okot

Sorry, I work until 2 PM EST, which is why I was slow to answer. I am using AD, and the users were already created in AD before the HDP installation. Yes, a one-way trust has been set up.

hostName=node1.test.co

Contents of /etc/krb5.conf :

includedir /etc/krb5.conf.d/

includedir /var/lib/sss/pubconf/krb5.include.d/
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log

[libdefaults]
dns_lookup_realm = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
rdns = false
# default_realm = EXAMPLE.COM
default_ccache_name = KEYRING:persistent:%{uid}

default_realm = TEST.CO
[realms]
# EXAMPLE.COM = {
# kdc = kerberos.example.com
# admin_server = kerberos.example.com
# }

TEST.CO = {
}

[domain_realm]
# .example.com = EXAMPLE.COM
# example.com = EXAMPLE.COM
test.co = TEST.CO
.test.co = TEST.CO

Master Mentor

@Subramanian Govindasamy

When using MIT KDC there are 3 important files that MUST be set correctly for Kerberos to function. Their locations might vary depending on the OS.

/var/kerberos/krb5kdc/kdc.conf 
/var/kerberos/krb5kdc/kadm5.acl 
/etc/krb5.conf

I have seen a couple of issues in your krb5.conf. I have corrected it below; replace {your_kdc_server} with your KDC FQDN.

Back up your current krb5.conf

# cp /etc/krb5.conf  /etc/krb5.conf.bak

Then edit /etc/krb5.conf, delete the contents, and replace them with the below

# vi /etc/krb5.conf

Paste

[libdefaults]
  renew_lifetime = 7d
  forwardable = true
  default_realm = TEST.CO
  dns_lookup_realm = false
  ticket_lifetime = 24h
  rdns = false
  default_ccache_name = KEYRING:persistent:%{uid}
[domain_realm]
  test.co = TEST.CO
  .test.co = TEST.CO
[logging]
  default = FILE:/var/log/krb5libs.log
  kdc = FILE:/var/log/krb5kdc.log
  admin_server = FILE:/var/log/kadmind.log
[realms]
TEST.CO = {
  admin_server = {your_kdc_server}
  kdc = {your_kdc_server}
 }
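
A quick way to see the difference between the original krb5.conf and the corrected one is to check whether the default realm has static kdc and admin_server entries. The following is only a minimal sketch: parse_krb5 and missing_realm_keys are hypothetical helper names, the parser handles just the simple layout used in this thread, and it does not account for KDC discovery through DNS.

```python
import re

def parse_krb5(text):
    """Parse a simple krb5.conf-style string into {section: {key: value or dict}}."""
    sections = {}
    section = None  # dict for the current [section]
    block = None    # dict for the current "NAME = {" sub-block, if any
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and include/includedir directives.
        if not line or line.startswith("#") or line.startswith("include"):
            continue
        header = re.fullmatch(r"\[(\w+)\]", line)
        if header:
            section = sections.setdefault(header.group(1), {})
            block = None
        elif line == "}":
            block = None
        elif "=" in line and section is not None:
            key, value = (part.strip() for part in line.split("=", 1))
            if value == "{":
                block = section.setdefault(key, {})
            elif block is not None:
                block[key] = value
            else:
                section[key] = value
    return sections

def missing_realm_keys(text, required=("kdc", "admin_server")):
    """Return the required keys that the default realm's block does not define."""
    conf = parse_krb5(text)
    realm = conf.get("libdefaults", {}).get("default_realm")
    entries = conf.get("realms", {}).get(realm, {})
    return [key for key in required if key not in entries]

# Shape of the original file: the TEST.CO realm block is empty.
ORIGINAL = """
[libdefaults]
default_realm = TEST.CO
[realms]
TEST.CO = {
}
"""

# Shape of the corrected file; kdc.test.co is a placeholder host name.
CORRECTED = """
[libdefaults]
default_realm = TEST.CO
[realms]
TEST.CO = {
  admin_server = kdc.test.co
  kdc = kdc.test.co
}
"""

print(missing_realm_keys(ORIGINAL))
print(missing_realm_keys(CORRECTED))
```

An empty list means both entries are present for the default realm; anything else names the entries that still need to be filled in.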

/var/kerberos/krb5kdc/kdc.conf

[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88
[realms]
 TEST.CO = {
  #master_key_type = aes256-cts
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
 }

Your /var/kerberos/krb5kdc/kadm5.acl should look like this; note the whitespace before the last *

*/admin@TEST.CO  *
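
The spacing matters because each ACL entry is read as whitespace-separated fields, with the principal first and the permissions second. As a rough illustration only (parse_acl is a hypothetical helper, not part of any Kerberos tooling), a check for that shape might look like this:

```python
def parse_acl(text):
    """Return (principal, permissions) pairs from kadm5.acl-style text."""
    entries = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Blank lines and comment lines carry no ACL entry.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(
                f"line {number}: expected 'principal permissions', got {line!r}"
            )
        entries.append((fields[0], fields[1]))
    return entries

print(parse_acl("*/admin@TEST.CO  *\n"))
```

Without the whitespace, "*/admin@TEST.CO*" is a single field and this sketch raises a ValueError for it.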

Restart the KDC daemons

# service krb5kdc restart
# service kadmin restart

Please correct the above files so we know Kerberos is set up correctly, and report back with the new error, if any.

@Geoffrey Shelton Okot

Thank you so much. We don't have a KDC server installed; we are using LDAP. Do I need to put the AD server in place of "{your_kdc_server}"?

 admin_server ={your_kdc_server}

Master Mentor

@Subramanian Govindasamy

If your ticket grantor is AD, then yes, replace it accordingly.

@Geoffrey Shelton Okot

Thank you. I will let you know how it goes shortly.

Contributor

Make sure you have an odd number of JournalNodes (JNs) and that all of them are healthy.
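
The usual reasoning behind the odd-number advice, assuming the majority rule used by the HDFS Quorum Journal Manager, is that an extra even-numbered node adds no failure tolerance. The function name below is only illustrative:

```python
def tolerated_failures(journal_nodes):
    """Number of JournalNodes that can fail while a majority stays available."""
    majority = journal_nodes // 2 + 1
    return journal_nodes - majority

for count in (3, 4, 5):
    print(count, "JournalNodes tolerate", tolerated_failures(count), "failure(s)")
```

Three and four JournalNodes both tolerate a single failure, so the fourth node buys nothing; five are needed to tolerate two.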

@Geoffrey Shelton Okot

Sorry for the delay. Hortonworks Support said that the '@' symbol is not supported, so we are doing a fresh reinstall with local users and syncing with LDAP.

Thanks