
Unkerberizing - NameNode service doesn't start

New Contributor

I'm trying to unkerberize a cluster, and during the service restart process the NameNode service gets stuck in an endless restart loop. The error is below. Does anybody know what I can do to get this going again? Thanks

2018-06-25 18:45:49,524 - Directory['/var/log/hadoop/hdfs'] {'owner': 'hdfs', 'group': 'hadoop', 'create_parents': True}
2018-06-25 18:45:49,524 - File['/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'] {'action': ['delete'], 'not_if': 'ambari-sudo.sh  -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid && ambari-sudo.sh  -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'}
2018-06-25 18:45:49,538 - Deleting File['/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid']
2018-06-25 18:45:49,539 - Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/2.6.3.0-235/hadoop/sbin/hadoop-daemon.sh --config /usr/hdp/2.6.3.0-235/hadoop/conf start namenode''] {'environment': {'HADOOP_LIBEXEC_DIR': '/usr/hdp/2.6.3.0-235/hadoop/libexec'}, 'not_if': 'ambari-sudo.sh  -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid && ambari-sudo.sh  -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'}
2018-06-25 18:45:53,648 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-pd_datasec_masking@DATAMASKING.COM'] {'user': 'hdfs'}
2018-06-25 18:45:54,720 - Waiting for this NameNode to leave Safemode due to the following conditions: HA: False, isActive: True, upgradeType: None
2018-06-25 18:45:54,720 - Waiting up to 19 minutes for the NameNode to leave Safemode...
2018-06-25 18:45:54,721 - Execute['/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://ip-172-31-21-5.ec2.internal:8020 -safemode get | grep 'Safe mode is OFF''] {'logoutput': True, 'tries': 115, 'user': 'hdfs', 'try_sleep': 10}
safemode: Call From ip-172-31-21-5.ec2.internal/172.31.21.5 to ip-172-31-21-5.ec2.internal:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
2018-06-25 18:45:57,112 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://ip-172-31-21-5.ec2.internal:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. safemode: Call From ip-172-31-21-5.ec2.internal/172.31.21.5 to ip-172-31-21-5.ec2.internal:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
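The "Connection refused" in the retry loop means nothing is listening on port 8020 yet, i.e. the NameNode process itself is failing to come up; the underlying cause will be in the NameNode log under /var/log/hadoop/hdfs. As a rough sketch, the check Ambari keeps retrying is effectively a TCP probe like this (host and port taken from the log above; substitute your own NameNode address):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers both "connection refused" and unresolvable hostnames.
        return False

# NameNode RPC endpoint from the log above (example host, adjust as needed).
print(port_open("ip-172-31-21-5.ec2.internal", 8020))
```

If this returns False while the restart loop runs, the safemode check can never succeed, regardless of safemode state.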

Re: Unkerberizing - NameNode service doesn't start

Mentor

@Robert Lake

It seems your cluster is still in safe mode. Can you run the following:

$ hdfs dfsadmin -safemode get

If safe mode is ON, then run the following:

$ hdfs dfsadmin -safemode leave 

Now try restarting the NameNode.
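For reference, the condition Ambari keeps retrying (the grep in the log above) boils down to a simple substring check on the `dfsadmin -safemode get` output; a minimal sketch, assuming the usual output format:

```python
def safemode_is_off(dfsadmin_output):
    """`hdfs dfsadmin -safemode get` prints e.g. 'Safe mode is ON' or
    'Safe mode is OFF'; Ambari greps for the OFF line before proceeding."""
    return "Safe mode is OFF" in dfsadmin_output

print(safemode_is_off("Safe mode is ON"))
```

Note that in the log above the command never even gets that far: it fails with "Connection refused" because the NameNode isn't up, which also makes the check return false.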

HTH

Re: Unkerberizing - NameNode service doesn't start

New Contributor

I found out I had bungled the Kerberos configuration: the realm is not the same as the domain. I added the following section to /etc/krb5.conf:

[domain_realm]
.ec2.internal = DOMAIN.COM
ec2.internal = DOMAIN.COM
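To illustrate why both entries are needed: in the MIT Kerberos `[domain_realm]` section, an entry with a leading dot matches hosts *under* that domain, while the dotless entry matches that exact name. A rough sketch of the lookup (simplified; not the real library logic):

```python
def map_host_to_realm(host, domain_realm):
    """Sketch of [domain_realm] matching: exact hostname entries first,
    then '.suffix' entries matching progressively shorter domain suffixes."""
    host = host.lower()
    if host in domain_realm:              # exact entry, e.g. "ec2.internal"
        return domain_realm[host]
    parts = host.split(".")
    for i in range(1, len(parts)):        # walk up: ".ec2.internal", ".internal"
        suffix = "." + ".".join(parts[i:])
        if suffix in domain_realm:
            return domain_realm[suffix]
    return None

mapping = {".ec2.internal": "DOMAIN.COM", "ec2.internal": "DOMAIN.COM"}
print(map_host_to_realm("ip-172-31-21-5.ec2.internal", mapping))
```

So the dotted entry catches the actual NameNode host (`ip-172-31-21-5.ec2.internal`), and the dotless one covers the bare domain name itself.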

Re: Unkerberizing - NameNode service doesn't start

New Contributor

Now the unkerberizing process stops due to an issue restarting Atlas; the error is:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/ATLAS/0.1.0.2.3/package/scripts/metadata_server.py", line 175, in <module>
    MetadataServer().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 367, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/ATLAS/0.1.0.2.3/package/scripts/metadata_server.py", line 66, in start
    self.configure(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 120, in locking_configure
    original_configure(obj, *args, **kw)
  File "/var/lib/ambari-agent/cache/common-services/ATLAS/0.1.0.2.3/package/scripts/metadata_server.py", line 53, in configure
    metadata()
  File "/var/lib/ambari-agent/cache/common-services/ATLAS/0.1.0.2.3/package/scripts/metadata.py", line 150, in metadata
    new_service_principals = )
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/solr_cloud_util.py", line 329, in add_solr_roles
    new_service_users.append(__remove_host_from_principal(new_service_user, kerberos_realm))
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/solr_cloud_util.py", line 266, in __remove_host_from_principal
    if not realm:
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/config_dictionary.py", line 73, in __getattr__
    raise Fail("Configuration parameter '" + self.name + "' was not found in configurations dictionary!")
resource_management.core.exceptions.Fail: Configuration parameter 'kerberos-env' was not found in configurations dictionary!
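The traceback shows the Atlas start script still entering its Kerberos code path while the `kerberos-env` config section has already been removed by the unkerberization, so the attribute lookup fails hard: Ambari's config dictionary raises on missing keys rather than returning None. A minimal sketch of that behavior (not the actual Ambari class, which raises its own `Fail` exception):

```python
class ConfigDict(dict):
    """Sketch: attribute access on a missing key fails loudly, like
    resource_management's config_dictionary does with Fail."""
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(
                "Configuration parameter '%s' was not found in "
                "configurations dictionary!" % name)

cfg = ConfigDict({"cluster-env": {}})   # 'kerberos-env' already removed
try:
    getattr(cfg, "kerberos-env")        # dash in the key forces getattr()
except AttributeError as e:
    print(e)
```

This matches the last line of the traceback: as long as any service script still references `kerberos-env` after the section is gone, its start will fail the same way.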