Unable to start namenode after enabling Kerberos

I have a distributed Hadoop cluster with 2 NameNodes and 3 DataNodes. I enabled Kerberos on the cluster with the Ambari wizard, following the steps in the security documentation. On the last step of the wizard, Ambari tries to start the services, but the NameNode service does not start. In the NameNode log file I see the following error:

2017-12-28 07:24:11,727 ERROR namenode.EditLogInputStream (EditLogFileInputStream.java:nextOpImpl(194)) - caught exception initializing http://ip-***-***-**-**.us-east-1.ec2.aws.net:8480/getJournal?jid=krbhdfs&segmentTxId=2979&storageIn...
java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: Authentication failed, URL: http://ip-***-***-**-**.us-east-1.ec2.aws.net:8480/getJournal?jid=krbhdfs&segmentTxId=2979&storageIn..., status: 403, message: GSSException: Failure unspecified at GSS-API level (Mechanism level: Invalid argument (400) - Cannot find key of appropriate type to decrypt AP REP - AES256 CTS mode with HMAC SHA1-96)

The keytab file details are as follows:

# klist -kte nn.service.keytab
Keytab name: FILE:nn.service.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 12/28/2017 07:02:18 nn/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (aes256-cts-hmac-sha1-96) 
   1 12/28/2017 07:02:18 nn/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (aes128-cts-hmac-sha1-96) 
   1 12/28/2017 07:02:18 nn/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (arcfour-hmac) 
   1 12/28/2017 07:02:18 nn/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (des3-cbc-sha1) 
   1 12/28/2017 07:02:18 nn/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (des-cbc-md5) 
# klist -kte spnego.service.keytab 
Keytab name: FILE:spnego.service.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 12/28/2017 07:02:17 HTTP/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (aes256-cts-hmac-sha1-96) 
   1 12/28/2017 07:02:17 HTTP/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (aes128-cts-hmac-sha1-96) 
   1 12/28/2017 07:02:17 HTTP/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (arcfour-hmac) 
   1 12/28/2017 07:02:17 HTTP/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (des3-cbc-sha1) 
   1 12/28/2017 07:02:17 HTTP/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (des-cbc-md5) 

HDP version: HDP-2.5.0.55

1 ACCEPTED SOLUTION

Thanks @Robert Levas. I later figured out that "spnego.service.keytab" needs mode 444 on every JournalNode host. Once I changed the mode to 444 and restarted the JournalNodes, the NameNode started working.
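
The permission fix described above can be sketched as a small shell helper. This is a minimal sketch, not a command taken from the thread: it assumes GNU coreutils `stat` (Linux), the function names are hypothetical, and the keytab path in the usage comment is an assumption because the thread does not say where the file lives.

```shell
#!/bin/sh
# Sketch: report and, if needed, fix the mode of a keytab file.
# Assumes GNU coreutils `stat`; function names are illustrative only.

# Print the octal permission bits of a file, e.g. 444.
keytab_mode() {
    stat -c '%a' "$1"
}

# Set the mode to 444 only when it differs, so the check is safe to re-run.
ensure_keytab_mode_444() {
    file="$1"
    if [ "$(keytab_mode "$file")" != "444" ]; then
        chmod 444 "$file"
    fi
}

# Usage (on each JournalNode host, followed by a JournalNode restart).
# The path is an assumed example, not stated in the thread:
#   ensure_keytab_mode_444 /etc/security/keytabs/spnego.service.keytab
```

Checking the mode before changing it keeps the helper idempotent, which is convenient when it is run across several JournalNode hosts.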

8 REPLIES

One of a few issues may be in play. First, make sure that the unlimited-strength JCE policy files are installed. Then make sure that neither the krb5.conf file nor the KRB5CCNAME environment variable forces the ticket cache into a KEYRING facility; the ticket cache needs to be stored in a file. Finally, ensure that forward and reverse DNS name resolution are configured properly.

Let me know if you need detailed explanations on any of those.
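
The ticket cache point above can be checked mechanically. The following is a minimal sketch that classifies a credential cache name by its type prefix and treats a bare path as a file cache, which is the default type in MIT Kerberos. The function names are hypothetical and are not part of any Kerberos tool.

```shell
#!/bin/sh
# Sketch: classify a Kerberos credential cache name by its type prefix.
# Function names are illustrative, not part of any Kerberos tool.

# Print the cache type (FILE, KEYRING, DIR, KCM, MEMORY, ...).
# A name with no "TYPE:" prefix is treated as FILE.
ccache_type() {
    case "$1" in
        /*)  echo FILE ;;          # bare absolute path
        *:*) echo "${1%%:*}" ;;    # text before the first colon
        *)   echo FILE ;;          # bare relative name
    esac
}

# Succeed only when the cache would be stored in a file.
ccache_is_file() {
    [ "$(ccache_type "$1")" = "FILE" ]
}

# Usage: pass the value of KRB5CCNAME, or of default_ccache_name
# from krb5.conf:
#   ccache_is_file "$KRB5CCNAME" || echo "ticket cache is not file-based"
```

Note that an empty argument is also reported as FILE, so an unset KRB5CCNAME should be handled by checking `default_ccache_name` in krb5.conf instead.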

@Robert Levas

Thanks for the reply.

The JCE policy is installed correctly:

 $ java test_jce
 2147483647

The ticket cache is stored in a file:

[libdefaults]
renew_lifetime = 7d
forwardable = true
default_realm = KERBTEST.COM
ticket_lifetime = 24h
dns_lookup_realm = false
dns_lookup_kdc = false
default_ccache_name = /tmp/krb5cc_%{uid}

DNS entries are also fine. I looked at the JournalNode logs, and the error above is thrown by the JournalNode.
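
One way to verify the DNS claim is to resolve a hostname to an address, resolve that address back to a name, and compare the result with the original. Kerberos service principals such as nn/host@REALM embed the hostname, so a mismatch here can break authentication. The sketch below assumes `getent` from glibc is available; all function names are hypothetical.

```shell
#!/bin/sh
# Sketch: check that a host's forward and reverse lookups agree.
# Assumes `getent` (glibc) is available; function names are illustrative.

# Normalize a hostname: lower-case it and drop a trailing dot.
normalize_host() {
    printf '%s\n' "$1" | tr 'A-Z' 'a-z' | sed 's/\.$//'
}

# Succeed when two hostnames are the same after normalization.
same_host() {
    [ "$(normalize_host "$1")" = "$(normalize_host "$2")" ]
}

# Resolve name -> address -> name and compare with the original.
check_forward_reverse() {
    fqdn="$1"
    addr=$(getent hosts "$fqdn" | awk 'NR==1 {print $1}')
    [ -n "$addr" ] || { echo "no address for $fqdn" >&2; return 1; }
    back=$(getent hosts "$addr" | awk 'NR==1 {print $2}')
    [ -n "$back" ] || { echo "no reverse entry for $addr" >&2; return 1; }
    same_host "$fqdn" "$back" || {
        echo "mismatch: $fqdn -> $addr -> $back" >&2
        return 1
    }
}

# Usage (on every node, for each NameNode and JournalNode host):
#   check_forward_reverse "$(hostname -f)"
```

The comparison is case-insensitive and ignores a trailing dot, since resolvers may return either form for the same host.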

The issue is resolved. I copied the NameNode keytab file to the JournalNode host and restarted the JournalNode, then started the NameNode. It looks like the JournalNode was not able to decrypt data from the NameNode. @Robert Levas, is this the correct resolution?

You should not have had to manually copy anything, so I am confused as to what the issue was.

The JournalNode throws the error below when the NameNode tries to read the journal during startup.

status: 403, message: GSSException: Failure unspecified at GSS-API level (Mechanism level: Invalid argument (400) - Cannot find key of appropriate type to decrypt AP REP - AES256 CTS mode with HMAC SHA1-96)

So I copied the NameNode keytab to the JournalNode host, and the error went away.

@Robert Levas, can you confirm whether the solution is valid? I am facing the same issue on another cluster as well and am not sure of a workaround.

Your solution of copying keytab files around is not valid. There must be some cause for the missing keytab file.