Created 12-28-2017 01:32 PM
I have a distributed Hadoop cluster with 2 NameNodes and 3 DataNodes. I enabled Kerberos on the cluster with the Ambari wizard, following the steps in the security document. In the last step of the wizard, Ambari tries to start the services, but the NameNode services do not start. In the NameNode log file I can see the error below:
2017-12-28 07:24:11,727 ERROR namenode.EditLogInputStream (EditLogFileInputStream.java:nextOpImpl(194)) - caught exception initializing http://ip-***-***-**-**.us-east-1.ec2.aws.net:8480/getJournal?jid=krbhdfs&segmentTxId=2979&storageIn... java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: Authentication failed, URL: http://ip-***-***-**-**.us-east-1.ec2.aws.net:8480/getJournal?jid=krbhdfs&segmentTxId=2979&storageIn..., status: 403, message: GSSException: Failure unspecified at GSS-API level (Mechanism level: Invalid argument (400) - Cannot find key of appropriate type to decrypt AP REP - AES256 CTS mode with HMAC SHA1-96)
The keytab file details are as follows:
# klist -kte nn.service.keytab
Keytab name: FILE:nn.service.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 12/28/2017 07:02:18 nn/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (aes256-cts-hmac-sha1-96)
   1 12/28/2017 07:02:18 nn/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (aes128-cts-hmac-sha1-96)
   1 12/28/2017 07:02:18 nn/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (arcfour-hmac)
   1 12/28/2017 07:02:18 nn/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (des3-cbc-sha1)
   1 12/28/2017 07:02:18 nn/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (des-cbc-md5)

# klist -kte spnego.service.keytab
Keytab name: FILE:spnego.service.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 12/28/2017 07:02:17 HTTP/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (aes256-cts-hmac-sha1-96)
   1 12/28/2017 07:02:17 HTTP/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (aes128-cts-hmac-sha1-96)
   1 12/28/2017 07:02:17 HTTP/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (arcfour-hmac)
   1 12/28/2017 07:02:17 HTTP/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (des3-cbc-sha1)
   1 12/28/2017 07:02:17 HTTP/ip-***-***-***-**.us-east-1.ec2.aws.net@KERBTEST.COM (des-cbc-md5)
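Since the error complains about a missing AES256 key, it can help to confirm mechanically which encryption types each principal has in a keytab. The following is a minimal sketch, assuming the `klist -kte` layout shown above (KVNO, date, time, principal, enctype in parentheses); the function names are hypothetical and not part of any Hadoop or Kerberos tool.

```python
import re
from collections import defaultdict

# Matches one key entry of `klist -kte` output, for example:
#    1 12/28/2017 07:02:18 nn/host@REALM (aes256-cts-hmac-sha1-96)
# Header and separator lines do not start with a number, so they are skipped.
ENTRY_RE = re.compile(
    r"^\s*(?P<kvno>\d+)\s+\S+\s+\S+\s+(?P<principal>\S+)\s+\((?P<enctype>[^)]+)\)\s*$"
)


def enctypes_by_principal(klist_output):
    """Map each principal to the set of encryption types listed for it."""
    result = defaultdict(set)
    for line in klist_output.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            result[match.group("principal")].add(match.group("enctype"))
    return dict(result)


def has_enctype(klist_output, principal, enctype="aes256-cts-hmac-sha1-96"):
    """True when the keytab listing contains a key of `enctype` for `principal`."""
    return enctype in enctypes_by_principal(klist_output).get(principal, set())
```

Feeding it the captured text of `klist -kte spnego.service.keytab` and the HTTP principal of the host would show whether the AES256 key is present in the file. In this thread both keytabs did contain it, which is why the cause turned out to be elsewhere.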
HDP version : HDP-2.5.0.55
Created 01-11-2018 09:54 AM
Thanks @Robert Levas. I later figured out that "spnego.service.keytab" needs mode 444 on all JournalNodes. Once I changed the mode to 444 and restarted the JournalNodes, the NameNode started working.
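To check the keytab mode on each JournalNode host before restarting anything, something like the following could be used. It is only a sketch: the helper names are hypothetical, and 444 is simply the mode reported above as fixing this cluster, not a value taken from the Ambari documentation.

```python
import os
import stat


def mode_string(path):
    """Return the permission bits of `path` as an octal string such as '444'."""
    return format(stat.S_IMODE(os.stat(path).st_mode), "03o")


def readable_by_all(path):
    """True when owner, group, and other all have the read bit set."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    needed = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
    return mode & needed == needed
```

Calling `mode_string()` on the path of `spnego.service.keytab` (wherever it lives on your hosts) shows the current mode, and `readable_by_all()` tells whether the account running the JournalNode can read it regardless of file ownership.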
Created 12-28-2017 04:19 PM
One of a few issues may be in play. First, make sure that the unlimited-strength JCE policy is installed. Then make sure that neither the krb5.conf file nor the KRB5CCNAME environment variable forces the ticket cache to be stored in a KEYRING facility; the ticket cache needs to be stored in a file. Finally, ensure that forward and reverse DNS name resolution are configured properly.
Let me know if you need detailed explanations on any of those.
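The DNS check mentioned above can be sketched as a forward lookup followed by a reverse lookup of the resulting address. This is an illustrative helper, not part of Hadoop or Ambari; the function name is hypothetical, and the resolver functions are parameters only so the logic can be exercised without a network.

```python
import socket


def forward_reverse_consistent(hostname,
                               forward=socket.gethostbyname,
                               reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve that IP back, and compare the names.

    Kerberos service principals embed the host name, so a reverse lookup
    that returns a different name can lead to principal mismatches.
    """
    ip_address = forward(hostname)
    # gethostbyaddr returns (primary_name, alias_list, address_list).
    resolved_name = reverse(ip_address)[0]
    return resolved_name.lower().rstrip(".") == hostname.lower().rstrip(".")
```

Running it with each node's fully qualified name, on each node, gives a quick yes/no answer per host. It compares only the primary name returned by the reverse lookup, so aliases are deliberately not accepted.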
Created 12-29-2017 09:25 AM
Thanks for the reply.
JCE policy is installed correctly:
$ java test_jce
2147483647
Ticket cache is stored in file.
[libdefaults]
  renew_lifetime = 7d
  forwardable = true
  default_realm = KERBTEST.COM
  ticket_lifetime = 24h
  dns_lookup_realm = false
  dns_lookup_kdc = false
  default_ccache_name = /tmp/krb5cc_%{uid}
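The ticket cache check can also be done programmatically. The sketch below assumes MIT Kerberos conventions: a cache name has the form TYPE:residual, a name without a type prefix is a file cache, and KRB5CCNAME takes precedence over default_ccache_name. The function name is hypothetical, and the parser ignores section boundaries, so treat it as an illustration rather than a full krb5.conf reader.

```python
def ccache_type(krb5_conf_text, env=None):
    """Return the ticket cache type in effect ('FILE', 'KEYRING', ...).

    Returns None when neither KRB5CCNAME nor default_ccache_name is set.
    """
    env = env or {}
    name = env.get("KRB5CCNAME")
    if name is None:
        for line in krb5_conf_text.splitlines():
            stripped = line.strip()
            if stripped.startswith("#"):
                continue  # skip comment lines
            key, sep, value = stripped.partition("=")
            if sep and key.strip() == "default_ccache_name":
                name = value.strip()
    if name is None:
        return None
    prefix, sep, _ = name.partition(":")
    # A bare path such as /tmp/krb5cc_%{uid} has no TYPE: prefix
    # and is treated as a file cache.
    if not sep or prefix.startswith("/"):
        return "FILE"
    return prefix.upper()
```

Passing the contents of krb5.conf together with `os.environ` covers both places where a KEYRING cache could be configured. For the configuration posted above the result is a file cache.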
DNS entries are also fine. I looked at the JournalNode logs; the error above is thrown by the JournalNode.
Created 12-29-2017 01:09 PM
The issue got resolved. I copied the NameNode keytab file to the JournalNode host and restarted the JournalNode. After that, I started the NameNode. It looks like the JournalNode was not able to decrypt data from the NameNode. @Robert Levas, is this the correct resolution?
Created 01-01-2018 02:50 PM
You should not have had to manually copy anything, so I am confused as to what the issue was.
Created 01-02-2018 03:29 AM
The JournalNode throws the error below when the NameNode tries to read the journal during startup:
status: 403, message: GSSException: Failure unspecified at GSS-API level (Mechanism level: Invalid argument (400) - Cannot find key of appropriate type to decrypt AP REP - AES256 CTS mode with HMAC SHA1-96)
So I copied the NameNode keytab to the JournalNode host, and the error was resolved.
Created 01-09-2018 07:06 PM
@Robert Levas, can you confirm whether the solution is valid? I am facing the same issue on another cluster as well and am not sure of the workaround.
Created 01-09-2018 07:24 PM
Your solution of copying keytab files around is not valid. There must be some cause for the missing keytab file.