Created on 12-09-2016 09:40 AM - edited 09-16-2022 03:50 AM
I'm having issues with Kerberos tickets for Hadoop service principals not being renewed before they expire. For example, the Oozie ticket is valid for 10 hours, and then it takes several more hours before the ticket is renewed (or recreated). I would expect a valid ticket to always be present for the services.
The effect of this is that, for example, I can't list directories in HDFS as the Oozie user (in the shell); it fails with the following error message:
GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
I can renew the ticket manually using the keytab, which makes the HDFS listing work, but I feel that shouldn't be necessary.
Strangely enough, there are never any service-related errors in Ambari.
Any ideas on how to resolve this?
Created 12-09-2016 10:44 PM
Whenever you start a service in a Kerberos-enabled cluster, say the NameNode service, Ambari first initializes the Kerberos ticket; once the service is running, the service itself has logic to re-login and obtain a fresh ticket.
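The same keytab-based login and re-login can also be driven explicitly through the UserGroupInformation API. A minimal sketch (note: the principal name and keytab path below are placeholders for illustration, not values from this cluster, and it needs hadoop-common on the classpath plus a working KDC to actually run):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabLoginSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Tell the Hadoop client libraries to authenticate via Kerberos.
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Log in once from the keytab. Principal and keytab path are
        // placeholders, not values from this cluster.
        UserGroupInformation.loginUserFromKeytab(
                "oozie/host.example.com@EXAMPLE.COM",
                "/etc/security/keytabs/oozie.service.keytab");

        // In a long-running process, call this before sensitive operations;
        // it re-logs in only when the TGT is close to expiring.
        UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
    }
}
```

This is the pattern long-running daemons use: one login at startup, then cheap `checkTGTAndReloginFromKeytab()` calls that are no-ops while the ticket is still fresh.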
As @Chris Nauroth explains in his Stack Overflow answer, Hadoop implements an automatic re-login mechanism directly inside the RPC client layer (please read his excellent answer at http://stackoverflow.com/questions/34616676/should-i-call-ugi-checktgtandreloginfromkeytab-before-ev... when you get a chance).
The code for this is visible in the RPC Client#handleSaslConnectionFailure method:
    // try re-login
    if (UserGroupInformation.isLoginKeytabBased()) {
        UserGroupInformation.getLoginUser().reloginFromKeytab();
    } else if (UserGroupInformation.isLoginTicketBased()) {
        UserGroupInformation.getLoginUser().reloginFromTicketCache();
    }
This explains the answer to your question, "Strangely enough there are never any service related errors in Ambari": the services re-login automatically inside the RPC layer, so ticket expiry never surfaces as a service error.
"The effect of this is e.g that I can't list directories in HDFS as the Oozie user (in the shell)" --> This is expected: the ticket in your shell's credential cache expires after its maximum lifetime (24 hours by default), and nothing renews that cache automatically. The automatic re-login applies to the services' in-process logins, not to a shell ticket cache.
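Long-running client applications that are not covered by the RPC layer's retry hook often schedule a periodic re-login themselves. A sketch of that pattern (the hourly interval is an arbitrary choice for illustration, not a Hadoop default; it assumes the process already did a keytab-based login):

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.security.UserGroupInformation;

public class TgtRenewer {
    public static void start() {
        // Daemon thread so the renewer never blocks JVM shutdown.
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "tgt-renewer");
                    t.setDaemon(true);
                    return t;
                });
        scheduler.scheduleAtFixedRate(() -> {
            try {
                // No-op unless the TGT is close to expiring, and only
                // effective for keytab-based logins.
                UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
            } catch (IOException e) {
                // Log and retry on the next tick.
                System.err.println("TGT re-login failed: " + e.getMessage());
            }
        }, 1, 1, TimeUnit.HOURS);
    }
}
```

For your shell session, by contrast, the only option is to refresh the ticket cache yourself (e.g. kinit with the keytab), since no background process renews it for you.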
Hope this answers your question! 🙂
Created 12-11-2016 10:45 AM
Thanks @Kuldeep Kulkarni
This has cleared up some doubts about how Hadoop services automatically re-login even though the initial user TGT shows as expired.