
Kerberos fails from time to time


Hi all,

I have a Hadoop cluster with Kerberos enabled, and I have added many Hadoop clients that access it. A crontab entry refreshes the Kerberos ticket cache every day at midnight, and everything ran fine for about a month:

0 0 * * * /usr/bin/kinit -k -t /etc/security/keytabs/test.app.keytab test/myhost.dcs.com

But now Hadoop commands fail with this error:

Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

Yet klist shows that a ticket is present:

[test@myhost ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_613
Default principal: test/myhost.dcs.com@DCS.COM

Valid starting     Expires            Service principal
09/23/16 07:10:35  09/25/16 07:12:35  krbtgt/DCS.COM@DCS.COM
        renew until 09/30/16 00:00:01
[control@myhost ~]$
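The validity window in this klist output can be checked with a few lines of Python. This is a minimal sketch that assumes klist prints timestamps as MM/DD/YY HH:MM:SS in local time, as in the output above; it computes the ticket lifetime and how much renewable time was left when the ticket started.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Assumption: klist's timestamp layout matches the output quoted above.
KLIST_FMT = "%m/%d/%y %H:%M:%S"


def ticket_windows(valid_starting: str, expires: str,
                   renew_until: str) -> Tuple[timedelta, timedelta]:
    """Return (lifetime, renewable time left at start) for one klist entry."""
    start = datetime.strptime(valid_starting, KLIST_FMT)
    end = datetime.strptime(expires, KLIST_FMT)
    renew = datetime.strptime(renew_until, KLIST_FMT)
    return end - start, renew - start


lifetime, renewable = ticket_windows(
    "09/23/16 07:10:35", "09/25/16 07:12:35", "09/30/16 00:00:01")
print(lifetime)   # 2 days, 0:02:00
print(renewable)  # 6 days, 16:49:26
```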

I then checked the ticket cache file in /tmp:

[test@myhost ~]$ strings /tmp/krb5cc_613
DCS.COM
control
myhost.dcs.com
DCS.COM
control
myhost.dcs.com
DCS.COM
krbtgt
DCS.COM
DCS.COM
krbtgt
DCS.COM
a5EK
[>$$

I don't see any authentication data in /tmp/krb5cc_613.
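The strings command only prints runs of printable characters, so the binary parts of a credential cache (the session key, the timestamps, the encrypted ticket) never appear in its output; their absence there does not by itself mean the cache is empty. The sketch below decodes those fields. It assumes an MIT-style FILE: cache in format version 4 (0x0504) and follows the published MIT ccache file layout; other versions and cache types are not handled, and parse_ccache is a hypothetical helper name.

```python
import struct
from dataclasses import dataclass
from datetime import datetime, timezone


def _utc(epoch: int) -> datetime:
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


class _Reader:
    """Big-endian cursor over the bytes of a version 4 (0x0504) ccache file."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def take(self, n: int) -> bytes:
        if self.pos + n > len(self.data):
            raise ValueError("truncated credential cache")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def u16(self) -> int:
        return struct.unpack(">H", self.take(2))[0]

    def u32(self) -> int:
        return struct.unpack(">I", self.take(4))[0]

    def counted(self) -> bytes:
        # counted octet string: 32-bit length, then that many bytes
        return self.take(self.u32())


def _principal(r: _Reader) -> str:
    r.u32()  # name type, not needed here
    n_components = r.u32()
    realm = r.counted().decode()
    components = [r.counted().decode() for _ in range(n_components)]
    return "/".join(components) + "@" + realm


@dataclass
class Credential:
    client: str
    server: str
    key_length: int  # size of the binary session key, invisible to strings
    authtime: datetime
    endtime: datetime
    renew_till: datetime


def _credential(r: _Reader) -> Credential:
    client = _principal(r)
    server = _principal(r)
    r.u16()            # keyblock enctype
    key = r.counted()  # keyblock contents (binary)
    auth, _start, end, renew = struct.unpack(">IIII", r.take(16))
    r.take(1)          # is_skey flag
    r.u32()            # ticket flags
    for _ in range(r.u32()):  # addresses: type + counted data
        r.u16()
        r.counted()
    for _ in range(r.u32()):  # authorization data: type + counted data
        r.u16()
        r.counted()
    r.counted()        # the encrypted ticket itself (binary)
    r.counted()        # second ticket (user-to-user only)
    return Credential(client, server, len(key), _utc(auth), _utc(end), _utc(renew))


def parse_ccache(data: bytes):
    """Return (default principal, credentials) from a version 4 FILE: ccache."""
    r = _Reader(data)
    if r.u16() != 0x0504:
        raise ValueError("only version 0x0504 credential caches are handled")
    r.take(r.u16())  # skip the v4 header fields (e.g. KDC clock offset)
    default_principal = _principal(r)
    credentials = []
    while r.pos < len(r.data):
        credentials.append(_credential(r))
    return default_principal, credentials
```

To try it on the cache from the question: parse_ccache(open("/tmp/krb5cc_613", "rb").read()).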

The KDC log shows the following:

Sep 23 00:00:01 nnserver krb5kdc[39261](info): AS_REQ (7 etypes {16 23 1 3 18 17 2}) 192.168.1.107: ISSUE: authtime 1474560001, etypes {rep=16 tkt=18 ses=16}, test/myhost.dcs.com@DCS.COM for krbtgt/DCS.COM@DCS.COM
Sep 23 05:39:52 nnserver krb5kdc[39261](info): TGS_REQ (5 etypes {17 16 23 1 3}) 192.168.1.107: ISSUE: authtime 1474560001, etypes {rep=16 tkt=16 ses=16}, test/myhost.dcs.com@DCS.COM for nn/jfbh2n03.dcs.com@DCS.COM
Sep 23 07:10:35 nnserver krb5kdc[39261](info): TGS_REQ (7 etypes {18 17 16 23 1 3 2}) 192.168.1.107: ISSUE: authtime 1474560001, etypes {rep=16 tkt=18 ses=18}, test/myhost.dcs.com@DCS.COM for krbtgt/DCS.COM@DCS.COM
Sep 23 10:57:31 nnserver krb5kdc[39261](info): TGS_REQ (5 etypes {17 16 23 1 3}) 192.168.1.107: PROCESS_TGS: authtime 0, <unknown client> for <unknown server>, Ticket expired
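These four log lines can be read mechanically. A minimal Python sketch, assuming the krb5kdc line layout shown above (other KDC versions may format lines differently): it extracts each request's type, authtime, client and server, and flags a successful TGS_REQ whose target is krbtgt/DCS.COM, which means a new TGT was issued from an existing one (typically a renewal). All three successful entries carry authtime 1474560001, which is 2016-09-22 16:00:01 UTC, the same instant the KDC stamped as Sep 23 00:00:01, so they all descend from the midnight AS_REQ.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Layout inferred from the four krb5kdc lines above; other versions may differ.
PREFIX = re.compile(
    r"^(?P<when>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ krb5kdc\[\d+\]\(\w+\): "
    r"(?P<kind>AS_REQ|TGS_REQ) \([^)]*\) (?P<addr>[^:]+): "
    r"(?P<status>\w+): authtime (?P<authtime>\d+), (?P<rest>.*)$"
)
ETYPES = re.compile(r"^etypes \{[^}]*\}, ")


@dataclass
class KdcEvent:
    when: str
    kind: str
    status: str
    authtime: int
    client: str
    server: str
    error: Optional[str]

    @property
    def looks_like_tgt_renewal(self) -> bool:
        # A successful TGS_REQ whose target is the realm's own krbtgt service
        # hands out a new TGT based on an existing one (typically kinit -R),
        # without the fresh authentication that an AS_REQ represents.
        return (self.kind == "TGS_REQ" and self.status == "ISSUE"
                and self.server.startswith("krbtgt/"))


def parse_kdc_line(line: str) -> Optional[KdcEvent]:
    m = PREFIX.match(line.strip())
    if m is None:
        return None
    rest = ETYPES.sub("", m["rest"], count=1)
    client, _, tail = rest.partition(" for ")
    server, _, error = tail.partition(", ")
    return KdcEvent(m["when"], m["kind"], m["status"], int(m["authtime"]),
                    client, server, error or None)


LOG = """\
Sep 23 00:00:01 nnserver krb5kdc[39261](info): AS_REQ (7 etypes {16 23 1 3 18 17 2}) 192.168.1.107: ISSUE: authtime 1474560001, etypes {rep=16 tkt=18 ses=16}, test/myhost.dcs.com@DCS.COM for krbtgt/DCS.COM@DCS.COM
Sep 23 05:39:52 nnserver krb5kdc[39261](info): TGS_REQ (5 etypes {17 16 23 1 3}) 192.168.1.107: ISSUE: authtime 1474560001, etypes {rep=16 tkt=16 ses=16}, test/myhost.dcs.com@DCS.COM for nn/jfbh2n03.dcs.com@DCS.COM
Sep 23 07:10:35 nnserver krb5kdc[39261](info): TGS_REQ (7 etypes {18 17 16 23 1 3 2}) 192.168.1.107: ISSUE: authtime 1474560001, etypes {rep=16 tkt=18 ses=18}, test/myhost.dcs.com@DCS.COM for krbtgt/DCS.COM@DCS.COM
Sep 23 10:57:31 nnserver krb5kdc[39261](info): TGS_REQ (5 etypes {17 16 23 1 3}) 192.168.1.107: PROCESS_TGS: authtime 0, <unknown client> for <unknown server>, Ticket expired
"""

for event in filter(None, map(parse_kdc_line, LOG.splitlines())):
    auth = datetime.fromtimestamp(event.authtime, tz=timezone.utc) if event.authtime else None
    label = "TGT renewal" if event.looks_like_tgt_renewal else event.kind
    print(event.when, label, event.server, auth, event.error or "")
```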

My question is: can another application (such as a Hadoop client) modify /tmp/krb5cc_613 programmatically? I think other applications (such as the Hadoop client) only read /tmp/krb5cc_613 and never write to it.

1 reply

Hi @xu jerry

A few observations:

1. The crontab is set to get a new ticket at midnight every day, but the klist output says that the ticket was acquired at "09/23/16 07:10:35". That means someone (or some program) refreshed the ticket after midnight, at 7:10.

2. By default, a TGT is valid for one day, but in your case the validity appears to be 2 days and 2 minutes (from the klist output). Is that expected?

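3. The KDC log clearly says that the ticket had expired by "Sep 23 10:57:31". You can also see a TGT request (AS_REQ) at midnight (that'd be your crontab), followed by two TGS_REQ requests that reuse its authtime: one at 05:39:52 for nn/jfbh2n03.dcs.com@DCS.COM, and one at 07:10:35 for krbtgt/DCS.COM@DCS.COM itself. So, as per the KDC log, no one ran a fresh kinit (no new AS_REQ) after midnight; the 07:10:35 request most likely renewed the existing TGT, which is where the start time in #1 comes from.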
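The timestamps also tie the klist ticket back to the midnight kinit. A renewal keeps the original authtime and renew-until time and only moves the start and end times, so one way to check is to line up the numbers. A small sketch, assuming both the KDC and the client clocks run at UTC+8 (inferred from the AS_REQ line, not stated in the thread):

```python
from datetime import datetime, timedelta, timezone

# Assumption: KDC and client clocks both run at UTC+8. The offset is inferred
# from the AS_REQ line, stamped "Sep 23 00:00:01" while its authtime 1474560001
# is 2016-09-22 16:00:01 UTC; the thread does not state it.
LOCAL = timezone(timedelta(hours=8))

authtime = datetime.fromtimestamp(1474560001, tz=LOCAL)           # midnight AS_REQ
klist_start = datetime(2016, 9, 23, 7, 10, 35, tzinfo=LOCAL)      # "Valid starting" in klist
klist_renew_until = datetime(2016, 9, 30, 0, 0, 1, tzinfo=LOCAL)  # "renew until" in klist

print(authtime)                      # 2016-09-23 00:00:01+08:00
print(klist_renew_until - authtime)  # 7 days, 0:00:00
print(klist_start - authtime)        # 7:10:34
```

The renew-until time is exactly 7 days after the midnight authtime, and the ticket starts at 07:10:35, the same second as the TGS_REQ for krbtgt/DCS.COM@DCS.COM in the KDC log. That pattern is consistent with the midnight TGT having been renewed at 07:10:35 rather than replaced by a new kinit.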

To answer your question:

My question is: can other applications (such as a Hadoop client) edit /tmp/krb5cc_613 programmatically? I think other applications (the Hadoop client) just read information from /tmp/krb5cc_613 instead of writing it.

Usually Hadoop clients and applications only consume (i.e. read) the TGT. The only case in which the ticket cache gets updated is when an application runs kinit programmatically (which includes renewing the ticket with kinit -R).
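If you want to see directly whether anything other than the midnight cron job rewrites the cache, one option is to watch the file itself. A minimal polling sketch; snapshot and watch are hypothetical helper names, not part of Hadoop or MIT Kerberos. Any process that rewrites the cache (kinit, kinit -R, or a library that stores tickets in it) changes the file's inode, modification time, or size, while processes that only read it leave all three alone.

```python
import os
import time
from datetime import datetime
from typing import Optional, Tuple

# (inode, modification time in ns, size) of the cache file, or None if it is missing.
Snapshot = Optional[Tuple[int, int, int]]


def snapshot(path: str) -> Snapshot:
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return (st.st_ino, st.st_mtime_ns, st.st_size)


def watch(path: str, interval: float = 5.0) -> None:
    """Print a timestamped line whenever `path` is replaced or rewritten. Runs until interrupted."""
    last = snapshot(path)
    while True:
        time.sleep(interval)
        current = snapshot(path)
        if current != last:
            print(f"{datetime.now():%Y-%m-%d %H:%M:%S} {path}: {last} -> {current}", flush=True)
            last = current
```

For example, run watch("/tmp/krb5cc_613") in a terminal on myhost and compare the printed times with the cron schedule and the KDC log.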

If you keep getting this error, I'd advise running the Kerberos client tools in debug mode. That is, once you get the ticket expired error, execute the following and check (and post) the output here:

export KRB5_TRACE=/dev/stdout
klist -eaf
kvno <name_of_any_service_principal>
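If you want to capture that output in one go for posting, the same checks can be scripted. A minimal Python sketch; diagnostic_commands and run_with_trace are hypothetical helper names, and it assumes the MIT klist and kvno binaries are on PATH (subprocess.run raises FileNotFoundError otherwise).

```python
import os
import subprocess
from typing import List


def diagnostic_commands(service_principal: str) -> List[List[str]]:
    """The two checks suggested above, as argument lists."""
    return [["klist", "-eaf"], ["kvno", service_principal]]


def run_with_trace(argv: List[str]) -> str:
    """Run one command with MIT Kerberos tracing sent to its stdout."""
    # Same effect as `export KRB5_TRACE=/dev/stdout` in the shell: MIT krb5
    # tools write their trace log to that file, which here is the captured pipe.
    env = dict(os.environ, KRB5_TRACE="/dev/stdout")
    result = subprocess.run(argv, env=env, capture_output=True, text=True)
    return f"$ {' '.join(argv)}  (exit {result.returncode})\n{result.stdout}{result.stderr}"
```

For example, with the service principal from the KDC log: for argv in diagnostic_commands("nn/jfbh2n03.dcs.com@DCS.COM"): print(run_with_trace(argv))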

It would also make sense to attach your /etc/krb5.conf so we can see the current Kerberos configuration.
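In /etc/krb5.conf, the settings most relevant to observation #2 are ticket_lifetime and renew_lifetime in the [libdefaults] section. A minimal sketch that pulls those values out and converts them to seconds; it only understands a subset of the krb5.conf duration syntax (d/h/m/s suffixes or plain seconds), ignores every section other than [libdefaults], and the sample file is hypothetical.

```python
import re
from typing import Dict

# Seconds per unit for the duration suffixes handled here.
_UNIT = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def duration_seconds(value: str) -> int:
    """Convert '24h', '7d', '1d 2h', or a bare '86400' to seconds (subset of krb5.conf syntax)."""
    value = value.strip()
    if value.isdigit():
        return int(value)
    parts = re.findall(r"(\d+)\s*([dhms])", value)
    leftover = re.sub(r"\d+\s*[dhms]|\s+", "", value)
    if not parts or leftover:
        raise ValueError(f"unsupported duration: {value!r}")
    return sum(int(n) * _UNIT[unit] for n, unit in parts)


def libdefaults(conf_text: str) -> Dict[str, str]:
    """Collect simple 'key = value' lines from the [libdefaults] section."""
    section = None
    values: Dict[str, str] = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
        elif section == "libdefaults" and "=" in line:
            key, _, val = line.partition("=")
            values[key.strip()] = val.strip()
    return values


# Hypothetical krb5.conf, for illustration only.
SAMPLE = """
[libdefaults]
  default_realm = DCS.COM
  ticket_lifetime = 24h
  renew_lifetime = 7d

[realms]
  DCS.COM = {
    kdc = nnserver
  }
"""

settings = libdefaults(SAMPLE)
print(duration_seconds(settings["ticket_lifetime"]))  # 86400
print(duration_seconds(settings["renew_lifetime"]))   # 604800
```

Keep in mind that these values are only what the client requests; the KDC caps the result with its own limits (max_life and max_renewable_life in kdc.conf and the per-principal maximums), so the lifetime shown by klist can differ.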

Hope this helps.