Created on 09-23-2016 03:50 AM - edited 09-16-2022 03:40 AM
Hi all,
I have a Hadoop cluster with Kerberos enabled, and I have added many Hadoop clients that access this cluster. I run a crontab job to refresh the Kerberos ticket cache, and everything ran fine for about a month:
0 0 * * * /usr/bin/kinit -k -t /etc/security/keytabs/test.app.keytab test/myhost.dcs.com
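If the midnight refresh ever fails silently (for example, the KDC is briefly unreachable), the failure is easy to miss. A variant of the same cron entry that captures kinit's output could help spot that; the log file path below is just a made-up example, not something from your setup:

```shell
# Same schedule and keytab as above, but log any kinit errors
# (/var/log/kinit-refresh.log is a hypothetical example path)
0 0 * * * /usr/bin/kinit -k -t /etc/security/keytabs/test.app.keytab test/myhost.dcs.com >> /var/log/kinit-refresh.log 2>&1
```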
But just now, hadoop commands fail with this error:
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
But I run klist and there exists auth information:
[test@myhost ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_613
Default principal: test/myhost.dcs.com@DCS.COM
Valid starting Expires Service principal
09/23/16 07:10:35 09/25/16 07:12:35 krbtgt/DCS.COM@DCS.COM
renew until 09/30/16 00:00:01
[control@myhost ~]$
I check the ticket file in /tmp:
[test@myhost ~]$ strings /tmp/krb5cc_613
DCS.COM
control
myhost.dcs.com
DCS.COM
control
myhost.dcs.com
DCS.COM
krbtgt
DCS.COM
DCS.COM
krbtgt
DCS.COM
a5EK
[>$$
There is no auth information in /tmp/krb5cc_613.
The KDC server log is following:
Sep 23 00:00:01 nnserver krb5kdc[39261](info): AS_REQ (7 etypes {16 23 1 3 18 17 2}) 192.168.1.107: ISSUE: authtime 1474560001, etypes {rep=16 tkt=18 ses=16}, test/myhost.dcs.com@DCS.COM for krbtgt/DCS.COM@DCS.COM
Sep 23 05:39:52 nnserver krb5kdc[39261](info): TGS_REQ (5 etypes {17 16 23 1 3}) 192.168.1.107: ISSUE: authtime 1474560001, etypes {rep=16 tkt=16 ses=16}, test/myhost.dcs.com@DCS.COM for nn/jfbh2n03.dcs.com@DCS.COM
Sep 23 07:10:35 nnserver krb5kdc[39261](info): TGS_REQ (7 etypes {18 17 16 23 1 3 2}) 192.168.1.107: ISSUE: authtime 1474560001, etypes {rep=16 tkt=18 ses=18}, test/myhost.dcs.com@DCS.COM for krbtgt/DCS.COM@DCS.COM
Sep 23 10:57:31 nnserver krb5kdc[39261](info): TGS_REQ (5 etypes {17 16 23 1 3}) 192.168.1.107: PROCESS_TGS: authtime 0, <unknown client> for <unknown server>, Ticket expired
My question is: can another application (such as a Hadoop client) modify /tmp/krb5cc_613 programmatically? I thought applications (Hadoop clients) only read information from /tmp/krb5cc_613 rather than writing to it.
Created 09-23-2016 10:11 AM
Hi @xu jerry
Few observations:
1. The crontab is set to get a new ticket at midnight every day, but the klist output says the ticket was acquired at "09/23/16 07:10:35". Meaning, someone (or some program) refreshed the ticket after midnight, at 07:10.
2. By default, a TGT is valid for one day, but in your case the validity looks to be '2 days and 2 minutes' (from the klist output). Is that expected?
3. The KDC log clearly says the ticket had expired by "Sep 23 10:57:31". You can also see one TGT request (AS_REQ) at midnight (that would be your crontab) and two service-ticket requests (TGS_REQ). So according to the KDC log, no one acquired a fresh TGT after midnight (which makes my #1 stand false).
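As a quick sanity check, the authtime values in the KDC log are Unix timestamps, so you can convert them to confirm they line up with the midnight cron run. The timezone below is an assumption on my part, chosen because the log shows the AS_REQ landing at local midnight:

```shell
# Convert the authtime from the AS_REQ log line to a readable date.
# TZ=Asia/Shanghai (UTC+8) is an assumed timezone; with it, authtime
# 1474560001 falls at 00:00:01 local time, matching the cron schedule.
TZ=Asia/Shanghai date -d @1474560001
```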
To answer your question:
My question is: can another application (such as a Hadoop client) modify /tmp/krb5cc_613 programmatically? I thought applications (Hadoop clients) only read information from /tmp/krb5cc_613 rather than writing to it.
Usually the Hadoop clients and applications only consume (i.e. read) the TGT. The only case in which a TGT would get updated is when an application tries to do a kinit (or a ticket renewal) programmatically.
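One way to confirm (or rule out) another process rewriting the shared cache is to point your cron job at its own cache file via the KRB5CCNAME environment variable; if /tmp/krb5cc_613 still gets clobbered afterwards, some other program must be doing its own kinit. A minimal sketch — the cache path /tmp/krb5cc_test_cron is a made-up example, and the kinit line is commented out because it needs your KDC and keytab:

```shell
# Kerberos tools resolve the ticket cache from KRB5CCNAME,
# falling back to /tmp/krb5cc_<uid> when it is unset.
export KRB5CCNAME=/tmp/krb5cc_test_cron
echo "cache in use: ${KRB5CCNAME:-/tmp/krb5cc_$(id -u)}"
# kinit -k -t /etc/security/keytabs/test.app.keytab test/myhost.dcs.com
# klist   # would now show tickets from /tmp/krb5cc_test_cron only
```

Note that Java/Hadoop processes would also need KRB5CCNAME set in their environment to pick up the same cache.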
If you are consistently getting this error, then I'd advise running the Kerberos tools in debug mode. That is, once you get the ticket-expired error, execute these commands and check (and post) the output here:
export KRB5_TRACE=/dev/stdout
klist -eaf
kvno <name_of_any_service_principal>
It'd also make sense to attach your /etc/krb5.conf so we know the current Kerberos configuration.
Hope this helps.