Created 07-21-2017 06:18 AM
Team,
We have enabled Kerberos on our staging cluster. Its noticed that everyday morning when we try to execute hadoop command like hadoop fs -ls / it fails with error. To fix the following we need to restart the namenodes.
user1@datanode1001:~$ hadoop fs -ls /
<Truncated> Found ticket for user1@EXAMPLE.COM to go to krbtgt/EXAMPLE.COM@EXAMPLE.COM expiring on Fri Jul 21 14:43:36 UTC 2017 Entered Krb5Context.initSecContext with state=STATE_NEW Found ticket for user1@EXAMPLE.COM to go to krbtgt/EXAMPLE.COM@EXAMPLE.COMexpiring on Fri Jul 21 14:43:36 UTC 2017 Found ticket for user1@EXAMPLE.COM to go to nn/namenode1001.example.com@EXAMPLE.COM expiring on Fri Jul 21 14:43:36 UTC 2017 Found ticket for user1@EXAMPLE.COM to go to nn/namenode1002.example.com@EXAMPLE.COM expiring on Fri Jul 21 14:43:36 UTC 2017 Found service ticket in the subjectTicket (hex) = 0000: 61 82 01 6A 30 82 01 66 A0 03 02 01 05 A1 0C 1Ba..j0..f........ 0010: 0A 49 4E 4D 4F 42 49 2E 43 4F 4D A2 31 30 2F A0.EXAMPLE.COM.10/. <trucated>Client Principal = user1@EXAMPLE.COM Server Principal = nn/namenode1001.example.com@EXAMPLE.COM Session Key = EncryptionKey: keyType=18 keyBytes (hex dump)= 0000: 23 68 3A EA 16 21 D4 B9 31 0E C6 F8 C6 39 8D 4E#h:..!..1....9.N 0010: 99 68 B8 6A C5 90 E9 E6 3B 17 08 A0 2E C0 AE 48.h.j....;......H <truncated>
Forwardable Ticket true Forwarded Ticket false Proxiable Ticket false Proxy Ticket false Postdated Ticket false Renewable Ticket false Initial Ticket false Auth Time = Fri Jul 21 04:43:36 UTC 2017 Start Time = Fri Jul 21 04:56:34 UTC 2017 End Time = Fri Jul 21 14:43:36 UTC 2017 Renew Till = null Client AddressesNull
>>> KrbApReq: APOptions are 00100000 00000000 00000000 00000000 >>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType Krb5Context setting mySeqNumber to: 160498068 Created InitSecContextToken:
0000: 01 00 6E 82 02 62 30 82 02 5E A0 03 02 01 05 A1..n..b0..^...... 0010: 03 02 01 0E A2 07 03 05 00 20 00 00 00 A3 82 01......... ...... 0020: 6E 61 82 01 6A 30 82 01 66 A0 03 02 01 05 A1 0Cna..j0..f....... <truncated>
ls: Failed on local exception: java.io.IOException: Couldn't setup connection for user1@EXAMPLE.COM to namenode1001.example.com/192.168.1.232:8020; Host Details : local host is: "datanode1001.example.com/192.168.1.182"; destination host is: "namenode1001.example.com":8020;
user1@datanode1001:~$ klist
Ticket cache: FILE:/tmp/krb5cc_1993 Default principal: user1@EXAMPLE.COM Valid starting Expires . Service principal 07/20/2017 07:16:1307/20/2017 17:16:13krbtgt/EXAMPLE.COM@EXAMPLE.COMrenew until 07/27/2017 07:16:08
On namenode1001.example.com I see the following WARNING messages
2017-07-21 06:03:06,446 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 192.168.1.182:45320:null (Failure to initialize security context) 2017-07-21 06:03:06,446 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 192.168.1.182:45320:null (Failure to initialize security context) 2017-07-21 06:03:06,446 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client 192.168.1.182 threw exception [javax.security.sasl.SaslException: Failure to initialize security context [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)]]
This issue will be fixed if I restart name node service on namenode1001.example.com. I had to do this every day first thing in the morning to fix the issue. Could this related to tgt renewal.
Created 07-24-2017 02:41 PM
Yes, I am seeing the error now which will be solved if namenode restars.
Created 07-24-2017 03:10 PM
for me it seems your ticket is going expire again in 24 hr."ExpiresService 07/24/2017 18:28:50"
I think you need to change the ticket_lifetime = 7d as well.
Created 07-24-2017 06:14 PM
No the expiry date is "renew until 07/31/2017 08:28:49"
07/24/2017 08:28:50 07/24/2017 18:28:50 krbtgt/EXAMPLE.COM@EXAMPLE.COM renew until 07/31/2017 08:28:49
Created 07-25-2017 04:05 AM
So it evident that
- ticket will get expired and when it expires then service needed to restart. Isn't this a costly operation where you need to restart namenode every week or daily depending on ticket_lifetime.
Is there a way to get this auto renewed or never_expire.
Created 07-26-2017 08:50 PM
Any updates on this issue?
Created 07-28-2017 06:48 AM
issue still persists 🙂 Thanks
Created 07-28-2017 06:47 AM
@Vipin Rathor @Geoffrey Shelton Okot I think by giving too many information I might have caused confusion :). Here is the exact traceback of what I do and where I end up with error.
mazin.mohammed@gateway1005:~$ hadoop fs -ls / Java config name: null Native config name: /etc/krb5.conf Loaded from native config >>>KinitOptions cache name is /tmp/krb5cc_1993 ls: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "gateway1005.example.com/192.168.141.182"; destination host is: "namenode1002.example.com":8020;
The above error clearly mention user mazin.mohammed tgt is expired. I do a kinit to renew my ticket
mazin.mohammed@gateway1005:~$ kinitPassword for mazin.mohammed@EXAMPLE.COM:
KDC - Server Log
Jul 28 05:23:15 kdc1000 krb5kdc[959]: AS_REQ (6 etypes {18 17 16 23 25 26}) 192.168.141.182: NEEDED_PREAUTH: mazin.mohammed@EXAMPLE.COM for krbtgt/EXAMPLE.COM@EXAMPLE.COM, Additional pre-authentication required Jul 28 05:23:34 kdc1000 krb5kdc[959]: AS_REQ (6 etypes {18 17 16 23 25 26}) 192.168.141.182: ISSUE: authtime 123456, etypes {rep=18 tkt=18 ses=18}, mazin.mohammed@EXAMPLE.COM for krbtgt/EXAMPLE.COM@EXAMPLE.COM
mazin.mohammed@gateway1005:~$ klist Ticket cache: FILE:/tmp/krb5cc_1993 Default principal: mazin.mohammed@EXAMPLE.COM Valid starting ExpiresService principal 07/28/2017 05:23:34 07/28/2017 15:23:34 krbtgt/EXAMPLE.COM@EXAMPLE.COM renew until 08/04/2017 05:23:15
Now I try to access the cluster via hadoop command and this is where it ends up in the error even though I have valid TGT present in my cache.
mazin.mohammed@gateway1005:~$ hadoop fs -ls / Found ticket for mazin.mohammed@EXAMPLE.COM to go to krbtgt/EXAMPLE.COM@EXAMPLE.COM expiring on Fri Jul 28 15:23:34 UTC 2017 Found ticket for mazin.mohammed@EXAMPLE.COM to go to nn/namenode1001.example.com@EXAMPLE.COM expiring on Fri Jul 28 15:23:34 UTC 2017 ls: Failed on local exception: java.io.IOException: Couldn't setup connection for mazin.mohammed@EXAMPLE.COM to namenode1001.example.com/192.168.140.232:8020; Host Details : local host is: "gateway.example.com/192.168.141.182"; destination host is: "namenode1001.example.com":8020;
KDC server.log
Jul 28 05:24:15 infosec1000 krb5kdc[959]: TGS_REQ (4 etypes {18 17 16 23}) 192.168.141.182: ISSUE: authtime 1501219414, etypes {rep=18 tkt=18 ses=18}, mazin.mohammed@EXAMPLE.COM for nn/namenode1001.example.com@EXAMPLE.COM Jul 28 05:24:18 infosec1000 krb5kdc[959]: TGS_REQ (4 etypes {18 17 16 23}) 192.168.141.182: ISSUE: authtime 1501219414, etypes {rep=18 tkt=18 ses=18}, mazin.mohammed@EXAMPLE.COM for nn/namenode1001.example.com@EXAMPLE.COM
Namenode log
2017-07-28 06:20:52,151 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 192.168.141.182:58192:null (Failure to initialize security context) 2017-07-28 06:20:52,151 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 192.168.141.182:58192:null (Failure to initialize security context) 2017-07-28 06:20:52,151 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client 192.168.141.182 threw exception [javax.security.sasl.SaslException: Failure to initialize security context [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)]]
In this case I am the client and I do have a TGT and service ticket issued by TGS to access hadoop cluster but namenode is refusing by throwing an exception threw exception [javax.security.sasl.SaslException: Failure to initialize security context [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)]].
As said in the thread this issue is solved if I restart the namenode. I hope this will give some more insights to the issue we are facing.
Created 08-14-2017 10:29 PM
Hello @Mazin Mohammed,
The "Failed to find any Kerberos credentials" error in Namenode log, could also mean wrong / bad Kerberos credential for Namenode process. It does not necessarily always mean client credential.
Please check during NameNode startup that if the daemon was able to use nn.service.keytab and spnego.service.keytab both to secure a kerberos credential for itself. You also might want to enable Kerberos debug for NameNode by adding "-Dsun.security.krb5.debug=true" to Namenode command-line argument list.
Let us know. Happy hunting !