Support Questions

Find answers, ask questions, and share your expertise

Namenode needs restart everyday after setting up kerberos

avatar

Team,

We have enabled Kerberos on our staging cluster. Its noticed that everyday morning when we try to execute hadoop command like hadoop fs -ls / it fails with error. To fix the following we need to restart the namenodes.

user1@datanode1001:~$ hadoop fs -ls /

<Truncated>
Found ticket for user1@EXAMPLE.COM to go to krbtgt/EXAMPLE.COM@EXAMPLE.COM expiring on Fri Jul 21 14:43:36 UTC 2017
Entered Krb5Context.initSecContext with state=STATE_NEW
Found ticket for user1@EXAMPLE.COM to go to krbtgt/EXAMPLE.COM@EXAMPLE.COMexpiring on Fri Jul 21 14:43:36 UTC 2017
Found ticket for user1@EXAMPLE.COM to go to nn/namenode1001.example.com@EXAMPLE.COM expiring on Fri Jul 21 14:43:36 UTC 2017
Found ticket for user1@EXAMPLE.COM to go to nn/namenode1002.example.com@EXAMPLE.COM expiring on Fri Jul 21 14:43:36 UTC 2017
Found service ticket in the subjectTicket (hex) =
0000: 61 82 01 6A 30 82 01 66 A0 03 02 01 05 A1 0C 1Ba..j0..f........
0010: 0A 49 4E 4D 4F 42 49 2E 43 4F 4D A2 31 30 2F A0.EXAMPLE.COM.10/.
<trucated>Client Principal = user1@EXAMPLE.COM
Server Principal = nn/namenode1001.example.com@EXAMPLE.COM
Session Key = EncryptionKey: keyType=18 keyBytes (hex dump)=
0000: 23 68 3A EA 16 21 D4 B9 31 0E C6 F8 C6 39 8D 4E#h:..!..1....9.N
0010: 99 68 B8 6A C5 90 E9 E6 3B 17 08 A0 2E C0 AE 48.h.j....;......H <truncated>
Forwardable Ticket true
Forwarded Ticket false
Proxiable Ticket false
Proxy Ticket false
Postdated Ticket false
Renewable Ticket false
Initial Ticket false
Auth Time = Fri Jul 21 04:43:36 UTC 2017
Start Time = Fri Jul 21 04:56:34 UTC 2017
End Time = Fri Jul 21 14:43:36 UTC 2017
Renew Till = null
Client AddressesNull
>>> KrbApReq: APOptions are 00100000 00000000 00000000 00000000
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
Krb5Context setting mySeqNumber to: 160498068
Created InitSecContextToken:
0000: 01 00 6E 82 02 62 30 82 02 5E A0 03 02 01 05 A1..n..b0..^......
0010: 03 02 01 0E A2 07 03 05 00 20 00 00 00 A3 82 01......... ......
0020: 6E 61 82 01 6A 30 82 01 66 A0 03 02 01 05 A1 0Cna..j0..f.......
<truncated>
ls: Failed on local exception: java.io.IOException: Couldn't setup connection for user1@EXAMPLE.COM to namenode1001.example.com/192.168.1.232:8020; Host Details : local host is: "datanode1001.example.com/192.168.1.182"; destination host is: "namenode1001.example.com":8020;
user1@datanode1001:~$ klist 
Ticket cache: FILE:/tmp/krb5cc_1993
Default principal: user1@EXAMPLE.COM
Valid starting    Expires . Service principal
07/20/2017 07:16:1307/20/2017 17:16:13krbtgt/EXAMPLE.COM@EXAMPLE.COMrenew until 07/27/2017 07:16:08

On namenode1001.example.com I see the following WARNING messages

2017-07-21 06:03:06,446 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 192.168.1.182:45320:null (Failure to initialize security context)
2017-07-21 06:03:06,446 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 192.168.1.182:45320:null (Failure to initialize security context)
2017-07-21 06:03:06,446 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client 192.168.1.182 threw exception [javax.security.sasl.SaslException: Failure to initialize security context [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)]]

This issue will be fixed if I restart name node service on namenode1001.example.com. I had to do this every day first thing in the morning to fix the issue. Could this related to tgt renewal.

17 REPLIES 17

avatar

Yes, I am seeing the error now which will be solved if namenode restars.

avatar
@Mazin Mohamme

for me it seems your ticket is going expire again in 24 hr."ExpiresService 07/24/2017 18:28:50"

I think you need to change the ticket_lifetime = 7d as well.

avatar
Master Mentor

@Divakar Annapureddy

No the expiry date is "renew until 07/31/2017 08:28:49"

07/24/2017 08:28:50  07/24/2017 18:28:50  krbtgt/EXAMPLE.COM@EXAMPLE.COM

renew until 07/31/2017 08:28:49

avatar

So it evident that

- ticket will get expired and when it expires then service needed to restart. Isn't this a costly operation where you need to restart namenode every week or daily depending on ticket_lifetime.

Is there a way to get this auto renewed or never_expire.

avatar
Master Mentor

@@Mazin Mohammed

Any updates on this issue?

avatar

issue still persists 🙂 Thanks

avatar

@Vipin Rathor @Geoffrey Shelton Okot I think by giving too many information I might have caused confusion :). Here is the exact traceback of what I do and where I end up with error.

mazin.mohammed@gateway1005:~$ hadoop fs -ls /
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
>>>KinitOptions cache name is /tmp/krb5cc_1993
ls: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "gateway1005.example.com/192.168.141.182"; destination host is: "namenode1002.example.com":8020;

The above error clearly mention user mazin.mohammed tgt is expired. I do a kinit to renew my ticket

mazin.mohammed@gateway1005:~$ kinitPassword for mazin.mohammed@EXAMPLE.COM:

KDC - Server Log

Jul 28 05:23:15 kdc1000 krb5kdc[959]: AS_REQ (6 etypes {18 17 16 23 25 26}) 192.168.141.182: NEEDED_PREAUTH: mazin.mohammed@EXAMPLE.COM for krbtgt/EXAMPLE.COM@EXAMPLE.COM, Additional pre-authentication required
Jul 28 05:23:34 kdc1000 krb5kdc[959]: AS_REQ (6 etypes {18 17 16 23 25 26}) 192.168.141.182: ISSUE: authtime 123456, etypes {rep=18 tkt=18 ses=18}, mazin.mohammed@EXAMPLE.COM for krbtgt/EXAMPLE.COM@EXAMPLE.COM
mazin.mohammed@gateway1005:~$ klist 
Ticket cache: FILE:/tmp/krb5cc_1993
Default principal: mazin.mohammed@EXAMPLE.COM
Valid starting       ExpiresService       principal
07/28/2017 05:23:34  07/28/2017 15:23:34  krbtgt/EXAMPLE.COM@EXAMPLE.COM
renew until 08/04/2017 05:23:15

Now I try to access the cluster via hadoop command and this is where it ends up in the error even though I have valid TGT present in my cache.

mazin.mohammed@gateway1005:~$ hadoop fs -ls /
Found ticket for mazin.mohammed@EXAMPLE.COM to go to krbtgt/EXAMPLE.COM@EXAMPLE.COM expiring on Fri Jul 28 15:23:34 UTC 2017
Found ticket for mazin.mohammed@EXAMPLE.COM to go to nn/namenode1001.example.com@EXAMPLE.COM expiring on Fri Jul 28 15:23:34 UTC 2017

ls: Failed on local exception: java.io.IOException: Couldn't setup connection for mazin.mohammed@EXAMPLE.COM to namenode1001.example.com/192.168.140.232:8020; Host Details : local host is: "gateway.example.com/192.168.141.182"; destination host is: "namenode1001.example.com":8020;

KDC server.log

Jul 28 05:24:15 infosec1000 krb5kdc[959]: TGS_REQ (4 etypes {18 17 16 23}) 192.168.141.182: ISSUE: authtime 1501219414, etypes {rep=18 tkt=18 ses=18}, mazin.mohammed@EXAMPLE.COM for nn/namenode1001.example.com@EXAMPLE.COM
Jul 28 05:24:18 infosec1000 krb5kdc[959]: TGS_REQ (4 etypes {18 17 16 23}) 192.168.141.182: ISSUE: authtime 1501219414, etypes {rep=18 tkt=18 ses=18}, mazin.mohammed@EXAMPLE.COM for nn/namenode1001.example.com@EXAMPLE.COM

Namenode log

2017-07-28 06:20:52,151 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 192.168.141.182:58192:null (Failure to initialize security context)
2017-07-28 06:20:52,151 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 192.168.141.182:58192:null (Failure to initialize security context)
2017-07-28 06:20:52,151 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client 192.168.141.182 threw exception [javax.security.sasl.SaslException: Failure to initialize security context [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)]]

In this case I am the client and I do have a TGT and service ticket issued by TGS to access hadoop cluster but namenode is refusing by throwing an exception threw exception [javax.security.sasl.SaslException: Failure to initialize security context [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)]].

As said in the thread this issue is solved if I restart the namenode. I hope this will give some more insights to the issue we are facing.

avatar
Guru

Hello @Mazin Mohammed,

The "Failed to find any Kerberos credentials" error in Namenode log, could also mean wrong / bad Kerberos credential for Namenode process. It does not necessarily always mean client credential.

Please check during NameNode startup that if the daemon was able to use nn.service.keytab and spnego.service.keytab both to secure a kerberos credential for itself. You also might want to enable Kerberos debug for NameNode by adding "-Dsun.security.krb5.debug=true" to Namenode command-line argument list.

Let us know. Happy hunting !