
Namenode needs restart everyday after setting up kerberos


Team,

We have enabled Kerberos on our staging cluster. We have noticed that every morning, when we try to run a hadoop command such as hadoop fs -ls /, it fails with the error below. To fix it, we need to restart the NameNodes.

user1@datanode1001:~$ hadoop fs -ls /

<Truncated>
Found ticket for user1@EXAMPLE.COM to go to krbtgt/EXAMPLE.COM@EXAMPLE.COM expiring on Fri Jul 21 14:43:36 UTC 2017
Entered Krb5Context.initSecContext with state=STATE_NEW
Found ticket for user1@EXAMPLE.COM to go to krbtgt/EXAMPLE.COM@EXAMPLE.COM expiring on Fri Jul 21 14:43:36 UTC 2017
Found ticket for user1@EXAMPLE.COM to go to nn/namenode1001.example.com@EXAMPLE.COM expiring on Fri Jul 21 14:43:36 UTC 2017
Found ticket for user1@EXAMPLE.COM to go to nn/namenode1002.example.com@EXAMPLE.COM expiring on Fri Jul 21 14:43:36 UTC 2017
Found service ticket in the subject
Ticket (hex) =
0000: 61 82 01 6A 30 82 01 66 A0 03 02 01 05 A1 0C 1B  a..j0..f........
0010: 0A 49 4E 4D 4F 42 49 2E 43 4F 4D A2 31 30 2F A0  .EXAMPLE.COM.10/.
<truncated>
Client Principal = user1@EXAMPLE.COM
Server Principal = nn/namenode1001.example.com@EXAMPLE.COM
Session Key = EncryptionKey: keyType=18 keyBytes (hex dump)=
0000: 23 68 3A EA 16 21 D4 B9 31 0E C6 F8 C6 39 8D 4E  #h:..!..1....9.N
0010: 99 68 B8 6A C5 90 E9 E6 3B 17 08 A0 2E C0 AE 48  .h.j....;......H
<truncated>
Forwardable Ticket true
Forwarded Ticket false
Proxiable Ticket false
Proxy Ticket false
Postdated Ticket false
Renewable Ticket false
Initial Ticket false
Auth Time = Fri Jul 21 04:43:36 UTC 2017
Start Time = Fri Jul 21 04:56:34 UTC 2017
End Time = Fri Jul 21 14:43:36 UTC 2017
Renew Till = null
Client Addresses Null
>>> KrbApReq: APOptions are 00100000 00000000 00000000 00000000
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
Krb5Context setting mySeqNumber to: 160498068
Created InitSecContextToken:
0000: 01 00 6E 82 02 62 30 82 02 5E A0 03 02 01 05 A1  ..n..b0..^......
0010: 03 02 01 0E A2 07 03 05 00 20 00 00 00 A3 82 01  ......... ......
0020: 6E 61 82 01 6A 30 82 01 66 A0 03 02 01 05 A1 0C  na..j0..f.......
<truncated>
ls: Failed on local exception: java.io.IOException: Couldn't setup connection for user1@EXAMPLE.COM to namenode1001.example.com/192.168.1.232:8020; Host Details : local host is: "datanode1001.example.com/192.168.1.182"; destination host is: "namenode1001.example.com":8020;
user1@datanode1001:~$ klist 
Ticket cache: FILE:/tmp/krb5cc_1993
Default principal: user1@EXAMPLE.COM
Valid starting       Expires              Service principal
07/20/2017 07:16:13  07/20/2017 17:16:13  krbtgt/EXAMPLE.COM@EXAMPLE.COM
        renew until 07/27/2017 07:16:08

On namenode1001.example.com I see the following WARNING messages:

2017-07-21 06:03:06,446 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 192.168.1.182:45320:null (Failure to initialize security context)
2017-07-21 06:03:06,446 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 192.168.1.182:45320:null (Failure to initialize security context)
2017-07-21 06:03:06,446 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client 192.168.1.182 threw exception [javax.security.sasl.SaslException: Failure to initialize security context [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)]]

The issue goes away if I restart the NameNode service on namenode1001.example.com. I have had to do this first thing every morning. Could this be related to TGT renewal?
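One way to read the timestamps in the debug trace above: End Time minus Auth Time gives the lifetime of the client's service ticket, and Renew Till = null indicates that the ticket is not renewable. The following is a minimal Python sketch of that arithmetic. The helper name parse_trace_time is hypothetical, the format string is an assumption based on how the trace prints times, and the comparison with the NameNode log assumes that log is also in UTC.

```python
from datetime import datetime, timedelta

# Format assumed from the trace, e.g. "Fri Jul 21 04:43:36 UTC 2017".
TRACE_FORMAT = "%a %b %d %H:%M:%S UTC %Y"


def parse_trace_time(text):
    """Parse a timestamp as printed in the Java Kerberos debug trace (hypothetical helper)."""
    return datetime.strptime(text, TRACE_FORMAT)


auth_time = parse_trace_time("Fri Jul 21 04:43:36 UTC 2017")
end_time = parse_trace_time("Fri Jul 21 14:43:36 UTC 2017")

# Timestamp of the "Auth failed" line in the NameNode log (assumed to be UTC).
failure_time = datetime(2017, 7, 21, 6, 3, 6)

lifetime = end_time - auth_time
client_ticket_valid = auth_time <= failure_time < end_time

print("client ticket lifetime:", lifetime)
print("client ticket valid at failure time:", client_ticket_valid)
```

Under these assumptions the client ticket was still within its validity window when the NameNode rejected the connection, which would point toward the server-side credential rather than the user's ticket.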

17 REPLIES

Guru

Hello @Mazin Mohammed,

Yes, this looks like it is related to TGT renewal for the NameNode credential. That is why it starts working after a restart (which forces the NameNode to obtain a new credential). The quickest way to check is to get a ticket for the nn/<host> principal and run klist, like this:

# kinit -kt <nn.service.keytab> nn/<host>
# klist -eaf

The output above should give us some pointers. Please post the output here for all of us to see.

Hope this helps!


Thanks for the reply.

On datanode1001 I don't have nn.service.keytab (Ambari did not deploy it there), so I ran kinit on namenode1001 itself.

user1@namenode1001:~$ sudo -u hdfs kinit -kt /etc/security/keytabs/nn.service.keytab nn/namenode1001.example.com@EXAMPLE.COM

user1@namenode1001:~$ sudo klist -eaf
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: nn/namenode1001.example.com@EXAMPLE.COM
Valid starting       Expires              Service principal
07/19/2017 06:30:27  07/19/2017 16:30:27  krbtgt/EXAMPLE.COM@EXAMPLE.COM
        renew until 07/20/2017 06:30:27, Flags: FPRIA
Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96
Addresses: (none)
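For readers unfamiliar with the Flags field printed by klist -f, here is a small Python sketch that expands the letters. The letter meanings follow the MIT klist documentation; the function name decode_flags is hypothetical.

```python
from typing import List

# Flag letters as documented for MIT Kerberos "klist -f".
KLIST_FLAGS = {
    "F": "forwardable",
    "f": "forwarded",
    "P": "proxiable",
    "p": "proxy",
    "D": "postdateable",
    "d": "postdated",
    "R": "renewable",
    "I": "initial",
    "i": "invalid",
    "H": "hardware authenticated",
    "A": "preauthenticated",
    "T": "transit policy checked",
    "O": "okay as delegate",
    "a": "anonymous",
}


def decode_flags(flags: str) -> List[str]:
    """Expand a klist flag string into readable names (hypothetical helper)."""
    return [KLIST_FLAGS.get(ch, "unknown (%s)" % ch) for ch in flags]


print(decode_flags("FPRIA"))
```

For the FPRIA string above, the R indicates a renewable ticket and the I indicates that it was obtained directly with kinit rather than derived from another ticket.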


Yes, it's related to a Kerberos ticket issue. You can find the relevant settings in Ambari->Kerberos->Kerberos Conf Template:

[libdefaults]
  renew_lifetime = 7d
  forwardable = true
  default_realm = {{realm}}
  ticket_lifetime = 24h

Possible solutions:

1) Work with your AD team to change the password expiry policy. Ensure that the NN and other Ambari-related principals never expire.


On the KDC we have set all NN and Ambari-related principals to never expire. Here is a snapshot:

kadmin: getprinc nn/namenode1001.example.com@EXAMPLE.COM
Principal: nn/namenode1001.example.com@EXAMPLE.COM
Expiration date: [never]
Password expiration date: [none]
Maximum ticket life: 0 days 10:00:00
Maximum renewable life: 7 days 00:00:00
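The getprinc output helps explain why tickets in this thread last 10 hours even though the krb5.conf template requests 24 hours: the KDC issues a ticket for the shortest of the requested lifetime and the configured maximums. A minimal Python sketch of that rule follows. The function name effective_lifetime is hypothetical, and only the principal's limit shown above is included; the krbtgt principal and the realm's max_life in kdc.conf can cap the value as well.

```python
from datetime import timedelta


def effective_lifetime(requested, *limits):
    """Return the shortest of the requested lifetime and all configured caps (hypothetical helper)."""
    return min(requested, *limits)


requested = timedelta(hours=24)      # ticket_lifetime requested by the client config
principal_max = timedelta(hours=10)  # "Maximum ticket life" from getprinc above

granted = effective_lifetime(requested, principal_max)
print("granted ticket lifetime:", granted)
```

The same rule applies to the renewable lifetime, where the requested renew_lifetime is capped by "Maximum renewable life".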

Master Mentor

@Mazin Mohammed

Next time, remember to include:

- OS type and version
- Ambari and HDP version
- Cluster size

On the KDC server, under [realms] in /etc/krb5kdc/kdc.conf, set max_life = 14d 0h 0m 0s (2 weeks).

Having said that, did you copy /etc/krb5.conf to all the nodes?

Then, on the clients, check krb5.conf: you should have renew_lifetime = 7d. This means your Kerberos ticket can be renewed for up to 7 days, and normally it should renew automatically.
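The difference between a ticket's lifetime and its renewable lifetime can be sketched in a few lines of Python: a renewal only succeeds while the current ticket is still valid, and the renewed ticket can never outlive the "renew until" time. The function name renew is hypothetical; the timestamps come from the klist -eaf output posted earlier in the thread, and the 10-hour lifetime is the one shown there.

```python
from datetime import datetime, timedelta


def renew(now, expires, renew_till, lifetime):
    """Return the new expiry after a renewal, or None if renewal is no longer possible."""
    if now >= expires or now >= renew_till:
        return None  # ticket already expired, or renewable window is over
    return min(now + lifetime, renew_till)


expires = datetime(2017, 7, 19, 16, 30, 27)
renew_till = datetime(2017, 7, 20, 6, 30, 27)
lifetime = timedelta(hours=10)

# Renewing before expiry extends the ticket.
print(renew(datetime(2017, 7, 19, 15, 0, 0), expires, renew_till, lifetime))

# Once the ticket has expired, it cannot be renewed; a fresh kinit is needed.
print(renew(datetime(2017, 7, 19, 17, 0, 0), expires, renew_till, lifetime))
```

A process that never renews or re-logs in from its keytab will therefore lose its credentials after one ticket lifetime, regardless of how long renew_lifetime is.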

Hope that helps


OS - Ubuntu 14.04 LTS trusty
Ambari version - 2.4.20
HDP version - 2.2.4
Cluster size - 8 nodes (staging cluster)

Yes, I did copy krb5.conf to all the nodes, and renew_lifetime = 7d is set.

Does this mean that, irrespective of what we set, the ticket will expire and we need to restart the service to renew it?

Master Mentor

@Mazin Mohammed

How about the NTP settings across the cluster?
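Clock drift matters because Kerberos rejects requests whose timestamps differ by more than the allowed clock skew, which is 300 seconds by default in MIT Kerberos. A minimal Python sketch of that check follows; the function name within_clockskew and the sample host times are hypothetical.

```python
from datetime import datetime, timedelta

# Default clockskew tolerance in MIT Kerberos (libdefaults "clockskew").
DEFAULT_CLOCKSKEW = timedelta(seconds=300)


def within_clockskew(time_a, time_b, tolerance=DEFAULT_CLOCKSKEW):
    """Return True if two host clocks are close enough for Kerberos (hypothetical helper)."""
    return abs(time_a - time_b) <= tolerance


# Hypothetical clock readings taken on two hosts at the same moment.
namenode_clock = datetime(2017, 7, 21, 6, 3, 6)
datanode_clock = datetime(2017, 7, 21, 6, 9, 30)

print("clocks within tolerance:", within_clockskew(namenode_clock, datanode_clock))
```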

Can you run the commands below, substituting the correct values? First run klist -kt to get the principal for the NameNode:

# klist -kt /etc/security/keytabs/nn.service.keytab
Keytab name: FILE:/etc/security/keytabs/nn.service.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 07/18/2017 08:49:43 nn/my_fqdn.com@REALM.COM
   1 07/18/2017 08:49:43 nn/my_fqdn.com@REALM.COM
   1 07/18/2017 08:49:43 nn/my_fqdn.com@REALM.COM
   1 07/18/2017 08:49:43 nn/my_fqdn.com@REALM.COM
   1 07/18/2017 08:49:43 nn/my_fqdn.com@REALM.COM
# kinit -kt /etc/security/keytabs/nn.service.keytab nn/my_fqdn.com@REALM.COM
# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: nn/my_fqdn.com@REALM.COM
Valid starting       Expires              Service principal
07/24/2017 06:30:56  07/25/2017 06:30:56  krbtgt/REALM.COM@REALM.COM

user1@namenode1001:~$ sudo klist -kt /etc/security/keytabs/nn.service.keytab 

Keytab name: FILE:/etc/security/keytabs/nn.service.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
  10 07/19/2017 07:54:02 nn/namenode1001.example.com@EXAMPLE.COM
  10 07/19/2017 07:54:02 nn/namenode1001.example.com@EXAMPLE.COM
  10 07/19/2017 07:54:02 nn/namenode1001.example.com@EXAMPLE.COM
  10 07/19/2017 07:54:02 nn/namenode1001.example.com@EXAMPLE.COM
  10 07/19/2017 07:54:02 nn/namenode1001.example.com@EXAMPLE.COM

user1@namenode1001:~$ kinit -kt /etc/security/keytabs/nn.service.keytab nn/namenode1001.example.com@EXAMPLE.COM
user1@namenode1001:~$ klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: nn/namenode1001.example.com@EXAMPLE.COM
Valid starting       Expires              Service principal
07/24/2017 08:28:50  07/24/2017 18:28:50  krbtgt/EXAMPLE.COM@EXAMPLE.COM
        renew until 07/31/2017 08:28:49

Master Mentor

@Mazin Mohammed

Your Kerberos ticket is now renewable for 7 days, until 07/31/2017 08:28:49. Can you monitor whether the NameNode goes down again?
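To keep an eye on this without reading klist output by hand, the ticket times can be parsed and compared. The Python sketch below is a rough illustration: parse_klist is a hypothetical helper, the regular expressions assume the MM/DD/YYYY layout seen in this thread (klist date formats vary by locale), and the sample text is the klist output posted above.

```python
import re
from datetime import datetime, timedelta

KLIST_TIME = "%m/%d/%Y %H:%M:%S"  # assumed from the output in this thread
STAMP = r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})"
ENTRY = re.compile(r"^" + STAMP + r"\s+" + STAMP + r"\s+(\S+)$")
RENEW = re.compile(r"renew until " + STAMP)


def parse_klist(text):
    """Parse ticket entries from klist text into a list of dicts (hypothetical helper)."""
    tickets = []
    for raw in text.splitlines():
        line = raw.strip()
        entry = ENTRY.match(line)
        if entry:
            tickets.append({
                "start": datetime.strptime(entry.group(1), KLIST_TIME),
                "expires": datetime.strptime(entry.group(2), KLIST_TIME),
                "service": entry.group(3),
                "renew_till": None,
            })
            continue
        renew = RENEW.search(line)
        if renew and tickets:
            # A "renew until" line belongs to the entry just above it.
            tickets[-1]["renew_till"] = datetime.strptime(renew.group(1), KLIST_TIME)
    return tickets


sample = """Valid starting       Expires              Service principal
07/24/2017 08:28:50  07/24/2017 18:28:50  krbtgt/EXAMPLE.COM@EXAMPLE.COM
        renew until 07/31/2017 08:28:49
"""

tgt = parse_klist(sample)[0]
print("lifetime:", tgt["expires"] - tgt["start"])
print("renewable window:", tgt["renew_till"] - tgt["start"])
```

By this reading the ticket itself still expires after 10 hours; the 7-day figure is the window during which it can be renewed.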