Created 01-30-2018 09:20 PM
Kerberos keeps getting stuck in a loop. It gets stuck here (debugging is turned on):
[spark_remote@ip-192.168.1.100 ~]$ kinit -f -p -kt spark_remote.keytab spark_remote/this.server.fqdn@MYREALM.INTERNAL
[spark_remote@ip-192.168.1.100 ~]$ spark-submit --master yarn kerberostest_2.11-1.0.jar /etc/krb5.conf spark_remote/this.server.fqdn@MYREALM.INTERNAL spark_remote.keytab
18/01/30 20:50:43 DEBUG UserGroupInformation: hadoop login
18/01/30 20:50:43 DEBUG UserGroupInformation: hadoop login commit
18/01/30 20:50:43 DEBUG UserGroupInformation: using kerberos user:spark_remote/this.server.fqdn@MYREALM.INTERNAL
18/01/30 20:50:43 DEBUG UserGroupInformation: Using user: "spark_remote/this.server.fqdn@MYREALM.INTERNAL" with name spark_remote/this.server.fqdn@MYREALM.INTERNAL
18/01/30 20:50:43 DEBUG UserGroupInformation: User entry: "spark_remote/this.server.fqdn@MYREALM.INTERNAL"
18/01/30 20:50:43 INFO UserGroupInformation: Login successful for user spark_remote/this.server.fqdn@MYREALM.INTERNAL using keytab file spark_remote.keytab
18/01/30 20:50:44 INFO SparkContext: Running Spark version 2.2.1
18/01/30 20:50:44 INFO SparkContext: Submitted application: TestKerberos
18/01/30 20:50:44 INFO SecurityManager: Changing view acls to: spark_remote
18/01/30 20:50:44 INFO SecurityManager: Changing modify acls to: spark_remote
18/01/30 20:50:44 INFO SecurityManager: Changing view acls groups to:
18/01/30 20:50:44 INFO SecurityManager: Changing modify acls groups to:
18/01/30 20:50:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark_remote); groups with view permissions: Set(); users with modify permissions: Set(spark_remote); groups with modify permissions: Set()
18/01/30 20:50:44 INFO Utils: Successfully started service 'sparkDriver' on port 45523.
18/01/30 20:50:44 INFO SparkEnv: Registering MapOutputTracker
18/01/30 20:50:44 INFO SparkEnv: Registering BlockManagerMaster
18/01/30 20:50:44 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/01/30 20:50:44 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/01/30 20:50:44 INFO DiskBlockManager: Created local directory at /mnt/tmp/blockmgr-6d9a3c56-e5bc-4f55-9a69-505f2bf6540d
18/01/30 20:50:44 INFO MemoryStore: MemoryStore started with capacity 414.4 MB
18/01/30 20:50:44 INFO SparkEnv: Registering OutputCommitCoordinator
18/01/30 20:50:45 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/01/30 20:50:45 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://this.server.fqdn:4040
18/01/30 20:50:45 INFO SparkContext: Added JAR file:/home/spark_remote/kerberostest_2.11-1.0.jar at spark://192.168.1.100:45523/jars/kerberostest_2.11-1.0.jar with timestamp 1517345445384
18/01/30 20:50:45 INFO Utils: Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
18/01/30 20:50:46 INFO Client: Attempting to login to the Kerberos using principal: spark_remote/this.server.fqdn@MYREALM.INTERNAL and keytab: spark_remote.keytab
18/01/30 20:50:46 INFO RMProxy: Connecting to ResourceManager at this.server.fqdn/192.168.1.100:8032
18/01/30 20:50:46 DEBUG UserGroupInformation: PrivilegedAction as:spark_remote/this.server.fqdn@MYREALM.INTERNAL (auth:KERBEROS) from:org.apache.hadoop.yarn.client.RMProxy.getProxy(RMProxy.java:136)
18/01/30 20:50:47 DEBUG UserGroupInformation: PrivilegedAction as:spark_remote/this.server.fqdn@MYREALM.INTERNAL (auth:KERBEROS) from:org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:725)
18/01/30 20:50:47 DEBUG UserGroupInformation: PrivilegedActionException as:spark_remote/this.server.fqdn@MYREALM.INTERNAL (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): Failure to initialize security context
18/01/30 20:50:47 DEBUG UserGroupInformation: PrivilegedAction as:spark_remote/this.server.fqdn@MYREALM.INTERNAL (auth:KERBEROS) from:org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:650)
18/01/30 20:50:47 DEBUG UserGroupInformation: Found tgt Ticket (hex) =
0000: 61 82 01 93 30 82 01 8F A0 03 02 01 05 A1 17 1B a...0...........
0010: 15 44 41 54 41 50 41 53 53 50 4F 52 54 2E 49 4E .MYREALM.IN
0020: 54 45 52 4E 41 4C A2 2A 30 28 A0 03 02 01 02 A1 TERNAL.*0(......
0030: 21 30 1F 1B 06 6B 72 62 74 67 74 1B 15 44 41 54 !0...krbtgt.aMYR
0040: 41 50 41 53 53 50 4F 52 54 2E 49 4E 54 45 52 4E MYREALM>>.INTERNL
0050: 41 4C A3 82 01 41 30 82 01 3D A0 03 02 01 12 A1 AL...A0..=......
0060: 03 02 01 02 A2 82 01 2F 04 82 01 2B A2 05 25 CA ......./...+..%.
0070: A1 82 EA 93 2B AF 43 86 9E A7 94 20 CA D9 B8 C0 ....+.C.... ....
0080: E0 1E 22 D5 4E 73 69 DB 8A 3A 39 08 71 8F 32 C2 ..".Nsi..:9.q.2.
0090: 68 18 DD F4 A0 B2 21 F7 A5 9A 6B 5B 1A E5 FA 1E h.....!...k[....
00A0: C5 F6 13 E7 17 36 2F 74 EA 0C 12 76 82 63 09 62 .....6/t...v.c.b
00B0: 15 95 61 BF 1E 35 79 B5 82 CF 90 9A 57 B8 6F F7 ..a..5y.....W.o.
00C0: 7B EE 20 7E 87 F3 A9 10 ED 93 79 F2 D2 AE 6B 39 .. .......y...k9
00D0: D9 CD 9D 9D 51 2E BC 98 C0 4D 8F 2F C5 7F B3 2E ....Q....M./....
00E0: 36 B6 A3 D9 E4 D5 B7 B6 FA AF 56 4A F0 9B 2D B1 6.........VJ..-.
00F0: 24 70 2A DF E9 88 0C F6 1C 9D 9A 66 42 77 42 95 $p*........fBwB.
0100: B2 0B B3 7C DE 95 93 56 E7 CB A0 67 FB 5E 45 4E .......V...g.^EN
0110: 18 D8 75 91 94 10 23 42 9F BA 15 D3 23 B1 85 4D ..u...#B....#..M
0120: 10 AF 1F 48 12 96 D9 06 EA 2C 34 5C DA F7 4C 1A ...H.....,4\..L.
0130: DC 86 B4 23 57 45 34 BE 90 FE B8 33 84 15 94 70 ...#WE4....3...p
0140: 72 04 8E E7 F0 DD 90 DA 41 F6 30 73 CF 80 79 F8 r.......A.0s..y.
0150: E7 E4 D9 4C C3 AD 6A B3 F3 AD 85 01 B0 4E 65 EF ...L..j......Ne.
0160: 4D EF 75 1B FA 0C D6 7C 01 CE 97 23 D5 FD 70 C0 M.u........#..p.
0170: 1F 8C B3 C6 1A 54 DD 13 3D 07 46 EC 83 D4 00 C4 .....T..=.F.....
0180: 57 EF 56 30 F7 AF 1B 08 98 C7 D9 85 12 32 00 8D W.V0.........2..
0190: 21 B1 09 75 41 59 57 !..uAYW

Client Principal = spark_remote/this.server.fqdn@MYREALM.INTERNAL
Server Principal = krbtgt/MYREALM.INTERNAL@MYREALM.INTERNAL
Session Key = EncryptionKey: keyType=18 keyBytes (hex dump)=
0000: A8 C3 93 72 3A 9B C2 4E 4E 99 CA 84 70 F3 EB 36 ...r:..NN...p..6
0010: B5 15 7B BE 22 7F EB 30 E6 DD F4 22 D6 D1 82 38 ...."..0..."...8

Forwardable Ticket true
Forwarded Ticket false
Proxiable Ticket false
Proxy Ticket false
Postdated Ticket false
Renewable Ticket false
Initial Ticket false
Auth Time = Tue Jan 30 20:50:43 UTC 2018
Start Time = Tue Jan 30 20:50:43 UTC 2018
End Time = Tue Jan 30 21:05:43 UTC 2018
Renew Till = null
Client Addresses Null
18/01/30 20:50:48 DEBUG UserGroupInformation: PrivilegedAction as:spark_remote/this.server.fqdn@MYREALM.INTERNAL (auth:KERBEROS) from:org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:725)
18/01/30 20:50:48 DEBUG UserGroupInformation: PrivilegedActionException as:spark_remote/this.server.fqdn@MYREALM.INTERNAL (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): Failure to initialize security context
18/01/30 20:50:48 DEBUG UserGroupInformation: PrivilegedAction as:spark_remote/this.server.fqdn@MYREALM.INTERNAL (auth:KERBEROS) from:org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:650)
18/01/30 20:50:48 DEBUG UserGroupInformation: Found tgt Ticket (hex) =
0000: 61 82 01 93 30 82 01 8F A0 03 02 01 05 A1 17 1B a...0...........
...(and rinse and repeat)
My krb5.conf:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

[libdefaults]
  default_realm = MYREALM.INTERNAL
  dns_lookup_realm = false
  dns_lookup_kdc = false
  rdns = true
  ticket_lifetime = 24h
  forwardable = true
  udp_preference_limit = 1000000
  default_tgs_enctypes = aes256-cts aes128-cts
  default_tkt_enctypes = aes256-cts aes128-cts
  permitted_enctypes = aes256-cts aes128-cts
  udp_preference_limit = 1

[realms]
  MYREALM.INTERNAL = {
    kdc = kdc.server.internal:88
    admin_server = kdc.server.internal:749
    default_domain = mydomain.internal
  }

[domain_realm]
  thismachine.fqdn = MYREALM.INTERNAL
  .us-west-2.compute.internal = MYREALM.INTERNAL
  us-west-2.compute.internal = MYREALM.INTERNAL
  .otherdomain.internal = MYREALM.INTERNAL
  otherdomain.internal = MYREALM.INTERNAL
  .mydomain.internal = MYREALM.INTERNAL
  mydomain.internal = MYREALM.INTERNAL

[logging]
  kdc = FILE:/var/log/kerberos/krb5kdc.log
  admin_server = FILE:/var/log/kerberos/kadmin.log
  default = FILE:/var/log/kerberos/krb5lib.log
You'll notice the error message isn't very clear about what it's unhappy about; it just says:
Failure to initialize security context
From the KDC server's /var/log/krb5kdc.log:
Jan 30 15:42:32 kdc.server.internal krb5kdc[9279](info): AS_REQ (2 etypes {18 17}) 192.168.1.100: NEEDED_PREAUTH: spark_remote/this.server.fqdn@MYREALM.INTERNAL for krbtgt/MYREALM.INTERNAL@MYREALM.INTERNAL, Additional pre-authentication required
Jan 30 15:42:32 kdc.server.internal krb5kdc[9279](info): AS_REQ (2 etypes {18 17}) 192.168.1.100: ISSUE: authtime 1517344952, etypes {rep=18 tkt=18 ses=18}, spark_remote/this.server.fqdn@MYREALM.INTERNAL for krbtgt/MYREALM.INTERNAL@MYREALM.INTERNAL
Jan 30 15:42:37 kdc.server.internal krb5kdc[9279](info): AS_REQ (2 etypes {18 17}) 192.168.1.100: NEEDED_PREAUTH: spark_remote/this.server.fqdn@MYREALM.INTERNAL for krbtgt/MYREALM.INTERNAL@MYREALM.INTERNAL, Additional pre-authentication required
Jan 30 15:42:37 kdc.server.internal krb5kdc[9279](info): AS_REQ (2 etypes {18 17}) 192.168.1.100: ISSUE: authtime 1517344957, etypes {rep=18 tkt=18 ses=18}, spark_remote/this.server.fqdn@MYREALM.INTERNAL for krbtgt/MYREALM.INTERNAL@MYREALM.INTERNAL
Jan 30 15:42:41 kdc.server.internal krb5kdc[9279](info): TGS_REQ (2 etypes {18 17}) 192.168.1.100: ISSUE: authtime 1517344957, etypes {rep=18 tkt=18 ses=18}, spark_remote/this.server.fqdn@MYREALM.INTERNAL for yarn/this.server.fqdn@MYREALM.INTERNAL
Jan 30 15:50:38 kdc.server.internal krb5kdc[9279](info): AS_REQ (2 etypes {18 17}) 192.168.1.100: NEEDED_PREAUTH: spark_remote/this.server.fqdn@MYREALM.INTERNAL for krbtgt/MYREALM.INTERNAL@MYREALM.INTERNAL, Additional pre-authentication required
Jan 30 15:50:38 kdc.server.internal krb5kdc[9279](info): AS_REQ (2 etypes {18 17}) 192.168.1.100: ISSUE: authtime 1517345438, etypes {rep=18 tkt=18 ses=18}, spark_remote/this.server.fqdn@MYREALM.INTERNAL for krbtgt/MYREALM.INTERNAL@MYREALM.INTERNAL
Jan 30 15:50:43 kdc.server.internal krb5kdc[9279](info): AS_REQ (2 etypes {18 17}) 192.168.1.100: NEEDED_PREAUTH: spark_remote/this.server.fqdn@MYREALM.INTERNAL for krbtgt/MYREALM.INTERNAL@MYREALM.INTERNAL, Additional pre-authentication required
Jan 30 15:50:43 kdc.server.internal krb5kdc[9279](info): AS_REQ (2 etypes {18 17}) 192.168.1.100: ISSUE: authtime 1517345443, etypes {rep=18 tkt=18 ses=18}, spark_remote/this.server.fqdn@MYREALM.INTERNAL for krbtgt/MYREALM.INTERNAL@MYREALM.INTERNAL
Jan 30 15:50:47 kdc.server.internal krb5kdc[9279](info): TGS_REQ (2 etypes {18 17}) 192.168.1.100: ISSUE: authtime 1517345443, etypes {rep=18 tkt=18 ses=18}, spark_remote/this.server.fqdn@MYREALM.INTERNAL for yarn/this.server.fqdn@MYREALM.INTERNAL
Jan 30 15:52:18 kdc.server.internal krb5kdc[9279](info): AS_REQ (2 etypes {18 17}) 192.168.1.100: NEEDED_PREAUTH: spark_remote/this.server.fqdn@MYREALM.INTERNAL for krbtgt/MYREALM.INTERNAL@MYREALM.INTERNAL, Additional pre-authentication required
Jan 30 15:52:18 kdc.server.internal krb5kdc[9279](info): AS_REQ (2 etypes {18 17}) 192.168.1.100: ISSUE: authtime 1517345538, etypes {rep=18 tkt=18 ses=18}, spark_remote/this.server.fqdn@MYREALM.INTERNAL for krbtgt/MYREALM.INTERNAL@MYREALM.INTERNAL
Jan 30 15:52:21 kdc.server.internal krb5kdc[9279](info): TGS_REQ (2 etypes {18 17}) 192.168.1.100: ISSUE: authtime 1517345538, etypes {rep=18 tkt=18 ses=18}, spark_remote/this.server.fqdn@MYREALM.INTERNAL for yarn/this.server.fqdn@MYREALM.INTERNAL
Any suggestions of steps to try would be appreciated.
Created 01-31-2018 04:55 PM
Rebooted all services that had keytabs, and then I was able to connect. The error stopped. Thanks for the responses.
Created 01-30-2018 10:43 PM
Ensure the entries in your /etc/hosts don't point only to short host names; they should use the FQDN.
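If you want to check a hosts file for this quickly, here is a minimal Python 3 sketch (the function name non_fqdn_host_entries is hypothetical, and "contains a dot" is only a rough stand-in for "is an FQDN"). It relies on the usual /etc/hosts layout, where the first name after the address is the canonical name and the rest are aliases.

```python
import ipaddress


def non_fqdn_host_entries(hosts_text):
    """List (address, canonical_name) pairs whose canonical name has no dot.

    In /etc/hosts the first name after the address is the canonical name and
    any further names are aliases. Loopback entries are skipped. "Contains a
    dot" is only a rough heuristic for "is a fully qualified domain name".
    """
    problems = []
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        address, canonical = fields[0], fields[1]
        try:
            if ipaddress.ip_address(address).is_loopback:
                continue  # localhost entries are expected to be short
        except ValueError:
            continue  # first field is not an IP address
        if "." not in canonical:
            problems.append((address, canonical))
    return problems
```

For example, a file containing the line `192.168.1.100 this this.server.fqdn` is reported as `("192.168.1.100", "this")`, because the short name is listed first and therefore becomes the canonical name.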
Created 01-30-2018 11:27 PM
Thanks, @Geoffrey Shelton Okot. I saw your other post, and I did make sure the hosts file was empty.
Created 01-31-2018 07:43 AM
@Matt Andruff your Kerberos log says that your application (spark_remote/this.server.fqdn@MYREALM.INTERNAL) was granted a service ticket for yarn/this.server.fqdn@MYREALM.INTERNAL. It looks like this ticket is not being accepted by the ResourceManager. If your ResourceManager otherwise works well with Kerberos, I really think @Geoffrey Shelton Okot is right that it is something with the names.
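To pull that KDC-side evidence out of krb5kdc.log mechanically, here is a minimal Python 3 sketch. The regex is fitted only to the MIT krb5kdc lines quoted above and is an assumption, not a complete parser for every log format; the function name issued_service_tickets is hypothetical.

```python
import re

# Fitted to the krb5kdc lines quoted in this thread, e.g.
# "... TGS_REQ (2 etypes {18 17}) 192.168.1.100: ISSUE: authtime ..., <client> for <server>"
# This is an assumption about the format, not a general krb5kdc log parser.
KDC_LINE = re.compile(
    r"(?P<req>AS_REQ|TGS_REQ) \(.*?\) (?P<addr>\S+): (?P<status>[A-Z_]+): "
    r".*?(?P<client>\S+@\S+) for (?P<server>[^\s,]+)"
)


def issued_service_tickets(log_text):
    """Return (client, server) pairs for every TGS_REQ the KDC answered with ISSUE."""
    tickets = []
    for line in log_text.splitlines():
        match = KDC_LINE.search(line)
        # AS_REQ lines are TGT requests; only TGS_REQ lines are service tickets.
        if match and match["req"] == "TGS_REQ" and match["status"] == "ISSUE":
            tickets.append((match["client"], match["server"]))
    return tickets
```

Run over the log above, every service ticket issued is spark_remote/this.server.fqdn@MYREALM.INTERNAL for yarn/this.server.fqdn@MYREALM.INTERNAL, which is why the KDC side looks healthy and the suspicion falls on the ResourceManager side.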
Can you check your name resolution with the commands below and verify that they all return the FQDN (this.server.fqdn) and the same IP (192.168.1.100)?
nslookup this.server.fqdn
nslookup this
nslookup 192.168.1.100
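For anyone who wants to script the same forward/reverse check, here is a minimal Python 3 sketch (the function name check_name_resolution is hypothetical). One difference worth knowing: nslookup queries DNS directly, while socket.gethostbyname and socket.gethostbyaddr go through the system resolver, which normally also reads /etc/hosts. That is closer to what the JVM sees, but it means the two tools can disagree.

```python
import socket


def check_name_resolution(fqdn, expected_ip,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return a list of forward/reverse resolution problems (empty if consistent).

    forward and reverse default to the system resolver. They are parameters
    so the check can be exercised without touching real DNS.
    """
    problems = []

    # Forward: the FQDN should resolve to the expected address.
    try:
        resolved_ip = forward(fqdn)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        problems.append(f"forward lookup of {fqdn} failed: {exc}")
    else:
        if resolved_ip != expected_ip:
            problems.append(f"{fqdn} resolves to {resolved_ip}, expected {expected_ip}")

    # Reverse: the address should map back to the same FQDN. Only the primary
    # name is compared; aliases are ignored.
    try:
        primary_name, _aliases, _addresses = reverse(expected_ip)
    except OSError as exc:  # socket.herror is a subclass of OSError
        problems.append(f"reverse lookup of {expected_ip} failed: {exc}")
    else:
        if primary_name.rstrip(".").lower() != fqdn.rstrip(".").lower():
            problems.append(
                f"{expected_ip} reverse-resolves to {primary_name}, expected {fqdn}"
            )

    return problems
```

On the node itself you would call something like `print(check_name_resolution("this.server.fqdn", "192.168.1.100"))`; an empty list means both directions agree.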
Created 01-31-2018 02:10 PM
Here's the output:
[root@ec2-user]# nslookup this
Server: 192.168.1.100
Address: 192.168.1.100#53

Non-authoritative answer:
Name: this.server.fqdn.compute.internal
Address: 172.31.10.196

[root@ec2-user]# nslookup this.server.fqdn
Server: 192.168.1.100
Address: 192.168.1.100#53

** server can't find this.server.fqdn: NXDOMAIN

[root@ip-172-31-10-196 ec2-user]# nslookup this
Server: 192.168.1.100
Address: 192.168.1.100#53

Non-authoritative answer:
Name: this.server.fqdn.compute.internal
Address: 172.31.10.196

[root@ec2-user]# nslookup 192.168.1.100
Server: 192.168.1.100
Address: 192.168.1.100#53

Non-authoritative answer:
100.1.168.192.in-addr.arpa name = ip-192.168.1.100.us-west-2.compute.internal.

Authoritative answers can be found from:
Obviously this output is obfuscated... I'm happy to share the real output privately if that helps. I'm running on an EC2 cluster in Amazon.
Created 01-31-2018 02:45 PM
Maybe you can verify the following point. In your initial log I can see this entry:
18/01/30 20:50:46 INFO RMProxy: Connecting to ResourceManager at this.server.fqdn/192.168.1.100:8032
So to me it looks like it tries to connect using the IP address. If that is true, and the reverse lookup of that IP doesn't return the name this.server.fqdn, the ticket that was granted for yarn/this.server.fqdn@MYREALM.INTERNAL can't be accepted.
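To make the mismatch concrete: Hadoop service principals are commonly configured with a _HOST placeholder (for example yarn.resourcemanager.principal = yarn/_HOST@MYREALM.INTERNAL), which is replaced with a host name. Whether this cluster is configured that way is an assumption. The sketch below is a simplified, hypothetical stand-in for that substitution, not Hadoop's actual code. It shows that the two names seen in this thread produce two different principals, and only one of them matches the ticket the KDC issued.

```python
def expand_host_placeholder(principal_pattern, hostname):
    """Replace a _HOST placeholder in a 'service/host@REALM' principal.

    Simplified illustration of Hadoop-style _HOST substitution: the host name
    is lower-cased and a trailing dot is removed. Assumes the pattern has the
    form service/host@REALM. This is not Hadoop's actual implementation.
    """
    service, rest = principal_pattern.split("/", 1)
    host, realm = rest.split("@", 1)
    if host == "_HOST":
        host = hostname.rstrip(".").lower()
    return f"{service}/{host}@{realm}"


granted = "yarn/this.server.fqdn@MYREALM.INTERNAL"  # service ticket from the KDC log above
# Host names taken from the thread. Which one the ResourceManager actually uses
# depends on how its own name resolves; that is an assumption here.
for name in ("this.server.fqdn", "ip-192.168.1.100.us-west-2.compute.internal."):
    expected = expand_host_placeholder("yarn/_HOST@MYREALM.INTERNAL", name)
    print(f"{expected}: {'matches' if expected == granted else 'does NOT match'} the granted ticket")
```

This prints that yarn/this.server.fqdn@MYREALM.INTERNAL matches, while yarn/ip-192.168.1.100.us-west-2.compute.internal@MYREALM.INTERNAL does not.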
And from your output, this appears to be the case (if it's just a result of the obfuscation, correct me):
What should help in that case: