I am attempting to enable Kerberos authentication for users against an Active Directory based realm. I am following the model of having an MIT KDC to house the Cloudera principals, and then establishing a cross-realm trust to the AD realm to allow AD users to authenticate.
At a purely Kerberos level this is working fine (see the example below); however, when I attempt a cluster operation that requires Kerberos authentication, I see consistent and fairly general failures. I believe I have followed the AD integration docs, with the exception of the final part around configuring name translation, but I don't think I have reached the stage in authentication where that plays a role.
SITE.PRODUCT.COMPANY.ORG: Cluster KRB Realm
SITE.COMPANY.ORG: Active Directory KRB Realm
product.company.org: DNS domain of all cluster nodes
auser: the test user account
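For reference, a trust of this shape is typically established by creating a matching cross-realm krbtgt principal on both sides; the sketch below reflects the standard approach rather than my exact commands (the enctype, flags, and password placeholder are illustrative):
==============================================================
# On the MIT KDC: add the cross-realm principal that lets clients from
# SITE.COMPANY.ORG obtain service tickets in SITE.PRODUCT.COMPANY.ORG
kadmin.local -q 'addprinc -e "aes256-cts:normal" krbtgt/SITE.PRODUCT.COMPANY.ORG@SITE.COMPANY.ORG'

# On an AD domain controller: register the matching realm trust,
# using the same password as the principal above
netdom trust SITE.PRODUCT.COMPANY.ORG /Domain:SITE.COMPANY.ORG /add /realm /passwordt:<TrustPassword>
==============================================================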
If it has any relevance, the krb5.conf configuration for the AD realm (and the wider OS-level AD integration) is managed by the Linux `realm` command, which utilises the underlying `adcli` tool, and has therefore been auto-populated.
This demonstrates that the cross-realm trust is working at a pure Kerberos level:
==============================================================
[deployer@test-edge-01 ~]$ kinit auser@SITE.COMPANY.ORG
Password for auser@SITE.COMPANY.ORG:
[deployer@test-edge-01 ~]$ kvno hdfs/test-data-01.product.company.org@SITE.PRODUCT.COMPANY.ORG
hdfs/test-data-01.product.company.org@SITE.PRODUCT.COMPANY.ORG: kvno = 2
[deployer@test-edge-01 ~]$ klist -e
Ticket cache: FILE:/tmp/krb5cc_997
Default principal: auser@SITE.COMPANY.ORG
Valid starting Expires Service principal
13/03/19 09:10:57 13/03/19 19:10:57 krbtgt/SITE.COMPANY.ORG@SITE.COMPANY.ORG
renew until 20/03/19 09:10:43, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96
13/03/19 09:11:22 13/03/19 19:10:57 krbtgt/SITE.PRODUCT.COMPANY.ORG@SITE.COMPANY.ORG
renew until 20/03/19 09:10:43, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96
13/03/19 09:11:22 13/03/19 19:10:57 hdfs/test-data-01.product.company.org@SITE.PRODUCT.COMPANY.ORG
renew until 18/03/19 09:11:22, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96
==============================================================
However, this is what happens when I try to run a cluster operation that requires authentication (based on the ticket granted in the `kinit` above):
==============================================================
[deployer@test-edge-01 ~]$ export HADOOP_OPTS="-Dsun.security.krb5.debug=true"
[deployer@test-edge-01 ~]$ hdfs dfs -ls hdfs://test-master-02.product.company.org:8020/
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
>>>KinitOptions cache name is /tmp/krb5cc_997
>>>DEBUG <CCacheInputStream> client principal is auser@SITE.COMPANY.ORG
>>>DEBUG <CCacheInputStream> server principal is krbtgt/SITE.COMPANY.ORG@SITE.COMPANY.ORG
>>>DEBUG <CCacheInputStream> key type: 18
>>>DEBUG <CCacheInputStream> auth time: Wed Mar 13 09:10:57 UTC 2019
>>>DEBUG <CCacheInputStream> start time: Wed Mar 13 09:10:57 UTC 2019
>>>DEBUG <CCacheInputStream> end time: Wed Mar 13 19:10:57 UTC 2019
>>>DEBUG <CCacheInputStream> renew_till time: Wed Mar 20 09:10:43 UTC 2019
>>> CCacheInputStream: readFlags() FORWARDABLE; RENEWABLE; INITIAL; PRE_AUTH;
>>>DEBUG <CCacheInputStream> client principal is auser@SITE.COMPANY.ORG
>>>DEBUG <CCacheInputStream> server principal is X-CACHECONF:/krb5_ccache_conf_data/pa_type/krbtgt/SITE.COMPANY.ORG@SITE.COMPANY.ORG
>>>DEBUG <CCacheInputStream> key type: 0
>>>DEBUG <CCacheInputStream> auth time: Thu Jan 01 00:00:00 UTC 1970
>>>DEBUG <CCacheInputStream> start time: null
>>>DEBUG <CCacheInputStream> end time: Thu Jan 01 00:00:00 UTC 1970
>>>DEBUG <CCacheInputStream> renew_till time: null
>>> CCacheInputStream: readFlags()
>>>DEBUG <CCacheInputStream> client principal is auser@SITE.COMPANY.ORG
>>>DEBUG <CCacheInputStream> server principal is krbtgt/SITE.PRODUCT.COMPANY.ORG@SITE.COMPANY.ORG
>>>DEBUG <CCacheInputStream> key type: 18
>>>DEBUG <CCacheInputStream> auth time: Wed Mar 13 09:10:57 UTC 2019
>>>DEBUG <CCacheInputStream> start time: Wed Mar 13 09:11:22 UTC 2019
>>>DEBUG <CCacheInputStream> end time: Wed Mar 13 19:10:57 UTC 2019
>>>DEBUG <CCacheInputStream> renew_till time: Wed Mar 20 09:10:43 UTC 2019
>>> CCacheInputStream: readFlags() FORWARDABLE; RENEWABLE; PRE_AUTH;
>>>DEBUG <CCacheInputStream> client principal is auser@SITE.COMPANY.ORG
>>>DEBUG <CCacheInputStream> server principal is hdfs/test-data-01.product.company.org@SITE.PRODUCT.COMPANY.ORG
>>>DEBUG <CCacheInputStream> key type: 18
>>>DEBUG <CCacheInputStream> auth time: Wed Mar 13 09:10:57 UTC 2019
>>>DEBUG <CCacheInputStream> start time: Wed Mar 13 09:11:22 UTC 2019
>>>DEBUG <CCacheInputStream> end time: Wed Mar 13 19:10:57 UTC 2019
>>>DEBUG <CCacheInputStream> renew_till time: Mon Mar 18 09:11:22 UTC 2019
>>> CCacheInputStream: readFlags() FORWARDABLE; RENEWABLE; PRE_AUTH;
Found ticket for auser@SITE.COMPANY.ORG to go to krbtgt/SITE.COMPANY.ORG@SITE.COMPANY.ORG expiring on Wed Mar 13 19:10:57 UTC 2019
Entered Krb5Context.initSecContext with state=STATE_NEW
Found ticket for auser@SITE.COMPANY.ORG to go to krbtgt/SITE.COMPANY.ORG@SITE.COMPANY.ORG expiring on Wed Mar 13 19:10:57 UTC 2019
Service ticket not found in the subject
>>> Realm doInitialParse: cRealm=[SITE.COMPANY.ORG], sRealm=[SITE.PRODUCT.COMPANY.ORG]
>>> Realm parseCapaths: no cfg entry
>>> Credentials acquireServiceCreds: main loop: [0] tempService=krbtgt/SITE.PRODUCT.COMPANY.ORG@SITE.COMPANY.ORG
Using builtin default etypes for default_tgs_enctypes
default etypes for default_tgs_enctypes: 18 17 16 23 1 3.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KdcAccessibility: reset
>>> Credentials acquireServiceCreds: no tgt; searching backwards
>>> Credentials acquireServiceCreds: inner loop: [1] tempService=krbtgt/COMPANY.ORG@SITE.COMPANY.ORG
Using builtin default etypes for default_tgs_enctypes
default etypes for default_tgs_enctypes: 18 17 16 23 1 3.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> Credentials acquireServiceCreds: inner loop: [2] tempService=krbtgt/PRODUCT.COMPANY.ORG@SITE.COMPANY.ORG
Using builtin default etypes for default_tgs_enctypes
default etypes for default_tgs_enctypes: 18 17 16 23 1 3.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> Credentials acquireServiceCreds: no tgt; cannot get creds
KrbException: Fail to create credential. (63) - No service creds
at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:299)
at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:454)
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:641)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:193)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:560)
at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:375)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:730)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:726)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:725)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1524)
at org.apache.hadoop.ipc.Client.call(Client.java:1447)
at org.apache.hadoop.ipc.Client.call(Client.java:1408)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:762)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy15.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2121)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1215)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1211)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1211)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:64)
at org.apache.hadoop.fs.Globber.doGlob(Globber.java:285)
at org.apache.hadoop.fs.Globber.glob(Globber.java:151)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1639)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:102)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
19/03/13 09:13:19 WARN security.UserGroupInformation: PriviledgedActionException as:auser@SITE.COMPANY.ORG (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Fail to create credential. (63) - No service creds)]
==============================================================
The output is actually truncated, but as far as I can see, the remainder is a repetition of the above.
I have already considered and ruled out the obvious things, and am running out of ideas. Can anybody suggest what else I should be checking, or what I may be missing?
Many thanks.
Created on 03-13-2019 08:37 AM - edited 03-13-2019 08:38 AM
Looks like I have got to the bottom of this: it is rooted in the way that realm/adcli/sssd manages the configuration of servers that are members of an AD domain.
We join machines to the domain using `realm join ...`. This convenient command takes care of creating a machine account in the domain and then managing all the config files that need amending on the host being joined (sssd.conf, krb5.conf, PAM, etc.); a typical invocation is shown below.
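For illustration, a minimal sketch of the join step (the admin account name is a placeholder):

sudo realm join --user=ad-admin site.company.org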
Specifically, for the krb5.conf it adds an empty block for the realm:
[realms]
    SITE.COMPANY.ORG = {
    }
and an include statement:
includedir /var/lib/sss/pubconf/krb5.include.d/
Within this included directory is a series of config fragments that take care of the actual configuration. A nice, manageable way of handling complex configuration, such as when you have multiple realms.
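A quick way to see what sssd has dropped in there on a joined host:

ls -l /var/lib/sss/pubconf/krb5.include.d/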
Unfortunately, Java 1.7 does not support include directives in krb5.conf.
Therefore native Kerberos operations work fine with this config, but the Java-based Cloudera components cannot use it: from Java's point of view the AD realm has no KDC defined, which is why the debug trace shows it searching for a cross-realm path and failing with "no tgt".
The workaround is to explicitly add 'kdc' and 'admin_server' directives within the otherwise empty realm block, as sketched below.
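For illustration, a minimal sketch of the filled-in block (the domain controller hostname is a placeholder for one of your AD DCs):

[realms]
    SITE.COMPANY.ORG = {
        kdc = dc-01.site.company.org
        admin_server = dc-01.site.company.org
    }

An alternative, if you would rather keep the sssd-managed file untouched, is to point the JVM at a separate, flattened config via the standard -Djava.security.krb5.conf=/path/to/krb5-flat.conf system property.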
Created on 10-10-2021 01:43 PM - edited 10-10-2021 01:43 PM
If you use an MIT Kerberos server or FreeIPA, a hard-coded 'kdc' entry is a poor workaround, because you should provide HA for Kerberos by using DNS to balance across the KDC servers.
You should instead switch on dns_lookup_kdc = true; the client will then discover the KDCs of any external realm via DNS. If that realm has a trust (for example a two-way trust), you can connect directly to any external KDC to get a TGT and then request a TGS for a service in your realm, or get a TGT in your realm and connect to an external server with a TGS for that service.
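For illustration, a minimal sketch of that setting, plus a check that the DNS SRV records it relies on actually resolve (the realm name is taken from the original question):

[libdefaults]
    dns_lookup_kdc = true

# KDC discovery uses SRV records of this shape:
dig -t SRV _kerberos._tcp.SITE.COMPANY.ORG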
Java (not Hadoop) doesn't support included configs, but when you authenticate through the OS, the processing goes via sssd, which does use the included config to obtain the KDC info.
However, if your Active Directory domain is a second-level domain while the MIT realm sits at the third level beneath it, you will hit a routing conflict, because all of your internal realm requests will go to AD.
This can be solved by adding routing to the [domain_realm] section of krb5.conf, like:
[domain_realm]
mit.domain.local = MIT.DOMAIN.LOCAL
.mit.domain.local = MIT.DOMAIN.LOCAL
host.mit.domain.local = MIT.DOMAIN.LOCAL
domain.local = AD.DOMAIN.LOCAL
.domain.local = AD.DOMAIN.LOCAL
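To confirm that these mappings are actually being applied, tracing a ticket request helps; KRB5_TRACE is standard MIT krb5, and the service principal here is just an example. The trace shows which realm the hostname was mapped to:

KRB5_TRACE=/dev/stderr kvno host/host.mit.domain.local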