Hello,
I have a kerberized 5.14 CDH cluster and I want to integrate it with LDAP for user authentication (not service authentication). I suppose this is what is described in "Local MIT KDC with Active Directory Integration".
I have the following questions about this change, if I have understood the process right:
1. Users will now be defined in AD and not in Kerberos. This means that the current Kerberos keytab files will no longer be valid. Right?
2. In the case of CLI pipelines, where we first have to do a kinit for the principal, how will authentication work after LDAP is enabled?
3. Which services should be configured to work with LDAP?
- Cloudera Manager
- Hue
- Hive
- Impala
- ... ?
4. Will groups and Sentry permissions need to be re-configured after enabling LDAP?
Thank you,
gerasimos
Created 01-22-2019 02:24 PM
Hi @gerasimos,
First, I think you may be confusing LDAP and Kerberos here, so we need some clarification on what you are actually trying to achieve. Kerberos and LDAP authentication are used for different things, so it is important to understand how each is utilized.
For authentication in CDH, Kerberos is required for core hadoop (HDFS, YARN) and clients of those. Servers that are user-facing usually have an option of authenticating via LDAP or Kerberos. Internal is all Kerberos, external can be a mixture depending on what you would like and what the servers support.
Bottom line, for hadoop security, you need Kerberos, but some things allow for LDAP too.
Also, LDAP can be utilized for user-group mapping. I don't think that is what you are discussing.
Another thing to note is that "Kerberos" is not a server, it is a protocol. Any Kerberos v5 server is fine, but CDH requires MIT Kerberos client libraries for communication in non-Java Kerberos operations.
To your questions:
(1)
Users are created and stored where you want them. Active Directory hosts are also Kerberos servers in Windows systems, so you can use MIT Kerberos's KDC or Windows Active Directory as your Kerberos server. Active Directory also supports LDAP protocol requests. Where your end users exist is up to your deployment.
We'd need clarification about what configuration changes you are proposing to answer this better.
(2)
Need some clarification about what sort of "pipelines" you mean. What services are they connecting to?
(3)
No service "should" be configured with LDAP meaning there is no functional gain to authentication with LDAP vs Kerberos. Each will serve the purpose of authenticating.
The following can support LDAP authentication:
- Cloudera Manager
- Cloudera Navigator Metadata Server
- Hue
- Hive
- Impala
- Solr
- Sentry (in unsupported 'test' mode only - only Kerberos is supported)
(4)
No, no Sentry changes should be necessary as long as your groups and users are the same.
Permissions are group-based, so they have nothing to do with the means of authentication. Sentry provides authorization protection.
To summarize:
- authentication by end users can be achieved by the methods supported by the various servers (roles). Some support LDAP, Kerberos, SAML, etc.
- Once a user is authenticated, then authorization is evaluated depending on the role being accessed.
Hope that helps get started.
Created on 01-22-2019 11:27 PM - edited 01-23-2019 08:04 AM
Hello @bgooley,
Thank you for the detailed explanation. To clarify myself, when I said "kerberos" I meant the MIT KDC implementation, and yes I do not know much about LDAP and AD.
My organization has a Microsoft AD. It also has a CDH cluster that uses MIT Kerberos for hadoop user and service authentication. CM and Hue have their own users.
The task is to review what needs to be done so that users declared in AD can use the cluster, e.g. for submitting Spark jobs, executing Impala queries, using CM, Hue etc.
As far as I have understood, I can keep both existing user principals along with the AD users (on different realms). Is this right?
For users controlled by AD, will I still need to create them in OS level? If not, how are HDFS user and group permissions affected?
After your reply, I read the link above again, and I think that the key to this task is to understand this:
"A one-way, cross-realm trust must be set up from the local Kerberos realm to the central AD realm containing the user principals that require access to the CDH cluster".
Thank you again for your effort.
Created 01-23-2019 08:52 AM
I am still a bit unclear about your current deployment and where your users (not kerberos principals) exist now. It sounds as if you may be using OS files (/etc/passwd and /etc/group) for local users.
This page is a good start to help shape how you want to approach integration with AD...
https://www.cloudera.com/documentation/enterprise/5-16-x/topics/sg_auth_overview.html
Specifically, the AD integration section:
https://www.cloudera.com/documentation/enterprise/5-16-x/topics/sg_auth_overview.html
Also, at the bottom of the page is a matrix of what components support different authentication types.
Direct answers to your questions:
(1)
I can keep both existing user principals along with the AD users (on different realms). Is this right?
When talking about these topics, it is important to know two things: where your users' Kerberos accounts exist and where the hadoop service principals exist. If they are in the same realm, nothing special needs to be done. If your AD users have Kerberos credentials in your AD realm, but your hadoop realm is different (and MIT Kerberos based), then you would need to configure a cross-realm trust so that your hadoop KDC trusts TGTs from the AD realm.
Yes, it is also perfectly reasonable to have your users authenticate via your MIT KDC and also have users authenticate via LDAP with Active Directory (acting as an LDAP server).
(2)
For users controlled by AD, will I still need to create them in OS level?
Probably "YES", but it depends on what you want to do.
How your users' user/group mapping for hadoop is configured is up to you. Hadoop services need to be able to assess authorization via user/group mapping which defaults to org.apache.hadoop.security.ShellBasedUnixGroupsMapping. This means that shell commands such as "id" will be used to determine a user's group membership and thus their access to certain data. How the OS resolves the "id -Gn" command or others is up to your OS's configuration.
As mentioned in the documentation cited above, there are several tools you can use on your cluster's nodes to integrate OS users/groups with LDAP. SSSD is a free, open source one, for instance.
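To see what a shell-based group lookup would return on a given node, something like the following can help. It is only a rough sketch of the idea, not Hadoop's own implementation: it runs "id -Gn" for a user and splits the output. The function names and the "userone" account are placeholders, and it assumes group names contain no spaces (which may not hold for some AD groups).

```python
import subprocess


def parse_groups(id_output):
    """Split the whitespace-separated output of `id -Gn` into group names.

    Assumption: no group name contains a space.
    """
    return id_output.split()


def groups_for(user):
    """Return the OS-level groups for `user`, or [] if the OS cannot resolve the user."""
    result = subprocess.run(["id", "-Gn", user], capture_output=True, text=True)
    if result.returncode != 0:
        # `id` exits non-zero when the user is unknown to the OS (files, SSSD, ...)
        return []
    return parse_groups(result.stdout)


if __name__ == "__main__":
    # "userone" is a placeholder account name
    print(groups_for("userone"))
```

If this prints an empty list on a cluster node for an AD user, the OS on that node does not know the user yet, and group-based authorization for that user will most likely not behave as expected there.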
To summarize how authn/authz works for most components:
- User authenticates via Kerberos (Or LDAP if supported)
- the derived user id is then mapped to groups for authorization.
Front-end applications like Hue and Cloudera Manager are a bit different as they utilize internal service principals to access resources on behalf of the user who has authenticated to the UI. Hue, for instance, will connect as the "hue" service principal for the host it is on to proxy access for the user who has authenticated to Hue.
(3)
"A one-way, cross-realm trust must be set up from the local Kerberos realm to the central AD realm..."
Since I'm still a bit unsure what your endgame is, I hesitate to commit to anything here.
If you want all users to use their AD accounts but keep your CDH cluster's service principals in the MIT KDC, then, yes, you would need a one way trust where your CDH cluster's realm's KDC trusts the AD KDC's realm.
If you want your users to still kinit against your MIT KDC and your service principals are also there, no change is necessary.
I hope that helps a bit... if you can clear up where your users' principals will be and where your service principals will be, we can probably help confirm what basic work that entails.
LDAP auth and where LDAP accounts exist does not depend on Kerberos. The good part of AD is that accounts accessed for Kerberos and for LDAP exist as the same object.
If you have users on your MIT KDC for kerberos auth, but you want to introduce LDAP auth for services that supported it with AD as the LDAP server, you can do that... the only thing to keep in mind is that the user name (usually sAMAccountName) needs to be the same as their previous user name (whatever that was before LDAP) in order for authorization to work the same as it had before.
For example: if there is currently a user, "userone", who has access to certain HDFS directories and to whom Sentry grants access to certain resources, then after moving to LDAP auth, the username derived from LDAP authentication must be "userone" as well.
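A quick way to check this before switching is to compare the existing user names with the names LDAP would return. The sketch below is only illustrative: both lists are made-up sample data, and the function name is hypothetical. It reports existing users that are missing from LDAP or that differ only by letter case.

```python
def find_name_mismatches(existing_users, ldap_users):
    """Return {existing_name: ldap_name_or_None} for users without an exact match.

    A value of None means no LDAP name matches at all; a string means an LDAP
    name exists that differs only by letter case.
    """
    ldap_set = set(ldap_users)
    by_lower = {name.lower(): name for name in ldap_users}
    problems = {}
    for user in existing_users:
        if user in ldap_set:
            continue  # exact match: authorization should carry over unchanged
        problems[user] = by_lower.get(user.lower())
    return problems


if __name__ == "__main__":
    existing = ["userone", "usertwo", "userthree"]   # sample data
    from_ldap = ["userone", "UserTwo"]               # sample sAMAccountName values
    print(find_name_mismatches(existing, from_ldap))
```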
Created on 02-04-2019 12:35 AM - edited 02-04-2019 12:39 AM
Hello @bgooley
Thank you again for your guided reply. I spent some time with some hands-on so I have a better view now.
I started with Hue integration, which seemed the most straightforward (before going to the hadoop level). I set up an Active Directory 2008 and created some users under the "Users" container. In there, I also defined a "sentryadmins" group and made "user1" a member of this group. I would expect this group (which, by the way, also exists in Hue and at OS level) to be imported to Hue when user1 logs in (shouldn't it?)
LDAP authentication works great when I login to Hue with "user1". I can also see that firstName, lastName and email fields have been imported. However, I have 2 issues with Hue authentication:
1. "sentryadmins" group is not imported as "user1" membership. I tried the "sync" functionality and nothing changes.
2. When I press "Sync LDAP users/groups" no users or groups are imported.
Can these be addressed?
Also, in case that something goes really bad with LDAP integration, how can I manually switch back to "AllowFirstUserDjangoBackend"? I am using CM for Hue configuration (and a bit of code in hue_safety_valve.ini)
Thank you,
Gerasimos
Created 02-04-2019 08:15 AM
I'm glad to hear you are making some progress with authentication implementation.
First, you can toggle the authentication backend in CM via the Authentication Backend configuration.
You could also consider configuring multiple backends if that suits your needs:
http://gethue.com/configuring-hue-multiple-authentication-backends-and-ldap/
As for the group membership issue, you can add the following in order to ensure that a user's groups (and group membership) are synchronized when a user logs in:
[desktop]
[[ldap]]
sync_groups_on_login=true
The above can be added to your Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
There could be any number of reasons that group membership is not being updated. One of the most likely issues is that LDAP Group Membership Attribute is not set to member as that attribute is what will be used to determine group membership when importing and synchronizing.
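To illustrate why the membership attribute matters, here is a small stand-alone sketch (not Hue's code) of how group membership can be derived from an attribute such as member. The directory entries, DNs and the function name are made-up examples.

```python
def groups_of(user_id, groups, member_attr="member"):
    """Return the names of groups whose `member_attr` values include `user_id`.

    The comparison is case-insensitive, as DNs usually are.
    """
    wanted = user_id.lower()
    return sorted(
        name
        for name, attrs in groups.items()
        if wanted in (value.lower() for value in attrs.get(member_attr, []))
    )


# Made-up directory entries: one group lists full DNs in "member",
# the other lists short names in "memberUid".
GROUPS = {
    "sentryadmins": {"member": ["CN=user1,CN=Users,DC=example,DC=com"]},
    "analysts": {"memberUid": ["user1"]},
}

if __name__ == "__main__":
    print(groups_of("cn=user1,cn=users,dc=example,dc=com", GROUPS))
    print(groups_of("user1", GROUPS, member_attr="memberUid"))
```

If the configured attribute does not match what the directory actually stores, the lookup simply finds no groups, which looks the same as "the sync did nothing".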
In order to gain debugging that can help you determine the cause of the issue, you can do the following:
(1)
Turn on Django debug:
Hue --> Configuration --> Advanced --> Enable Django Debug Mode
Check the box
(2)
In Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini add the following to enable LDAP debug
[desktop]
[[ldap]]
debug=true
debug_level=255
trace_level=9
(3)
Restart Hue
After that, reproduce the problem and then review logs:
- /var/log/hue/runcpserver.log
- stderr.log
- stdout.log
stderr and stdout exist in the process directory. You can get there by logging into the host where Hue runs and going to:
/var/run/cloudera-scm-agent/process/`ls -lrt /var/run/cloudera-scm-agent/process/ | awk '{print $9}' |grep HUE_SERVER|tail -1`/logs/
For example:
cd /var/run/cloudera-scm-agent/process/`ls -lrt /var/run/cloudera-scm-agent/process/ | awk '{print $9}' |grep HUE_SERVER|tail -1`/logs/
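If you prefer to script this, the same lookup can be sketched in Python: pick the most recently modified entry whose name contains HUE_SERVER. The function name is just an illustration, and the base path is the one from the command above.

```python
import os


def latest_process_dir(base, marker="HUE_SERVER"):
    """Return the most recently modified directory under `base` whose name
    contains `marker`, or None if there is no such directory."""
    candidates = [
        os.path.join(base, name)
        for name in os.listdir(base)
        if marker in name and os.path.isdir(os.path.join(base, name))
    ]
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


if __name__ == "__main__":
    base = "/var/run/cloudera-scm-agent/process"
    # Reading this directory typically requires elevated privileges
    if os.path.isdir(base) and os.access(base, os.R_OK):
        latest = latest_process_dir(base)
        if latest is not None:
            print(os.path.join(latest, "logs"))
```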
Created on 02-05-2019 01:28 AM - edited 02-05-2019 06:43 AM
Hello @bgooley
Thanks again. I enabled the logging, I saw how ldap queries are constructed and finally I got it working. Minor, but I think that I first need to do at least one "Add/Sync LDAP Group" of a specific group in order to be synced during login of new users.
So, what I have managed so far is:
1. Define (new) users in AD
2. Define the old hadoop groups in AD as well and configure users' memberships appropriately (I guess I have to do this to keep Sentry working as before)
3. When a user logs in to Hue, they get their group membership from AD
I am going further now with this, thank you again.
Gerasimos
Created 02-05-2019 06:40 AM
Assuming that I already had (OS & Hue) group "sentryadmins", then I am getting error on "Add/Sync LDAP group":
But if I first delete the Hue group "sentryadmins", then the sync functionality works. Any idea why?
I would expect sync to update existing groups (my case) as well as add any new ones.
Created 02-05-2019 07:57 AM
Glad you are making some progress. As for the behavior with existing Hue groups, I'm not sure. We'd need to review the logs to compare what happens when you sync a group that has already been imported into Hue and one that hasn't been imported yet. Also, we would need to see the add_ldap_groups page to see what you selected.
Also, seeing your LDAP configuration for Hue would provide more context for the issue.
Created on 02-06-2019 05:05 AM - edited 02-06-2019 05:51 AM
Hello @bgooley
This is the error when syncing an existing group:
views WARNING There was a naming conflict while importing group sentryadmins in pattern sentryadmins
and more specifically, this line of useradmin/views.py
group, created = Group.objects.get_or_create(name=ldap_info['name'])
returns
group=sentryadmins created=False
So, I can tell that the group is not created at Django level when it already exists. Looking closer in the python code, there is a comment:
# This is a Hue group, and shouldn't be overwritten
which is right! The group already exists, should not be overwritten, but users should become members of the group during sync, which is not happening.
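To make the behavior I am describing concrete, here is a small in-memory model of it. This is not the Hue code: GroupStore, import_ldap_group and the "ldap" flag are made up, and the exact condition in useradmin/views.py is my assumption based on the warning and the comment above.

```python
class GroupStore:
    """Tiny stand-in for a group table with get_or_create semantics."""

    def __init__(self):
        self.groups = {}  # name -> {"ldap": bool, "members": set}

    def get_or_create(self, name):
        created = name not in self.groups
        if created:
            self.groups[name] = {"ldap": False, "members": set()}
        return self.groups[name], created


def import_ldap_group(store, name, members):
    """Import or re-sync an LDAP group; return None on a naming conflict."""
    group, created = store.get_or_create(name)
    if not created and not group["ldap"]:
        # Pre-existing local group: treated as a naming conflict and left
        # untouched, so its members are never updated
        return None
    group["ldap"] = True
    group["members"] = set(members)
    return group


if __name__ == "__main__":
    store = GroupStore()
    store.get_or_create("sentryadmins")  # group already exists locally
    print(import_ldap_group(store, "sentryadmins", ["user1"]))
```

In this model, a group that was first created locally is skipped on every sync, while a group that was first created by the import can be re-synced later. That matches what I see: deleting the local group first makes the sync work.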