Created on 09-16-2022 01:23 PM - edited 09-19-2022 10:06 AM
The identity team at Cloudera has been working to add System for Cross-domain Identity Management (SCIM) support to Cloudera Data Platform (CDP), and we're happy to announce the general availability of SCIM on Azure Active Directory! If you already know what SCIM is and want to start using it, here's a link to our docs to get started. Otherwise, read on. It might just save you a bunch of time and hassle.
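This blog is broken into two parts: this post, which covers how SCIM support in CDP addresses customer pain points, and Part Two, which introduces the SCIM standard.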
SCIM support on CDP helps ease and, in many cases, resolve the pain points of our customers. Some of these pain points are listed below.
The main win is that we've helped streamline and automate our customers’ approach to handling their users' lifecycles, bringing the most benefit around "joiners, movers, and leavers." We have customers who, due to SAML limitations (some of which I detail below), have had to write code that captures changes in their identity provider and uses the CDP command line interface, software development kit, or application programming interface to sync those changes to CDP.
Now our customers—without writing any custom code—can precreate and set up CDP user accounts to speed onboarding (joiners); they can automatically push user and group membership changes, modify user permissions across CDP (movers); and they can automatically delete users and groups, revoking privileges in CDP (leavers).
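To make the three lifecycle events concrete, here is a minimal sketch of the kind of SCIM 2.0 requests an identity provider sends for each one. The schema URNs, the HTTP methods, and the /Users and /Groups resource paths come from the SCIM 2.0 standard (RFC 7643 and RFC 7644). The function names, user names, and IDs are made up for illustration, and nothing here is specific to how CDP or Azure AD implements its endpoint. The code only builds request descriptions and sends nothing over the network.

```python
# Schema URNs defined by the SCIM 2.0 standard (RFC 7643 / RFC 7644).
USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"
PATCH_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def joiner_request(user_name: str, given: str, family: str) -> dict:
    """Joiner: the account is created up front with POST /Users."""
    return {
        "method": "POST",
        "path": "/Users",
        "body": {
            "schemas": [USER_SCHEMA],
            "userName": user_name,
            "name": {"givenName": given, "familyName": family},
            "active": True,
        },
    }


def mover_request(group_id: str, add_user_ids: list[str],
                  remove_user_ids: list[str]) -> dict:
    """Mover: group membership changes are pushed with PATCH /Groups/{id}."""
    operations = []
    if add_user_ids:
        operations.append({
            "op": "add",
            "path": "members",
            "value": [{"value": user_id} for user_id in add_user_ids],
        })
    for user_id in remove_user_ids:
        # A SCIM path filter selects the single member to remove.
        operations.append({
            "op": "remove",
            "path": f'members[value eq "{user_id}"]',
        })
    return {
        "method": "PATCH",
        "path": f"/Groups/{group_id}",
        "body": {"schemas": [PATCH_SCHEMA], "Operations": operations},
    }


def leaver_request(user_id: str) -> dict:
    """Leaver: the account is removed with DELETE /Users/{id}."""
    return {"method": "DELETE", "path": f"/Users/{user_id}", "body": None}
```

The point of the sketch is that each of the three events maps to a single standard request that the identity provider issues on its own, which is why no customer-written glue code is needed.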
Users must be assigned permissions before they can make use of CDP features, so it’s desirable to assign permissions before users log in to CDP. Importing the needed users and groups into CDP previously required an explicit export from the customer identity provider (IdP), followed by an import into Cloudera Data Platform via the CDP user interface or API scripting. Otherwise, administrators had to wait for all users to log in so that their CDP users, groups, and group memberships could be created through SAML login. This resulted in unnecessary and error-prone administrative overhead, and it could make onboarding your user base take far longer than needed.
Now, with SCIM integration, users and groups can be automatically imported at once before users need to log in. Then permissions can be assigned by appropriate administrators.
The "joiner" scenario is more common in CDP than other services, as our platform covers a wide breadth of customer personas. With SAML, you need to log in to create your account in a service. This means that everyone who wants to use CDP needs to log in to CDP. Some personas, like data scientists, don't even need to know that CDP is what enables them to run their models.
SCIM eliminates the control plane requirement completely (or, at least, mostly) for such users. Instead, their CDP accounts are created with SCIM, and then their CDP user accounts are synced to the cluster where they work. They can log in to the cluster by going directly to the URL of the service they need (for example, the URL of Hive or Hue), bypassing the CDP control plane. (Under the hood they are seamlessly authenticated against CDP and their identity provider, but that is a topic for a different blog post.)
We also work with companies in heavily regulated industries for which the "movers" scenario with pure SAML was insufficient. A common example here is someone who needs temporarily elevated privileges to perform an action (for example, a one-time setup). They ask for, and are granted, the elevated privileges through a group change in the company's identity provider. When they perform a SAML login, those new groups are assigned to their user. After they've finished, the identity provider admin removes them from the groups with elevated privileges. The problem is that while the group change has occurred in the identity provider, the user keeps their elevated privileges until their CDP session times out and they are forced to log in again. Propagating group changes with SAML is further complicated because a CDP user can create long-lived access keys to perform actions without needing to log in to CDP. As long as that user doesn't log in to CDP, SAML will never update their groups. This breaks regulations, and our customers have historically worked with professional services to write code that waits for their identity provider to send a webhook on group change and then calls CDP to make the group changes in real time.
With SCIM that custom code can go away. And our customers that were doing this manually now have an easy way to automate it, with no custom code.
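The stale-privilege problem in the "movers" scenario is easy to see in a toy model. The sketch below is purely illustrative: the class names, the user "alice", and the group names are invented, and it does not represent CDP's or any identity provider's actual implementation. It contrasts a service that refreshes group membership only at login (SAML-style) with one that receives every change as it happens (SCIM-style push).

```python
class IdentityProvider:
    """Toy IdP: the source of truth for each user's groups."""

    def __init__(self):
        self.groups = {}        # user -> set of group names
        self.subscribers = []   # callables notified on every change

    def set_groups(self, user, groups):
        self.groups[user] = set(groups)
        for notify in self.subscribers:
            notify(user, set(groups))


class LoginSyncService:
    """Refreshes a user's groups only when that user logs in (SAML-style)."""

    def __init__(self, idp):
        self.idp = idp
        self.groups = {}

    def login(self, user):
        self.groups[user] = set(self.idp.groups.get(user, ()))

    def is_member(self, user, group):
        return group in self.groups.get(user, set())


class PushSyncService:
    """Receives each change from the IdP as it happens (SCIM-style push)."""

    def __init__(self, idp):
        self.groups = {}
        idp.subscribers.append(self.apply_change)

    def apply_change(self, user, groups):
        self.groups[user] = groups

    def is_member(self, user, group):
        return group in self.groups.get(user, set())


def elevated_access_demo():
    idp = IdentityProvider()
    login_sync = LoginSyncService(idp)
    push_sync = PushSyncService(idp)

    idp.set_groups("alice", {"analysts", "admins"})  # temporary elevation
    login_sync.login("alice")                        # login picks up "admins"
    idp.set_groups("alice", {"analysts"})            # elevation revoked in the IdP

    # No new login happens, as with a long-lived session or access key.
    return (login_sync.is_member("alice", "admins"),
            push_sync.is_member("alice", "admins"))
```

Calling `elevated_access_demo()` returns `(True, False)`: the login-synced service still treats the user as a member of the elevated group after the revocation, while the push-synced service does not.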
Perhaps most important for heavily regulated companies is the "leavers" scenario. Access needs to be revoked promptly, especially for terminated employees in regulated industries. The status quo before SCIM was to rely on a manual runbook or on automated scripts.
Now, our customers can just enable SCIM and their identity provider will push those changes to CDP automatically, and CDP will take care of the rest.
SAML-based onboarding with certain identity providers has limitations that require custom code. Some IdPs cap the number of groups they will put in a SAML payload. A common example is Azure AD's 150 groups-per-claim limit, where a user's SAML payload will contain no groups once 150 groups are exceeded (source).
SCIM doesn't have this limitation, so if you use Azure AD and assign more than 150 groups to your users, then SCIM will fix this problem. (Once you do this you'll also need to disable syncing group memberships on login.)
An Azure AD SAML payload will only include a group's "sAMAccountName" (the human-readable group name) for groups created in on-premises Active Directory and then synced to Azure AD with AD Connect. For groups that are created natively in Azure AD, Azure AD will only include the group's object ID (a UUID) in SAML payloads.
SCIM doesn't have this limitation, so if you use Azure AD natively and are having difficulty getting your groups to show up in a human-readable format with SAML, then SCIM will fix this problem. (Once you do this you'll also need to disable syncing group memberships on login.)
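If you are not sure whether you are affected by the object ID behavior, one quick check is to look at the group values arriving in your SAML assertions and see how many of them are UUIDs instead of names. The following is a minimal sketch of that check using only the Python standard library. The function names and the sample values are hypothetical, and how you extract the group values from your own SAML payloads is left out.

```python
import uuid


def is_object_id(value: str) -> bool:
    """Return True if value is a UUID in canonical hyphenated form."""
    try:
        # uuid.UUID accepts several spellings, so compare against the
        # canonical form to reject strings that merely look like hex.
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def partition_groups(values: list[str]) -> tuple[list[str], list[str]]:
    """Split group claim values into (readable names, object IDs)."""
    names = [v for v in values if not is_object_id(v)]
    object_ids = [v for v in values if is_object_id(v)]
    return names, object_ids


# Hypothetical claim values, for illustration only.
sample = ["data-engineers", "3f2504e0-4f89-11d3-9a0c-0305e82c3301"]
names, object_ids = partition_groups(sample)
```

With the sample above, `names` holds the readable group name and `object_ids` holds the UUID. A payload where most values land in `object_ids` suggests the groups were created natively in Azure AD.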
SCIM support in Cloudera Data Platform for Azure AD reduces the identity management burden and eliminates the need for a lot of custom scripting, coding, and manual runbook processes.
This blog post has helped illustrate the power of SCIM and how SCIM support in CDP helps to ease the pain points encountered by customers related to managing users in CDP.
If you or your company are concerned about having to script, have manual runbooks for common user management related changes, or are hitting SAML limitations, then perhaps SCIM can help reduce your time spent managing identities in CDP. Head to our docs to get started.
If your organization uses Okta and you'd like to start using SCIM with CDP then contact your Cloudera rep to get added to the waitlist—Okta support is coming soon.
Part Two of this blog series provides an introduction to SCIM for readers who are not familiar with the SCIM standard.
Part Two is now up: SCIM, an introduction.