Created on 07-15-2021 07:09 AM - edited on 09-06-2023 03:12 AM by VidyaSargur
You will often need to use data that does not live in your Environment's Data Lake. This article covers 2 scenarios for accessing data outside of your Data Lake, where the external Storage Account is either in the same resource group as your CDP Public Cloud environment or in a different resource group within the same subscription.
UPDATE: Scenarios 3 and 4 below cover RAZ!
In this scenario, you want a single managed identity to access both your Data Lake Storage Account and your External Storage Account. Here I grant access to only a single container within the storage account (as the following scenario shows, you could instead grant access to the entire storage account; I show both purely for illustration).
az role assignment create --assignee $DATAACCESS_OBJECTID --role 'ba92f5b4-2d11-453d-a403-e96b0029c9fe' --scope "/subscriptions/$SUBSCRIPTIONID/resourceGroups/$RESOURCEGROUPNAME/providers/Microsoft.Storage/storageAccounts/$STORAGEACCOUNTNAME/blobServices/default/containers/$CONTAINERNAME"
Note: 'ba92f5b4-2d11-453d-a403-e96b0029c9fe' is the GUID of the built-in Azure role "Storage Blob Data Contributor", which grants read, write, and delete access to the blob data in the container.
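The scope string in the command above is what limits the grant to one container. As a minimal sketch, the two helpers below (hypothetical names `account_scope` and `container_scope`, not part of the original article) build the storage-account-wide scope and the single-container scope so that the difference between them is easy to see.

```shell
# Hypothetical helpers (illustration only): build Azure RBAC scope strings.

# Scope covering an entire storage account.
# $1 = subscription ID, $2 = resource group, $3 = storage account name
account_scope() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Storage/storageAccounts/%s' "$1" "$2" "$3"
}

# Scope covering a single container inside that storage account.
# Same arguments as account_scope, plus $4 = container name
container_scope() {
  printf '%s/blobServices/default/containers/%s' "$(account_scope "$1" "$2" "$3")" "$4"
}

# Example (not executed here):
#   az role assignment create --assignee "$DATAACCESS_OBJECTID" \
#     --role 'Storage Blob Data Contributor' \
#     --scope "$(container_scope "$SUBSCRIPTIONID" "$RESOURCEGROUPNAME" "$STORAGEACCOUNTNAME" "$CONTAINERNAME")"
```

The `--role` argument accepts either the role name or its GUID, so the commented example uses the name for readability.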
Now, when you view the role assignments for this managed identity, you should see a new entry for the external storage account.
Note: It may take several minutes until this page reflects your RBAC change.
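If you prefer the command line to the portal, the same check can be sketched with the Azure CLI. The function name `list_role_assignments` is a hypothetical helper; it assumes you are already logged in with `az login` and that the identity's object ID is known.

```shell
# Hypothetical helper (illustration only): list the role assignments of a principal.
# $1 = object (principal) ID of the managed identity
list_role_assignments() {
  # --all includes assignments at every scope in the current subscription
  # (resource groups, storage accounts, containers), not only subscription level.
  az role assignment list --assignee "$1" --all \
    --query '[].{role:roleDefinitionName, scope:scope}' --output table
}

# Example (not executed here):
#   list_role_assignments "$DATAACCESS_OBJECTID"
```

As with the portal view, a freshly created assignment may not appear immediately.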
Anyone who has an IDBroker mapping to this MSI can now access this new container in the external storage account.
In this scenario, we will create a brand new managed identity and provision access to the entire storage account.
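A rough Azure CLI equivalent of this scenario is sketched below. The function name `create_identity_with_account_access` is hypothetical, and the role is an assumption: this sketch reuses "Storage Blob Data Contributor" from Scenario 1, so substitute whichever role your setup calls for.

```shell
# Hypothetical helper (illustration only): create a user-assigned managed identity
# and grant it a data role on an entire storage account.
# $1 = identity name, $2 = resource group for the identity,
# $3 = full resource ID (scope) of the external storage account
create_identity_with_account_access() {
  principal_id="$(az identity create --name "$1" --resource-group "$2" \
    --query principalId --output tsv)" || return 1

  # Passing the object ID together with the principal type avoids a directory
  # lookup that can fail for an identity created moments earlier.
  # Assumption: the role mirrors Scenario 1.
  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role 'Storage Blob Data Contributor' \
    --scope "$3"
}

# Example (not executed here; the identity name is a placeholder):
#   create_identity_with_account_access my-external-msi "$RESOURCEGROUPNAME" \
#     "/subscriptions/$SUBSCRIPTIONID/resourceGroups/$RESOURCEGROUPNAME/providers/Microsoft.Storage/storageAccounts/$STORAGEACCOUNTNAME"
```

Because the scope stops at the storage account, the grant applies to every container in it.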
You are now ready to map this MSI to users in CDP (via an IDBroker Mapping). Since this is a new role, let's quickly review how to do that...
If you followed Scenario 1 and have the DataAccessIdentity mapped to your user, you should now be able to access both the data container in the Data Lake Storage Account and our new container in the External Storage Account.
If you followed Scenario 2 and have the new MSI mapped to your user, you should now be able to access ONLY the new container in the External Storage Account.
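One way to confirm the access described above is to list the container from a cluster node. The sketch below assumes the external account is ADLS Gen2 and reachable through the ABFS driver; `abfs_uri` is a hypothetical helper that only builds the URI.

```shell
# Hypothetical helper (illustration only): build the ABFS URI for a container.
# $1 = container name, $2 = storage account name, $3 = optional path in the container
abfs_uri() {
  printf 'abfs://%s@%s.dfs.core.windows.net/%s' "$1" "$2" "${3:-}"
}

# Example (not executed here; assumes you are logged in on a cluster node as the
# CDP user who holds the IDBroker mapping):
#   hdfs dfs -ls "$(abfs_uri "$CONTAINERNAME" "$STORAGEACCOUNTNAME")"
```

A successful listing suggests the mapping and role assignment are in effect; a permission error shortly after the change may simply mean the RBAC update has not propagated yet.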
In Azure, you can accomplish this by granting, on the external Storage Account, the same two roles (Storage Blob Data Owner and Storage Blob Delegator) that you assigned to your RAZ Managed Identity for your Data Lake Storage Account.
Here is what your RAZ Managed Identity looks like for your minimal setup for CDP with RAZ:
Just add these same two roles to another Storage Account to allow RAZ/your CDP Environment to interact with another Storage Account:
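The same two assignments can be sketched with the Azure CLI. The function name `grant_raz_roles` is hypothetical, and the object ID and scope you pass in are placeholders for your own RAZ Managed Identity and external Storage Account.

```shell
# Hypothetical helper (illustration only): grant the two RAZ roles on a storage account.
# $1 = object ID of the RAZ managed identity
# $2 = full resource ID (scope) of the external storage account
grant_raz_roles() {
  for role in 'Storage Blob Data Owner' 'Storage Blob Delegator'; do
    az role assignment create --assignee "$1" --role "$role" --scope "$2" || return 1
  done
}

# Example (not executed here; RAZ_OBJECTID is a placeholder variable):
#   grant_raz_roles "$RAZ_OBJECTID" \
#     "/subscriptions/$SUBSCRIPTIONID/resourceGroups/$RESOURCEGROUPNAME/providers/Microsoft.Storage/storageAccounts/$STORAGEACCOUNTNAME"
```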
So that your Managed Identity Role Set looks like this:
We follow the same procedure as above, just with a different scope, because we're integrating with a storage account in a different subscription.
We add Storage Blob Data Owner and Storage Blob Delegator on the Storage Account (in a different subscription).
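For the cross-subscription case, the only moving part is the scope. The sketch below looks up the storage account's resource ID in the other subscription and then assigns both roles. The function name is hypothetical, and the sketch assumes both subscriptions belong to the same Azure AD tenant and that your login is allowed to create role assignments in the other subscription.

```shell
# Hypothetical helper (illustration only): grant the RAZ roles on a storage
# account that lives in a different subscription.
# $1 = object ID of the RAZ managed identity
# $2 = ID of the other subscription
# $3 = resource group and $4 = storage account name, both in the other subscription
grant_raz_roles_cross_subscription() {
  # Resolve the full resource ID; it already embeds the other subscription's ID.
  scope="$(az storage account show --subscription "$2" --resource-group "$3" \
    --name "$4" --query id --output tsv)" || return 1

  for role in 'Storage Blob Data Owner' 'Storage Blob Delegator'; do
    az role assignment create --assignee "$1" --role "$role" --scope "$scope" || return 1
  done
}

# Example (not executed here; all variables are placeholders):
#   grant_raz_roles_cross_subscription "$RAZ_OBJECTID" "$OTHER_SUBSCRIPTIONID" \
#     "$OTHER_RESOURCEGROUPNAME" "$OTHER_STORAGEACCOUNTNAME"
```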
So that the RAZ Managed Identity has this role set for the scope of our "other" subscription:
Based on Scenarios 3 and 4, we now can interact with a total of 3 storage accounts:
DISCLAIMER: This article is contributed by an external user. The steps may not be verified by Cloudera and may not be applicable for all use cases and may be very specific to a particular distribution. Please follow with caution and at your own risk. If needed, raise a support case to get confirmation.