Created on 03-08-2021 11:53 PM - edited 09-16-2022 08:54 AM
Hi all,
I'm having trouble provisioning an environment through the Cloudera Management Console.
I followed the quick start, https://docs.cloudera.com/management-console/cloud/azure-quickstart/topics/mc-azure-quickstart.html, and the guide in the repository https://github.com/cpv0310/cdp-azure-tools, but the problem remains the same: HDFS can't write to the storage location abfs://data@xxx
I tried creating the managed identities both through the template and through the provided script, but nothing changed.
The only difference is that step 6 of the guide says to assign both the assumer identity and the data identity, but the form only lets me assign the assumer identity. The same happens when I assign the logger identity: there is only one slot, so I can't assign the Ranger identity.
In the logs I see that the data lake creation stops while trying to create the first folder on HDFS (abfs), and the error is on the "slave" node, which gets a 403 Forbidden through Knox. I will attach the logs as soon as possible.
Thanks in advance
Created 03-10-2021 08:35 AM
OK, I solved it using the CDP CLI.
The problem was that the Cloudera Management Console web UI does not let you enter the identity for Ranger, while the CLI does.
Below are the commands for creating the environment and the data lake:
# 1. Create the environment (the logger identity goes in --log-storage)
cdp environments create-azure-environment \
  --environment-name <ENV_NAME> \
  --credential-name <CREDENTIAL_NAME> \
  --region "<AZURE_REGION_NAME>" \
  --security-access cidr=0.0.0.0/0 \
  --no-enable-tunnel \
  --public-key "ssh-rsa ..." \
  --log-storage storageLocationBase=abfs://logs@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net,managedIdentity=/subscriptions/xxx/resourcegroups/<RG_NAME>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<ENV_NAME>-LoggerIdentity \
  --use-public-ip \
  --existing-network-params networkId=<ENV_NAME>-Vnet,resourceGroupName=<ENV_NAME>,subnetIds=CDP \
  --free-ipa instanceCountByGroup=1

# 2. Set the data access and Ranger identities (not possible from the web UI)
cdp environments set-id-broker-mappings \
  --environment-name <ENV_NAME> \
  --data-access-role /subscriptions/xxx/resourceGroups/<RG_NAME>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<ENV_NAME>-DataAccessIdentity \
  --ranger-audit-role /subscriptions/xxx/resourceGroups/<RG_NAME>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<ENV_NAME>-RangerIdentity \
  --set-empty-mappings

# 3. Create the data lake with the assumer identity
cdp datalake create-azure-datalake \
  --datalake-name <ENV_NAME> \
  --environment-name <ENV_NAME> \
  --cloud-provider-configuration managedIdentity=/subscriptions/xxx/resourcegroups/<RG_NAME>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<ENV_NAME>-AssumerIdentity,storageLocation=abfs://data@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net \
  --scale LIGHT_DUTY \
  --runtime 7.2.7
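The four managed identity resource IDs in the commands above all follow the same pattern, so a small helper can build them and reduce copy-paste mistakes. This is only a sketch: the function name identity_id and the values of SUBSCRIPTION_ID, RG_NAME and ENV_NAME are placeholders I made up, and the identity name suffixes are simply the ones used in the commands above.

```shell
#!/bin/sh
# Build the Azure resource ID of a user-assigned managed identity.
# Arguments: subscription ID, resource group name, identity name.
# (identity_id is a hypothetical helper, not part of the cdp CLI.)
identity_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.ManagedIdentity/userAssignedIdentities/%s\n' "$1" "$2" "$3"
}

# Placeholder values: replace them with your own.
SUBSCRIPTION_ID="xxx"
RG_NAME="my-rg"
ENV_NAME="my-env"

# Print the resource ID of each identity used in the commands above.
for role in Assumer DataAccess Logger Ranger; do
  identity_id "$SUBSCRIPTION_ID" "$RG_NAME" "${ENV_NAME}-${role}Identity"
done
```

The output of identity_id can then be passed to the options shown above, for example --ranger-audit-role "$(identity_id "$SUBSCRIPTION_ID" "$RG_NAME" "${ENV_NAME}-RangerIdentity")".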
And here is a Dockerfile for anyone who wants to run the CDP CLI in a container:
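FROM python
# groff and less are likely needed to render and page the CLI help output
RUN apt-get update \
    && apt-get upgrade -y \
    && apt-get install -y \
        groff \
        less
# Install the CDP CLI from source
RUN git clone https://github.com/cloudera/cdpcli.git \
    && cd cdpcli \
    && pip install .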
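For completeness, here is one way the image could be built and used. It is a sketch under a few assumptions: the tag cdpcli:local and the function names are made up, the Dockerfile is in the current directory, and the CLI credentials are assumed to live in ~/.cdp on the host (mounted into /root/.cdp, since the container runs as root). Setting DRY_RUN=1 only prints the docker commands instead of running them.

```shell
#!/bin/sh
# Hypothetical image tag; any name works.
IMAGE_TAG="cdpcli:local"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the image from the Dockerfile in the current directory.
build_image() {
  run docker build -t "$IMAGE_TAG" .
}

# Run a cdp command in a throwaway container, mounting the host's
# ~/.cdp directory (assumed location of the CLI configuration).
cdp_in_container() {
  run docker run --rm -it -v "$HOME/.cdp:/root/.cdp" "$IMAGE_TAG" cdp "$@"
}
```

After sourcing the script, build_image builds the image once, and something like cdp_in_container environments list-environments runs a CLI command inside it.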
Created 03-09-2021 12:03 AM
Created 03-10-2021 03:10 AM
The CDP platform is great if your use cases require it.
I am seeing this issue, however, in the CDP Public Cloud implementation.
Have you tried it?