Cloudera management cloud console
Created on 03-08-2021 11:53 PM - edited 09-16-2022 08:54 AM
Hi all,
I'm having trouble provisioning an environment through the Cloudera Management Console (cloud).
I followed the quick start (https://docs.cloudera.com/management-console/cloud/azure-quickstart/topics/mc-azure-quickstart.html) and the guide in the repository https://github.com/cpv0310/cdp-azure-tools, but the problem stays the same: HDFS cannot write to the storage location abfs://data@xxx.
I tried creating the managed identities both with the template and with the provided script, but nothing changed.
The only difference is that step 6 of the guide says to assign both the Assumer identity and the Data Access identity, but the form only lets me assign the Assumer identity. The same happens when I assign the Logger identity: there is only one slot, and I cannot assign the Ranger identity.
In the logs I can see that data lake creation stops while trying to create the first folder on HDFS (ABFS), and the error appears on the "slave" node, which gets a 403 Forbidden through Knox. I will attach the logs as soon as possible.
Thanks in advance
Created 03-10-2021 08:35 AM
OK, I solved it using the CDP CLI.
The problem was that the Cloudera Management Console web UI does not let you specify the Ranger identity, while the CLI does (through the --ranger-audit-role option of set-id-broker-mappings).
Below are the commands for creating the environment, setting the ID Broker mappings, and creating the data lake:
cdp environments create-azure-environment \
--environment-name <ENV_NAME> \
--credential-name <CREDENTIAL_NAME> \
--region "<AZURE_REGION_NAME>" \
--security-access cidr=0.0.0.0/0 \
--no-enable-tunnel \
--public-key "ssh-rsa ..." \
--log-storage storageLocationBase=abfs://logs@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net,managedIdentity=/subscriptions/xxx/resourcegroups/<RG_NAME>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<ENV_NAME>-LoggerIdentity \
--use-public-ip \
--existing-network-params networkId=<ENV_NAME>-Vnet,resourceGroupName=<ENV_NAME>,subnetIds=CDP \
--free-ipa instanceCountByGroup=1
cdp environments set-id-broker-mappings \
--environment-name <ENV_NAME> \
--data-access-role /subscriptions/xxx/resourceGroups/<RG_NAME>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<ENV_NAME>-DataAccessIdentity \
--ranger-audit-role /subscriptions/xxx/resourceGroups/<RG_NAME>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<ENV_NAME>-RangerIdentity \
--set-empty-mappings
cdp datalake create-azure-datalake \
--datalake-name <ENV_NAME> \
--environment-name <ENV_NAME> \
--cloud-provider-configuration managedIdentity=/subscriptions/xxx/resourcegroups/<RG_NAME>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<ENV_NAME>-AssumerIdentity,storageLocation=abfs://data@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net \
--scale LIGHT_DUTY \
--runtime 7.2.7
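The same long managed-identity resource IDs and ABFS URIs are repeated across these three commands, so a small shell sketch like the one below can build them from variables and reduce the chance of a typo in one of them. The function names identity_id and abfs_uri are hypothetical helpers, not part of the CDP CLI; the ID format simply mirrors the paths used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the CDP CLI) for building the
# identifiers passed to the cdp commands above.

# Print the Azure resource ID of a user-assigned managed identity.
# Usage: identity_id <subscription-id> <resource-group> <identity-name>
identity_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.ManagedIdentity/userAssignedIdentities/%s\n' \
    "$1" "$2" "$3"
}

# Print the ABFS URI of a container in an ADLS Gen2 storage account.
# Usage: abfs_uri <container> <storage-account>
abfs_uri() {
  printf 'abfs://%s@%s.dfs.core.windows.net\n' "$1" "$2"
}

# Example (placeholders as in the commands above):
#   ENV=<ENV_NAME>; RG=<RG_NAME>; SUB=xxx
#   cdp environments set-id-broker-mappings \
#     --environment-name "$ENV" \
#     --data-access-role "$(identity_id "$SUB" "$RG" "$ENV-DataAccessIdentity")" \
#     --ranger-audit-role "$(identity_id "$SUB" "$RG" "$ENV-RangerIdentity")" \
#     --set-empty-mappings
```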
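And here is the Dockerfile for those who want to run the CDP CLI in a container:
FROM python
# groff and less are used to render the CLI's help pages
RUN apt-get update \
&& apt-get upgrade -y \
&& apt-get install -y --no-install-recommends \
groff \
less \
&& rm -rf /var/lib/apt/lists/*
# Install the CDP CLI from the Cloudera source repository
RUN git clone https://github.com/cloudera/cdpcli.git \
&& cd cdpcli \
&& pip install .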
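A minimal sketch for using that image, assuming it was built with docker build -t cdp-cli . and that the CLI reads its credentials from ~/.cdp, where cdp configure stores them by default. The image tag cdp-cli and the function name cdp_in_docker are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical image tag; override CDP_IMAGE if you tagged the image differently.
CDP_IMAGE="${CDP_IMAGE:-cdp-cli}"

# Run a CDP CLI command inside the container, mounting the local ~/.cdp
# directory (credentials/config written by "cdp configure") into root's
# home directory, which is the default user in the python base image.
cdp_in_docker() {
  docker run --rm -it \
    -v "${HOME}/.cdp:/root/.cdp" \
    "${CDP_IMAGE}" cdp "$@"
}

# Example:
#   cdp_in_docker environments list-environments
```

The -t flag allocates a terminal so that paged help output works; drop it when calling the function from a script that has no terminal attached.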
Created 03-09-2021 12:03 AM
Created 03-10-2021 03:10 AM
The CDP platform is great if your use cases require it.
However, I am seeing the issue in the CDP Public Cloud implementation.
Have you tried it?
