Member since
07-10-2018
63
Posts
71
Kudos Received
0
Solutions
01-13-2021
07:09 AM
@joyabrata you have a few options: - you can review the documentation on outbound network access details here: https://docs.cloudera.com/management-console/cloud/proxy/topics/mc-whitelist-urls-environment.html - you can use automation scripts like https://github.com/paulvid/cdp-one-click/ to set up everything end to end.
... View more
12-23-2020
07:04 AM
3 Kudos
In a previous article, I realized that I had saved my flow as a flow file instead of a template, which may make it hard for some to import. So in this article, I will explain how to import a flow into NiFi Registry in a Datahub (keeping in mind that NiFi Registry is secured by default in CDP Datahub).
Step 1: Download and configure stores
Connect to one of the NiFi machines with the Cloudbreak user and the key you used at deployment:
$ ssh -i [path_to_private_key] cloudbreak@[your_nifi_host]
Next, copy and authorize the key and trust stores:
$ sudo su
$ cp /var/lib/cloudera-scm-agent/agent-cert/cm-auto-host_keystore.jks /tmp
$ cp /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks /tmp
$ chmod a+rw /tmp/cm-auto-host_keystore.jks
$ chmod a+rw /tmp/cm-auto-global_truststore.jks
Step 2: Create a registry.properties file
Go to /tmp (for instance) and create a registry.properties file with the following content:
baseUrl=https://fod-nifi-cluster-gateway0.fod-cdp.a465-9q4k.cloudera.site:18433
keystore=/tmp/cm-auto-host_keystore.jks
keystoreType=JKS
keystorePasswd=[YOUR_KEYSTORE_PWD]
keyPasswd=[YOUR_KEYSTORE_PWD]
truststore=/tmp/cm-auto-global_truststore.jks
truststoreType=JKS
truststorePasswd=[YOUR_TRUSTSTORE_PWD]
proxiedEntity=[YOUR_USER_AUTHORIZED_IN_RANGER]
Notes:
I'm not going to cover how to retrieve your keystore and truststore passwords here
It is important that you add a proxied entity with your workload user that is authorized in Ranger to use the registry (you can find your user in the CDP management console)
Next, download the flow you want to import, for instance:
wget https://raw.githubusercontent.com/paulvid/datasets/master/hybrid-demo/nifi-flow/NiFi_Flow.json
Step 3: Run the import
Run these command lines:
$ /opt/cloudera/parcels/CFM-2.0.6.0/TOOLKIT/bin/cli.sh
[Apache NiFi ASCII art banner]
CLI v1.11.4.2.0.6.0-27
Type 'help' to see a list of available commands, use tab to auto-complete.
Session loaded from /home/pvidal/.nifi-cli.config
#> registry create-bucket -p registry.properties --bucketName hybrid-cloud
dfc33699-0317-4893-82c4-8a12ad6ed822
#> registry create-flow -p registry.properties -b dfc33699-0317-4893-82c4-8a12ad6ed822 -fn hybridflow
8d2e7f87-f176-4f34-9788-72be034e4a3f
#> registry import-flow-version -p registry.properties -f 8d2e7f87-f176-4f34-9788-72be034e4a3f -i NiFi_Flow.json
1
#> exit
Step 4: Import the flow in the UI
Navigate to the NiFi UI and add a process group. Then, click Import:
Select the bucket and flow we just imported using the CLI:
After clicking Import, your flow is successfully available!
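As a sanity check, before exiting the CLI session you can also list the versions of the flow to confirm the import registered; a minimal sketch using the bucket and flow IDs created above (the flags mirror those of the import-flow-version command):
#> registry list-flow-versions -p registry.properties -f 8d2e7f87-f176-4f34-9788-72be034e4a3f
This should show version 1 pointing to the imported NiFi_Flow.json.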
... View more
12-03-2020
06:06 AM
2 Kudos
Final article of the hybrid cloud series (see the parent article here), and it's the most fun one! In this tutorial, we will learn to use Cloudera Viz to create visual apps. We will not go in depth into how to use Viz; rather, we will import an already existing app, executing this part of the tutorial flow:
Prerequisites
Complete Part 3 of the tutorial series.
Step 1: Import Viz App
Navigate to your Management console > Data Warehouse > Open Cloudera Viz:
You will be redirected to the default examples; navigate to the DATA tab:
Then, click on Default Hive VW (this is your CDW VW):
Then, click on Import Visual Artifacts:
Finally, import the viz app (link here) as follows:
After clicking on Import, you will see a new app:
Step 2: Update each map with a Mapbox token
In order for the maps to display, you will have to add a Mapbox token. I will only detail one of the three dashboard edits here, but you will need to do it for each dashboard.
For this, first create an account at https://account.mapbox.com/auth/signup/ and copy your access token:
Then, go to the visual app and click the Edit button:
From there, click the dashboard to edit:
Once clicked, click Edit:
From there, click on the map edit gear > Settings > add your Mapbox token > Save:
Step 3: Use App
Once all the dashboards have been updated with the appropriate token, launch the app:
You will then be able to have a daily updated visual of the COVID cases and the re-opening risk for all branches based on the spread of the virus:
... View more
Labels:
12-02-2020
04:23 PM
2 Kudos
Welcome to Part 3 of my article series on how to harness the hybrid cloud (see the parent article here). In this tutorial you will learn to use NiFi to:
Pull information from public APIs
Push this raw data to a secure S3 bucket using SDX integration
Create Hive tables on top of this data by connecting to CDW
This corresponds to step 3 of the series, as explained below:
Note: The anonymized NiFi flow can be found here.
Prerequisites
Complete Part 2 of this article series
A NiFi Datahub in the environment you created for Part 1 and Part 2
Step 1: Prepare your NiFi Datahub for JDBC connection to Hive in CDW
Download driver jar
Navigate to your Management console > Data Warehouse > find your virtual warehouse and download the driver jar:
Upload jar to NiFi nodes
Navigate to the Management console > your NiFi datahub > Hardware and note the public IPs of the NiFi nodes:
Using these public IPs (hidden here), upload the downloaded jar to all three nodes (see the example code below, using your workload user and password):
$ scp hive-jdbc-3.1.0-SNAPSHOT-standalone.jar [YOUR_USER]@[NODE_1]:/tmp/
$ scp hive-jdbc-3.1.0-SNAPSHOT-standalone.jar [YOUR_USER]@[NODE_2]:/tmp/
$ scp hive-jdbc-3.1.0-SNAPSHOT-standalone.jar [YOUR_USER]@[NODE_3]:/tmp/
$ ssh [YOUR_USER]@[NODE_1] chmod a+rw /tmp/hive-jdbc-3.1.0-SNAPSHOT-standalone.jar
$ ssh [YOUR_USER]@[NODE_2] chmod a+rw /tmp/hive-jdbc-3.1.0-SNAPSHOT-standalone.jar
$ ssh [YOUR_USER]@[NODE_3] chmod a+rw /tmp/hive-jdbc-3.1.0-SNAPSHOT-standalone.jar
Copy the JDBC URL
Navigate to your Management console > Data Warehouse > find your virtual warehouse and copy the JDBC URL:
You should get something along these lines: jdbc:hive2://[YOUR_CDW_HOST]/default;transportMode=http;httpPath=cliservice;ssl=true;retries=3
Step 2: Configure the NiFi flow
This tutorial requires the creation of two NiFi flows (one to map zip codes to latitude and longitude, and one to get the latest COVID case numbers):
In this tutorial, I will only detail the configuration of one of the two flows, as they are almost identical except for file/table/field names. The full example code is in this gist. Here is the overview of the US Geocode flow:
Get data from API to secure S3 using SDX
For this part, we first use a simple configuration of an InvokeHTTP processor. The remote URL called is: https://data.healthcare.gov/api/views/52wv-g36k/rows.csv?accessType=DOWNLOAD&api_foundry=true
Then, we replace the filename attribute to make sure we overwrite data:
Finally, we use a PutHDFS processor with the following parameters:
Hadoop Configuration Resources: /etc/hadoop/conf.cloudera.core_settings/core-site.xml
Kerberos Principal: [your workload user]
Kerberos Password: [your workload password]
Directory: s3a://[your env bucket]/raw/geocodes_by_county/
Conflict Resolution Strategy: replace
Drop and create tables
For both dropping and creating tables, we first use a ReplaceText processor to send the query, for example with a replacement value of:
drop TABLE if exists worldwidebank.us_geocodes_by_county;
drop TABLE if exists worldwidebank.us_geocodes_by_county_ext;
Then we use a PutHive3QL processor with default parameters. The only thing needed to make this work is the Hive3ConnectionPool, configured as follows:
Database Connection URL: [your JDBC URL]
Database User: [your workload user]
Password: [your workload password]
Step 3: Verify Data Creation
After executing both flows, navigate to Hue from CDW and look at the data, as such:
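As an aside, you can quickly confirm that the public API endpoint responds before building the flow; a simple check with curl (the URL is the same one configured in InvokeHTTP above, and only response headers are fetched):
$ curl -sI "https://data.healthcare.gov/api/views/52wv-g36k/rows.csv?accessType=DOWNLOAD&api_foundry=true"
A 200 OK response indicates the flow should be able to pull the file.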
... View more
Labels:
11-13-2020
09:41 AM
2 Kudos
Welcome to Part 2 of our harness the hybrid cloud series. In this tutorial, we will learn how to use Data Catalog, Atlas, and Ranger to profile and protect sensitive data in CDP Public Cloud, as depicted below:
Prerequisites
Complete Part 1 of the series
Step 1: Launch Data Profiling
CDP Data Catalog comes with data profilers out of the box. You can of course customize them, but in our datasets, we will use the standard data profilers.
Launch Profiler Cluster
Navigate to your CDP Management Console > Data Catalog > Select your environment > Launch Profilers:
This will launch a datahub cluster to run the data profiling spark jobs. Wait for the cluster to be built, like in the following screenshot:
Verify Profiler execution
Navigate back to your Data Catalog > Profilers > Select your env > Cluster Sensitivity Profiler, and verify that profilers have run successfully:
Check profiled data
Go to Search and find the employees Hive table:
In the employees table, go to Schema and check the automated tags created:
Step 2: Create Tag Based Policy
Navigate to Ranger
In Data Catalog, go to the Policy tab and navigate to a policy to open Ranger:
In Ranger, go to Tag based Policies:
Open the cm_tags service:
Navigate to Masking to Add a new policy:
Create Masking Rule
Configure the masking rule as depicted in the following screenshot:
Give it a name (for example, mask_creditcard)
Select the dp_credicard tag (dp prefix standing for data profiler)
Select the Group or user for which this policy should apply (here pvidal)
Select Access Type: Hive, Select
Select Masking Option: Redact
Step 3: Verify Security Rule
Go back to your management console Data Warehouse and open Hue for your virtual warehouse:
Run the following query and observe masked results: select ccnumber from worldwidebank.employees
As you observed, CDP makes it very easy to secure your data in the cloud. Next step, enrich this data with NiFi!
... View more
Labels:
11-04-2020
08:24 AM
1 Kudo
As explained in the parent article, here is the first step in our hybrid cloud implementation: replicating bank branch and employee data from an on-prem cluster to CDP Public Cloud.
Prerequisites
A CDP Base cluster (with admin rights and a user that is part of the HDFS supergroup)
A CDP Public Cloud environment (with admin rights)
A CDW virtual warehouse attached to the Public Cloud environment
Note: you can find the datasets used for this exercise here
Step 1: Register CDP Base as a classic cluster
Start registration
In your CDP Public Cloud management console, go to Classic Clusters > Add Cluster > CDH and enter your CDP Base cluster information:
You will then see your cluster registration in progress:
Install a secure communication tunnel
Click on the Files button in Install files and follow the instructions. The following are example instructions for doing this on your CM node.
Download the ssh_tunnel_setup_files.zip archive from your management console:
Copy it to your CM node:
$ scp -i [your_key_location] ssh_tunnel_setup_files.zip [your_user_with_sudo_privileges]@[your_host]:/home/[your_user_with_sudo_privileges]
SSH to the CM node and install the CCM autossh client:
$ ssh -i [your_key_location] [your_user_with_sudo_privileges]@[your_host]
$ sudo su
$ wget https://archive.cloudera.com/ccm/0.1/ccm-autossh-client-0.1-20191023211905gitd03880c.x86_64.rpm
$ yum -y --nogpgcheck localinstall ccm-autossh-client-0.1-20191023211905gitd03880c.x86_64.rpm
Install Tunnel
$ unzip ssh_tunnel_setup_files.zip
$ ./install.sh
After the installation, you should see a message like this:
==========================================================================================
SSH tunnel for CM established successfully.
Run 'ccm-tunnel status' for status
Run 'journalctl -f -u ccm-tunnel@CM.service' or 'journalctl -xe' for logs.
==========================================================================================
Finish Registration
In your Management console, click on Test Connection:
Once the connection is successful, you can click on Register, add your CM user/password, and connect:
Finally, enter the location of your base cluster (to display on the dashboard map):
You have now successfully established a secure tunnel between CDP Base and CDP Public Cloud:
Step 2: Create a Replication Policy
Navigate to Replication Manager > Classic Clusters > 3 dots on your cluster > Add Policy:
In our case, we are going to replicate 2 datasets from HDFS:
Employee data
Bank location data
In Step 1, give the policy a name and select HDFS:
In Step 2, add the location of your dataset and the name of your superuser:
In Step 3, select S3 and add your AWS credentials:
After validation, enter the target bucket (your environment cloud storage) and validate:
For the next two steps, use default settings:
After you click Create, you will see the replication policy progressing. Wait for it to complete successfully, then move on to the next step:
Step 3: Create external and managed tables in CDW
Navigate to CDW > 3 dots of your virtual warehouse > Open Hue:
In your query editor, run the following queries (adapting to your s3 path, of course):
create database if not exists worldwidebank;
use worldwidebank;
CREATE EXTERNAL TABLE if not exists worldwidebank.employees_ext(
number int,
location int,
gender string,
title string,
givenname string,
middleinitial string,
surname string,
streetaddress string,
city string,
state string,
statefull string,
zipcode string,
country string,
countryfull string,
emailaddress string,
username string,
password string,
telephonenumber string,
telephonecountrycode string,
mothersmaiden string,
birthday string,
age int,
tropicalzodiac string,
cctype string,
ccnumber string,
cvv2 string,
ccexpires string,
ssn string,
insuranceid string,
salary string,
bloodtype string,
weight double,
height int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3a://pvi-e2e-cdp-bucket/vizbank/raw/employees/'
tblproperties("skip.header.line.count"="1");
CREATE EXTERNAL TABLE if not exists worldwidebank.locations_ext(
LOCATION_ID int,
ADDRESS string,
BKCLASS string,
CBSA string,
CBSA_DIV string,
CBSA_DIV_FLG string,
CBSA_DIV_NO string,
CBSA_METRO string,
CBSA_METRO_FLG string,
CBSA_METRO_NAME string,
CBSA_MICRO_FLG string,
CBSA_NO string,
CERT string,
CITY string,
COUNTY string,
CSA string,
CSA_FLG string,
CSA_NO string,
ESTYMD string,
FI_UNINUM string,
MAINOFF string,
NAME string,
OFFNAME string,
OFFNUM string,
RUNDATE string,
SERVTYPE string,
STALP string,
STCNTY string,
STNAME string,
UNINUM string,
ZIP int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3a://pvi-e2e-cdp-bucket/vizbank/raw/locations/'
tblproperties("skip.header.line.count"="1");
create table worldwidebank.employees as select * from worldwidebank.employees_ext;
create table worldwidebank.locations as select * from worldwidebank.locations_ext;
CREATE MATERIALIZED VIEW worldwidebank.employees_per_state as select locations.stname, count(*) as num_employees from employees, locations where employees.location=locations.location_id GROUP BY locations.stname;
And that's it, you now have replicated data from your base cluster to CDP:
The next step will be to profile sensitive data to protect our employees' data.
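If you want to double-check that the replicated files actually landed in cloud storage before (or after) building the tables, you can list the bucket paths used in the LOCATION clauses above; a quick sketch, assuming the AWS CLI is configured with credentials that can read the environment bucket:
$ aws s3 ls s3://pvi-e2e-cdp-bucket/vizbank/raw/employees/
$ aws s3 ls s3://pvi-e2e-cdp-bucket/vizbank/raw/locations/
Each command should list the files copied by the replication policy.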
... View more
10-31-2020
01:35 PM
1 Kudo
I'm back with a new article series, like I did previously with news author personality recognition, beast mode quotient, and AI to edge (though this one recently got replaced with identifying Magic: The Gathering cards). In this series, I will showcase how to harness the true power of Cloudera Data Platform (CDP) hybrid cloud capabilities. Throughout the series you will learn how to use CDP Private Cloud Base, Replication Manager, CDP Public Cloud, NiFi, Kafka on Data Hub, Cloudera Data Warehouse, and Cloudera Viz.
Reminder: CDP Vision
CDP is designed to seamlessly enable you to deploy any data workloads (data collection, streaming, enrichment, engineering, serving, and AI/ML), on any infrastructure, with the latest engines, while maintaining a coherent layer of security and governance (SDX).
Case Study: Worldwide Bank
For the purpose of this article, I will use the example of a fictional bank (Worldwide Bank). Worldwide Bank is a large international bank that leverages a traditional big data architecture on-premises (CDP PvC Base) for data engineering and data warehousing over petabytes of data. With COVID-19 taking the world through unprecedented times, competition is at its highest, and the bank is accelerating its data organization through the adoption of the latest technologies and architectures, especially cloud infrastructures. Their first use case on this new technology platform is to create a visual report assessing the risk of every one of its branches as the virus spreads. The implementation of this first use case has the following critical considerations:
Speed of implementation/cloud adoption
Maintenance of data privacy/security standards
Re-use of current team skillset (i.e. portability)
Implementation Architecture
After carefully considering options, the bank selected CDP as their hybrid architecture, as it satisfies all their needs. Specifically, here is their implementation design:
This article series will guide you through these four steps:
Replicate bank branches and employee data (Replication Manager, Cloudera Manager, S3, HDFS).
Profile sensitive data and apply data protection (Data Catalog profilers, Atlas, Ranger).
Enrich data by streaming COVID statistics (NiFi).
Create interactive visual reports (Cloudera Data Warehouse, Hive LLAP, Viz).
Note: all assets for this series can be found here.
... View more
10-27-2020
02:57 PM
1 Kudo
Note: This article is all thanks to Sumit Prakash for diving into the Ranger source code and explaining to me how to do this!
When developing new plugins for Ranger, these plugins need to be able to download the corresponding Ranger policies. In a secure Ranger setup (which is the preferred way), you download them via the secure download API endpoint: /plugins/secure/policies/download/{serviceName}
This API requires authentication. In this article, I will highlight how to set up a read-only user to download a policy.
Step 1: Create a read-only user
As a Ranger administrator, go to Settings > Users/Groups/Roles:
Then, click on Add New User and create a user with the auditor role, as follows:
Step 2: Give download access to this user
In the Ranger home screen, edit the service whose policies you want to download (here Hadoop SQL, or its technical name, cm_hive):
Add the recently created user to the policy.download.auth.users configuration:
Step 3: Test the download API
For this, you can run the following curl command:
curl -Lku auditor:[password] -H "Accept: application/json" -H "Content-type:application/json" "https://[ranger_url]:6182/service/plugins/secure/policies/download/cm_hive"
You should get a response that looks like the following:
{"serviceName":"cm_hive","serviceId":5,"policyVersion":10,"policyUpdateTime":1603825207732,"policies":[{"id":7,"guid":"78892229-bea4-421f-85fd-8214e88e3c21","isEnabled":true,"version":1,"service":"cm_hive","name":"all - global","policyType":0,"policyPriority":0,"description":"Policy for all - global","isAuditEnabled":true,"resources":{"global":{"values":["*"],"isExcludes":false,"isRecursive":false}},"policyItems":[{"accesses":[{"type":"select","isAllowed":true},{"type":"update","isAllowed":true},{"type":"create","isAllowed":true},{"type":"drop","isAllowed":true},{"type":"alter","isAllowed":true},{"type":"index","isAllowed":true},{"type":"lock","isAllowed":true},{"type":"all","isAllowed":true},{"type":"read","isAllowed":true},{"type":"write","isAllowed":true},{"type":"repladmin","isAllowed":true},{"type":"serviceadmin","isAllowed":true},{"type":"tempudfadmin","isAllowed":true},{"type":"refresh","isAllowed":true}],"users":["hive","beacon","dpprofiler","hue","admin","impala"],"groups":[],"roles":[],"conditions":[],"delegateAdmin":true},{"accesses":[{"type":"read","isAllowed":true}],"users":["rangerlookup"],"groups":[],"roles":[],"conditions":[],"delegateAdmin":false}],"denyPolicyItems":[],"allowExceptions":[],"denyExceptions":[],"dataMaskPolicyItems":[],"rowFilterPolicyItems":[],"serviceType":"hive","options":{},"validitySchedules":[],"policyLabels":[],"zoneName":"","isDenyAllElse":false},{"id":8,"guid":"211690a6-6fb9-41e1-99ba-bb00a46adedb","isEnabled":true,"version":1,"service":"cm_hive","name":"all - database, table, column","policyType":0,"policyPriority":0,"description":"Policy for all - database, table, column","isAuditEnabled":true,"resources":{"database":{"values":["*"],"isExcludes":false,"isRecursive":false},"column":{"values":["*"],"isExcludes":false,"isRecursive":false},"table":{"values":["*"],"isExcludes":false,"isRecursive":false}},"policyItems":[{"accesses":[{"type":"select","isAllowed":true},{"type":"update","isAllowed":true},{"type":"create","isAllowed":true},{"type":"drop","isAllowed":true},{"type":"alter","isAllowed":true},{"type":"index","isAllowed":true},{"type":"lock","isAllowed":true},{"type":"all","isAllowed":true},{"type":"read","isAllowed":true},{"type":"write","isAllowed":true},{"type":"repladmin","isAllowed":true},{"type":"serviceadmin","isAllowed":true},{"type":"tempudfadmin","isAllowed":true},{"type":"refresh","isAllowed":true}],"users":["hive","beacon","dpprofiler","hue","admin","impala"],"groups":[],"roles":[],"conditions":[],"delegateAdmin":true},{"accesses":[{"type":"read","isAllowed":true}],"users":["rangerlookup"],"groups":[],"roles":[],"conditions":[],"delegateAdmin":false},{"accesses":[{"type":"all","isAllowed":true},{"type":"drop","isAllowed":true},{"type":"serviceadmin","isAllowed":true},{"type":"select","isAllowed":true},{"type":"read","isAllowed":true},{"type":"update","isAllowed":true},{"type":"create","isAllowed":true},{"type":"index","isAllowed":true},{"type":"lock","isAllowed":true},{"type":"refresh","isAllowed":true},{"type":"repladmin","isAllowed":true},{"type":"write","isAllowed":true},{"type":"alter","isAllowed":true}],"users":["{OWNER}"],"groups":[],"roles":[],"conditions":[],"delegateAdmin":true}],"denyPolicyItems":[],"allowExceptions":[],"denyExceptions":[],"dataMaskPolicyItems":[],"rowFilterPolicyItems":[],"serviceType":"hive","options":{},"validitySchedules":[],"policyLabels":[],"zoneName":"","isDenyAllElse":false},{"id":9,"guid":"3b6489dd-e76d-408f-bac0-c5cba4bdb2ac","isEnabled":true,"version":1,"service":"cm_hive","name":"all - database, 
table","policyType":0,"policyPriority":0,"description":"Policy for all - database, table","isAuditEnabled":true,"resources":{"database":{"values":["*"],"isExcludes":false,"isRecursive":false},"table":{"values":["*"],"isExcludes":false,"isRecursive":false}},"policyItems":[{"accesses":[{"type":"select","isAllowed":true},{"type":"update","isAllowed":true},{"type":"create","isAllowed":true},{"type":"drop","isAllowed":true},{"type":"alter","isAllowed":true},{"type":"index","isAllowed":true},{"type":"lock","isAllowed":true},{"type":"all","isAllowed":true},{"type":"read","isAllowed":true},{"type":"write","isAllowed":true},{"type":"repladmin","isAllowed":true},{"type":"serviceadmin","isAllowed":true},{"type":"tempudfadmin","isAllowed":true},{"type":"refresh","isAllowed":true}],"users":["hive","beacon","dpprofiler","hue","admin","impala"],"groups":[],"roles":[],"conditions":[],"delegateAdmin":true},{"accesses":[{"type":"read","isAllowed":true}],"users":["rang [...]
... View more
Labels:
10-20-2020
02:20 PM
1 Kudo
Cloudera Data Platform Public Cloud recently introduced the ability to back up and restore a datalake from a saved location. Specifically, the backup operation saves a full snapshot of data from all SDX services:
Atlas:
Audit events, saved in HBase tables
Lineage data, saved as Janus graph data in HBase tables
Edge, vertex, and full text indexes, saved in Solr collections
Ranger:
Audit logs, saved as a Solr collection
Permissions and tags, saved in RDBMS tables
HMS Metadata, saved in RDBMS tables
In this article, I will detail how to run backup and restore in CDP Public Cloud in AWS, via the CDP CLI.
Pre-Requisites
Stop operations that could affect backup
Make sure that no HMS-affecting operations are running (e.g., creating a table from CDW or a datahub)
Go to your Datalake Cloudera Manager, and shut down:
Atlas
Ranger
HMS
Make sure you have the proper IAM permissions
Datalake backup uses both the Ranger Audit role and the Datalake Admin role to write the backups (more details on these roles here)
Therefore, the policies attached to both IAM roles must give write permissions to the location of your backup.
Here is an example of a policy attached to the Ranger Audit Role:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "FullObjectAccessUnderAuditDir",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::bckp-cdp-bucket/ranger/audit/*"
},
{
"Sid": "FullObjectAccessUnderBackupDir",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::bckp-cdp-bucket/backups/*"
},
{
"Sid": "LimitedAccessToDataLakeBucket",
"Effect": "Allow",
"Action": [
"s3:AbortMultipartUpload",
"s3:ListBucket",
"s3:ListBucketMultipartUploads"
],
"Resource": "arn:aws:s3:::bckp-cdp-bucket"
}
]
}
Install and configure CDP CLI
This is fairly straightforward, and documented in your management console, under Help > Download CLI:
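If you have not used the CDP CLI before, a minimal install-and-verify sketch looks like the following, assuming installation via pip (the cdpcli package); the authoritative steps are the ones documented under Help > Download CLI:
$ pip install cdpcli
$ cdp configure
$ cdp iam get-user
cdp configure prompts for the CDP access key ID and private key generated from your user profile, and cdp iam get-user is a quick way to confirm the credentials work before running the backup commands.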
Step 1: Running back-up
Initiate backup
$ cdp datalake backup-datalake --datalake-name bckp-cdp-dl --backup-location s3a://bckp-cdp-bucket/backups/
{
"accountId": "558bc1d2-8867-4357-8524-311d51259233",
"backupId": "6c59a259-51ac-4db4-80d6-22f71f84cc4e",
"internalState": "{ATLAS_ENTITY_AUDIT_EVENTS_TABLE=IN_PROGRESS, EDGE_INDEX_COLLECTION=IN_PROGRESS, DATABASE=IN_PROGRESS, FULLTEXT_INDEX_COLLECTION=IN_PROGRESS, ATLAS_JANUS_TABLE=IN_PROGRESS, RANGER_AUDITS_COLLECTION=IN_PROGRESS, VERTEX_INDEX_COLLECITON=IN_PROGRESS}",
"status": "IN_PROGRESS",
"startTime": "2020-10-20 21:11:27.821",
"endTime": "",
"backupLocation": "s3a://bckp-cdp-bucket/backups/",
"failureReason": "null"
}
Monitor backup
$ cdp datalake backup-datalake-status --datalake-name bckp-cdp-dl
{
"accountId": "558bc1d2-8867-4357-8524-311d51259233",
"backupId": "6c59a259-51ac-4db4-80d6-22f71f84cc4e",
"userCrn": "crn:altus:iam:us-west-1:558bc1d2-8867-4357-8524-311d51259233:user:86c4e7d9-1560-4afa-ac14-794bdeec0896",
"internalState": "{ATLAS_ENTITY_AUDIT_EVENTS_TABLE=IN_PROGRESS, EDGE_INDEX_COLLECTION=IN_PROGRESS, DATABASE=IN_PROGRESS, FULLTEXT_INDEX_COLLECTION=IN_PROGRESS, ATLAS_JANUS_TABLE=IN_PROGRESS, RANGER_AUDITS_COLLECTION=IN_PROGRESS, VERTEX_INDEX_COLLECITON=IN_PROGRESS}",
"status": "IN_PROGRESS",
"startTime": "2020-10-20 21:11:27.821",
"endTime": "",
"backupLocation": "s3a://bckp-cdp-bucket/backups/",
"backupName": "",
"failureReason": "null"
}
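Since the backup can take a while, it is convenient to poll the status until it leaves IN_PROGRESS; a small sketch using the same status command plus jq (jq assumed to be installed):
$ while [ "$(cdp datalake backup-datalake-status --datalake-name bckp-cdp-dl | jq -r .status)" = "IN_PROGRESS" ]; do echo "backup still in progress..."; sleep 60; done
The same pattern works for restore-datalake-status in Step 2.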
Step 2: Restoring backup
Initiate restore
$ cdp datalake restore-datalake --datalake-name bckp-cdp-dl --backup-id 6c59a259-51ac-4db4-80d6-22f71f84cc4e
{
"accountId": "558bc1d2-8867-4357-8524-311d51259233",
"restoreId": "06c0bde4-cfc7-4b9e-a8e0-d9f2ddfcb5c5",
"backupId": "6c59a259-51ac-4db4-80d6-22f71f84cc4e",
"internalState": "{ATLAS_ENTITY_AUDIT_EVENTS_TABLE=IN_PROGRESS, DATABASE=IN_PROGRESS, EDGE_INDEX_COLLECTION_DELETE=IN_PROGRESS, RANGER_AUDITS_COLLECTION_DELETE=IN_PROGRESS, VERTEX_INDEX_COLLECITON_DELETE=IN_PROGRESS, ATLAS_JANUS_TABLE=IN_PROGRESS, FULLTEXT_INDEX_COLLECTION_DELETE=IN_PROGRESS}",
"status": "IN_PROGRESS",
"startTime": "2020-10-20 21:15:11.757",
"endTime": "",
"backupLocation": "s3a://bckp-cdp-bucket/backups/",
"failureReason": "null"
}
Monitor restore
$ cdp datalake restore-datalake-status --datalake-name bckp-cdp-dl
{
"accountId": "558bc1d2-8867-4357-8524-311d51259233",
"restoreId": "06c0bde4-cfc7-4b9e-a8e0-d9f2ddfcb5c5",
"backupId": "6c59a259-51ac-4db4-80d6-22f71f84cc4e",
"userCrn": "crn:altus:iam:us-west-1:558bc1d2-8867-4357-8524-311d51259233:user:86c4e7d9-1560-4afa-ac14-794bdeec0896",
"internalState": "{ATLAS_ENTITY_AUDIT_EVENTS_TABLE=IN_PROGRESS, EDGE_INDEX_COLLECTION=SUCCESSFUL, DATABASE=SUCCESSFUL, FULLTEXT_INDEX_COLLECTION=SUCCESSFUL, EDGE_INDEX_COLLECTION_DELETE=SUCCESSFUL, VERTEX_INDEX_COLLECITON_DELETE=SUCCESSFUL, RANGER_AUDITS_COLLECTION_DELETE=SUCCESSFUL, ATLAS_JANUS_TABLE=IN_PROGRESS, RANGER_AUDITS_COLLECTION=IN_PROGRESS, VERTEX_INDEX_COLLECITON=IN_PROGRESS, FULLTEXT_INDEX_COLLECTION_DELETE=SUCCESSFUL}",
"status": "IN_PROGRESS",
"startTime": "2020-10-20 21:15:11.757",
"endTime": "",
"backupLocation": "s3a://bckp-cdp-bucket/backups/",
"failureReason": "null"
}
Note: you can also monitor these events in the CDP Control Plane:
... View more
Labels:
10-14-2020
10:21 AM
This looks like a permission issue on your Azure subscription at the time of creation. Did you manage to get involved with our partner program? We would like to help you out by sharing screen rather than finding the needle in the haystack here 🙂
... View more
10-14-2020
05:36 AM
Again, you have two issues here: 1. Making sure that your app has contributor role 2. Making sure that the identities you created with the quick start template have the right permissions If you follow the instructions I gave you (create the proper app + run the script), it should work.
... View more
10-13-2020
02:25 PM
Hi Valerio,
First, regarding the app role, I think the quick start doc page is out of date (I reported this to our doc team). You do not need to create a custom role, as long as you create your credential app like this (replace subscriptionId with your ID):
az ad sp create-for-rbac \
--name http://your-cloudbreak-app \
--role Contributor \
--scopes /subscriptions/{subscriptionId}
Secondly, did you run step 3 completely? Specifically, make sure to run this in an Azure bash shell after the quickstart deployment (replace YOUR_SUBSCRIPTION_ID and YOUR_RG with the values used in the quickstart):
export SUBSCRIPTIONID="YOUR_SUBSCRIPTION_ID"
export RESOURCEGROUPNAME="YOUR_RG"
export STORAGEACCOUNTNAME=$(az storage account list -g $RESOURCEGROUPNAME|jq '.[]|.name'| tr -d '"')
export ASSUMER_OBJECTID=$(az identity list -g $RESOURCEGROUPNAME|jq '.[]|{"name":.name,"principalId":.principalId}|select(.name | test("AssumerIdentity"))|.principalId'| tr -d '"')
export DATAACCESS_OBJECTID=$(az identity list -g $RESOURCEGROUPNAME|jq '.[]|{"name":.name,"principalId":.principalId}|select(.name | test("DataAccessIdentity"))|.principalId'| tr -d '"')
export LOGGER_OBJECTID=$(az identity list -g $RESOURCEGROUPNAME|jq '.[]|{"name":.name,"principalId":.principalId}|select(.name | test("LoggerIdentity"))|.principalId'| tr -d '"')
export RANGER_OBJECTID=$(az identity list -g $RESOURCEGROUPNAME|jq '.[]|{"name":.name,"principalId":.principalId}|select(.name | test("RangerIdentity"))|.principalId'| tr -d '"')
# Assign Managed Identity Operator role to the assumerIdentity principal at subscription scope
az role assignment create --assignee $ASSUMER_OBJECTID --role 'f1a07417-d97a-45cb-824c-7a7467783830' --scope "/subscriptions/$SUBSCRIPTIONID"
# Assign Virtual Machine Contributor role to the assumerIdentity principal at subscription scope
az role assignment create --assignee $ASSUMER_OBJECTID --role '9980e02c-c2be-4d73-94e8-173b1dc7cf3c' --scope "/subscriptions/$SUBSCRIPTIONID"
# Assign Storage Blob Data Contributor role to the loggerIdentity principal at logs filesystem scope
az role assignment create --assignee $LOGGER_OBJECTID --role 'ba92f5b4-2d11-453d-a403-e96b0029c9fe' --scope "/subscriptions/$SUBSCRIPTIONID/resourceGroups/$RESOURCEGROUPNAME/providers/Microsoft.Storage/storageAccounts/$STORAGEACCOUNTNAME/blobServices/default/containers/logs"
# Assign Storage Blob Data Owner role to the dataAccessIdentity principal at logs/data filesystem scope
az role assignment create --assignee $DATAACCESS_OBJECTID --role 'b7e6dc6d-f1e8-4753-8033-0f276bb0955b' --scope "/subscriptions/$SUBSCRIPTIONID/resourceGroups/$RESOURCEGROUPNAME/providers/Microsoft.Storage/storageAccounts/$STORAGEACCOUNTNAME/blobServices/default/containers/data"
az role assignment create --assignee $DATAACCESS_OBJECTID --role 'b7e6dc6d-f1e8-4753-8033-0f276bb0955b' --scope "/subscriptions/$SUBSCRIPTIONID/resourceGroups/$RESOURCEGROUPNAME/providers/Microsoft.Storage/storageAccounts/$STORAGEACCOUNTNAME/blobServices/default/containers/logs"
# Assign Storage Blob Data Contributor role to the rangerIdentity principal at data filesystem scope
az role assignment create --assignee $RANGER_OBJECTID --role 'ba92f5b4-2d11-453d-a403-e96b0029c9fe' --scope "/subscriptions/$SUBSCRIPTIONID/resourceGroups/$RESOURCEGROUPNAME/providers/Microsoft.Storage/storageAccounts/$STORAGEACCOUNTNAME/blobServices/default/containers/data"
Let me know if that works out for you.
... View more
10-06-2020
06:44 AM
Absolutely, we have a partner team that can work with you. More info here: https://www.cloudera.com/partners/cloudera-connect-partner-program.html
... View more
10-06-2020
05:30 AM
Hi Valerio,
There is some mapping to be done to enable your permissions. I think the best way for you to move forward is to use the resources available to you:
1. Free training, e.g. https://www.cloudera.com/about/training/courses/cloudera-essentials-for-cdp.html
2. Tutorials, e.g. https://www.cloudera.com/tutorials/cdp-how-to-create-a-data-hub.html
3. If you are a Cloudera customer, I recommend reaching out to your account team. We have CDP experts that can help you quickly rather than asynchronously.
... View more
10-05-2020
05:54 AM
Hi there, Regarding your datahub failure, it may be due to the fact that your FQDN is too long. Could you try launching a cluster with a shorter name? As for the environment not being deleted: what error are you facing when trying to delete it?
... View more
09-30-2020
06:35 AM
1 Kudo
Hola Valerio, Looking at Screenshot (34), it looks like you used the AssumerIdentity everywhere. Instead, you should use a combination of Logger/Ranger/Assumer/DataAccess identities, as detailed here: https://docs.cloudera.com/management-console/cloud/azure-quickstart/topics/mc-azure-quickstart-environment.html Could you try with the proper identity combination and see if that helps?
... View more
09-29-2020
06:31 AM
Hi Valerio,
A few things to check:
1. If you go to the datalake tab in the UI, can you access the CM UI? The logs there should tell you more.
2. This is most likely a bad combination of managed identity/storage account setup. The best way to know what's wrong is to send us screenshots of your managed identity/storage account setup in the Azure portal, plus how you reference them in the environment creation wizard in CDP.
... View more
09-28-2020
05:14 AM
1 Kudo
Ha! Good catch!
... View more
09-25-2020
05:54 AM
Do me a favor and try this:
- open a terminal session (do not use !)
- run the following commands:
chmod a+r /home/cdsw/drivers/hive/HiveJDBC41.jar
CLASSPATH=.:/home/cdsw/drivers/hive/HiveJDBC41.jar
export CLASSPATH
- close the session and try to run your Python code
... View more
09-24-2020
09:42 AM
Did you actually run the export in a terminal session, as follows?
CLASSPATH=.:/home/cdsw/drivers/HiveJDBC41.jar
export CLASSPATH
... View more
09-24-2020
07:03 AM
Hey, Have you checked this article? https://community.cloudera.com/t5/Community-Articles/How-to-connect-to-CDP-Impala-from-python/ta-p/296405
... View more
08-31-2020
02:25 PM
2 Kudos
Cloudera Data Warehouse (CDW) offers highly efficient compute isolation and rapid scale-up and scale-down of data warehousing workloads, leveraging the latest container and caching technologies.
One of the great features of this architecture is the ability to only bring compute on-demand, as illustrated by the figure below:
This default setup is the most cost-effective, as only a few shared services nodes (small nodes running services like UIs, Viz, ZooKeeper, etc.) are long-lived. Each Virtual Warehouse has a set of nodes that only run when compute is needed (i.e. a new query on a non-cached dataset).
The caveat to this approach is that on a completely cold warehouse, the warm-up time from zero to compute is a minute or two.
An alternative to this default architecture is to leverage compute-reserved nodes that are shared between virtual warehouses, as depicted below:
With this architecture, a pool of reserved nodes can be used to enable the immediate availability of compute across nodes. In this article, I will showcase how to set up reserved instances in CDW.
Note: This article is a high-level tutorial. It is not my intent to detail the behavior of how reserved nodes are shared across warehouses, or recommend generic sizing. The number of instances and the VW behavior will depend on your implementation.
Step 1: Get your Kubeconfig
In CDW, go to your environment, click on the 3 dots on the environment box > Show Kubeconfig:
Grant your ARN access to the environment, and copy/download the kubeconfig (see this article for more details).
Step 2: Connect to your cluster
$ export KUBECONFIG=[path_to_your_kubeconfig]
$ kubectl get deployments -n cluster
NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
ardent-ferret-efs-provisioner       2/2     2            2           4h46m
compute-reserved-node               0/0     0            0           4h46m
crusty-abalone-cluster-autoscaler   1/1     1            1           4h46m
nginx-default-backend               1/1     1            1           4h46m
nginx-service                       3/3     3            3           4h46m
shared-services-reserved-node       0/0     0            0           4h46m
Step 3: Modify the replicas of compute reserved nodes
kubectl edit deployment compute-reserved-node -n cluster
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2020-08-31T16:28:52Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: trendy-mastiff
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: cluster-overprovisioner
    cluster-overprovisioner-name: compute-reserved-node
    helm.sh/chart: cluster-overprovisioner-0.2.5
  name: compute-reserved-node
  namespace: cluster
  resourceVersion: "3476"
  selfLink: /apis/extensions/v1beta1/namespaces/cluster/deployments/compute-reserved-node
  uid: a5cb9ea1-729a-4665-9734-94c2f669984f
spec:
  progressDeadlineSeconds: 600
  replicas: 3
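If you prefer not to edit the manifest interactively, the same change can be made non-interactively with kubectl scale; a sketch equivalent to setting replicas: 3 above:
$ kubectl scale deployment compute-reserved-node -n cluster --replicas=3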
Step 4: Verify your config
After a few minutes, you should see your configuration being applied:
$ kubectl get deployments -n cluster
NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
ardent-ferret-efs-provisioner       2/2     2            2           4h54m
compute-reserved-node               3/3     3            3           4h54m
crusty-abalone-cluster-autoscaler   1/1     1            1           4h54m
nginx-default-backend               1/1     1            1           4h54m
nginx-service                       3/3     3            3           4h54m
shared-services-reserved-node       0/0     0            0           4h54m
... View more
Labels:
08-19-2020
01:56 PM
1 Kudo
Inspired by @sunile_manjee's article How to use K9s to fetch metrics and logs for Cloudera Data Warehouse Experience, I decided to create the same tutorial for Cloudera Data Engineering.
The process is very similar, as you can see below.
Pre-Requisites
CDE environment setup
K9s installed on your machine (instructions here)
AWS user ARN
AWS configure (CLI) pointing to your AWS env (instructions here)
Step 1: Add your AWS user to the CDE environment
In your AWS account console, go to IAM > Users > Search for your user name > copy your ARN:
In the CDE main page, go to environment details:
Then, go to access and add your ARN:
Step 2: Download the kubeconfig
In the CDE main page, click on the 3 dots, then Download Kube Config:
Step 3: Run K9s
You can now run k9s:
k9s --kubeconfig ~/Downloads/kubeconfig
Note: the path to your downloaded kubeconfig may differ, of course.
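If k9s cannot connect, a quick way to confirm that the kubeconfig and the ARN mapping from Step 1 are working is to try plain kubectl with the same file (a simple sketch, assuming kubectl is installed locally):
kubectl --kubeconfig ~/Downloads/kubeconfig get namespaces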
... View more
08-10-2020
02:14 PM
2 Kudos
Moving data from your local machine to the cloud has never been easier using NiFi site to site protocol and CDP Datahub. In this article, I will focus on how to set up a site to site communication between your local machine and CDP Cloud, without using the default Knox CDP Proxy.
This configuration assumes that you already have a local instance of NiFi (or MiNiFi) and a CDP Datahub Cluster running NiFi. If you want to learn how to use CDP Public Cloud, please visit our overview page and documentation.
This setup will be executed in 4 steps:
Step 1: Open CDP to your local IP
Step 2: Download and configure stores on your local machine
Step 3: Configure a simple site-to-site flow
Step 4: Authorize this flow in Ranger
Step 1: Open CDP to your local IP
Go to your CDP Management Console, and find your datahub (here pvn-nifi).
At the bottom of the datahub page, click on Hardware and locate one of the instances running NiFi:
Click on the instance and you will be redirected to your cloud provider (here AWS):
At the bottom of the screen, click on the security group associated with your instance, and you will be redirected to that security group config page:
Click on Edit inbound rules and add a rule opening TCP port 8443 to your local IP:
Save these changes.
Step 2: Download and configure stores on your local machine
Connect to one of the NiFi machines with the Cloudbreak user and the key you used at deployment: $ ssh -i [path_to_private_key] cloudbreak@[your_nifi_host]
Copy and authorize the key and trust stores: $ sudo su
$ cp /var/lib/cloudera-scm-agent/agent-cert/cm-auto-host_keystore.jks /tmp
$ cp /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks /tmp
$ chmod a+rw /tmp/cm-auto-host_keystore.jks
$ chmod a+rw /tmp/cm-auto-global_truststore.jks
Disconnect from the remote machine and copy these stores: $ cd ~/Desktop
$ scp -i [path_to_private_key] cloudbreak@[your_nifi_host]:/tmp/cm-auto-host_keystore.jks cm-auto-host_keystore.jks
$ scp -i [path_to_private_key] cloudbreak@[your_nifi_host]:/tmp/cm-auto-global_truststore.jks cm-auto-global_truststore.jks
Configure your local NiFi with these stores by modifying your nifi.properties:
nifi.security.keystore=/Users/pvidal/Desktop/cm-auto-host_keystore.jks
nifi.security.keystoreType=JKS
nifi.security.keystorePasswd=[keystore_pw]
nifi.security.keyPasswd=[keystore_pw]
nifi.security.truststore=/Users/pvidal/Desktop/cm-auto-global_truststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=[truststore_pw]
Note: To obtain the passwords for these stores, please connect with your Cloudera team.
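Before restarting NiFi, you can verify that the stores copied over intact and that the passwords you were given match, using keytool (it prompts for the store password; the paths are the ones used in nifi.properties above):
$ keytool -list -keystore /Users/pvidal/Desktop/cm-auto-host_keystore.jks -storetype JKS
$ keytool -list -keystore /Users/pvidal/Desktop/cm-auto-global_truststore.jks -storetype JKS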
Restart your local NiFi instance: nifi restart
Step 3: Configure a simple site-to-site flow
Local instance
Create a process group to host your flow (here called S2S Cloud):
In this process group, create a remote process group and configure it with the address of one of your cloud NiFi instances and the HTTP protocol:
Create a simple GenerateFlowFile processor and connect it to the remote process group:
Note: Without configuring Ranger, you will get a Forbidden warning (see Step 4).
CDP Public Instance
Create a process group to host your flow (here called Receive from on prem):
In this process group, create an input port accepting remote connections:
Finally, create a flow that takes the data and logs it:
Start your flow.
Step 4: Authorize this flow in Ranger
From the Cloudera Management console, go to Ranger and your NiFi service:
From the list of policies, create a new policy (here called s2s) that allows access to your specific process group and the site-to-site protocol (Ranger does auto-completion):
Save this policy, and go back to your local machine; you can now enable the remote process group and start sending files!
Example of successful flows
Local Flow
CDP Public Flow
... View more
06-04-2020
11:38 AM
1 Kudo
Cloudera Machine Learning (and Cloudera Data Science Workbench) is built on a very robust and flexible framework to ease integration with third parties. In this article, I decided to explore the integration of Teradata with CML via ODBC.
A few notes before I dive in:
I could have easily used JDBC via JayDeBeApi (see my previous article), but where is the fun in that?
This article can be generalized to other ODBC connections, provided the proper parameters are set up
To get a fully rounded tutorial, I will go through these 3 steps:
Step 1 (optional if you already have a Teradata instance): Setting up Teradata Vantage in AWS
Step 2: Create a custom image with Teradata driver
Step 3: Configure and run python to Teradata ODBC
Step 1: Setting up Teradata Vantage in AWS
If you want to do development testing of Teradata, you can use Teradata Vantage Developer on the AWS Market Place. I am not going to dive too much into this as this configuration may change as Teradata evolves.
Pre-Requisites
The Teradata Vantage Developer is an AWS CloudFormation template, that requires the following assets to be pre-created:
An Internet Gateway
A VPC associated with this IGW (and DNS / DNS Hostnames support)
A subnet (with a route to the IGW and Public IP creation on launch)
A placement group
Setup
Once the assets are created, go to Teradata Vantage Developer and select your configuration (make sure you select the same region as the assets you created):
From this configuration, launch the CloudFormation template and fill the parameters with the AWS assets you created:
The template will create two nodes by default. We will connect to one of them to create a test user: $ ssh ec2-user@[your_public_hostname]
ec2-user@SMP001-01:~> bteq
Teradata BTEQ 16.20.00.01 for LINUX. PID: 17205
Copyright 1984-2017, Teradata Corporation. ALL RIGHTS RESERVED.
Enter your logon or BTEQ command:
.LOGON
UserId: dbc
Password: [password_set_in_cf_template]
CREATE user test AS password=test perm = 200000000, SPOOL = 100000000;
You can then connect with your favorite SQL editor to test the connection and run table creation: CREATE TABLE "test".TEST_TABLE (
COLUMN1 VARCHAR(100),
COLUMN2 VARCHAR(100)
) ;
Step 2: Create a custom image with Teradata driver
I created the following image by following the steps in the documentation on custom engines:
#Dockerfile
FROM docker.repository.cloudera.com/cdsw/engine:11-cml-2020.04-1
WORKDIR /tmp
#The RUN commands that install an editor
#For example: RUN apt-get install myeditor
RUN apt-get autoremove unixodbc -y
RUN apt-get update -y
RUN apt-get install lib32stdc++6 -y
RUN apt-get install wget -y
RUN wget [LOCATION_OF_DRIVERS]
RUN tar -xzvf tdodbc1620__ubuntu_indep.16.20.00.87-1.tar.gz
RUN dpkg -i tdodbc1620/tdodbc1620-16.20.00.87-1.noarch.deb
RUN apt-get install -y python-pyodbc
A few notes on this image:
I removed unixodbc because I read that it was causing issues with pyodbc, but that step may not be required
You can find a built image on my dockerhub here
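If you would rather build and publish the image yourself instead of pulling mine, a minimal sketch looks like this (the repository name and tag below are placeholders for your own registry):
$ docker build -t [your_registry]/cml-teradata-odbc:1.0 .
$ docker push [your_registry]/cml-teradata-odbc:1.0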
Finally, connect to CML and add this new engine:
Step 3: Configure and run python to Teradata ODBC
Go to your workbench in a new project, and create a session with the engine you created.
Then run the following: pip3 install pyodbc
After it is installed, go to your odbc.ini file: vi /home/cdsw/.odbc.ini
Configure your file as follows:
[ODBC Data Sources]
Teradata ODBC DSN = Teradata ODBC Driver 16.20
TEST = My DSN Description
[TEST]
Driver = /opt/teradata/client/16.20/odbc_64/lib/tdataodbc_sb64.so
Description = Teradata database
DBCName = [your_teradata_hostname]
UID = test
PWD = test
Finally, you can run the following code to test: import pyodbc
import pandas as pd
conn = pyodbc.connect('DSN=TEST')
# Define Cursor
cus=conn.cursor()
query = "select * from test.TEST_TABLE;"
# Execute SQL statement to get current datetime and store result in cursor
cus.execute(query)
# Display the content of cursor
row = cus.fetchone()
print(row)
# Use Pandas to execute and retrieve results
df = pd.read_sql(query, conn)
print(df)
The output in the workbench should look something like the following:
... View more
05-29-2020
08:05 AM
2 Kudos
Cloudera Data Platform recently introduced Cluster Connectivity Manager (CCM) in tech preview, a feature that lets CDP communicate with customer workload clusters without requiring any inbound network connections to those clusters.
In this article, I will highlight how to automate the setup of CCM for both AWS and Azure including:
Cloud Network Setup
CDP Environment setup
Before you continue reading:
You can find all automation on my github (here)
Please refer to the official documentation for more details (here)
This is just an example; different deployments require different sets of parameters!
Overview
As highlighted in the above image, CCM uses reverse tunneling to communicate back to your CDP dedicated control plane.
A few things to note:
Each tenant (i.e. customer) will have their own dedicated tunnel using a dedicated Apache Mina server
The diagram above describes a setup in AWS. The Azure setup is very similar but does not require public subnets or NAT gateways.
Network Setup
Note: for both setups, you will notice that port 22 is open inbound to a specific CIDR. This is to enable SSH to the different boxes for CDP, and it can be restricted to a CIDR within the VPC.
AWS
For AWS, we will need the following setup:
One VPC
3 public networks
1 internet gateway
1 public route from a public network to IGW
3 private networks
3 private network routes to NAT gateways
2 security groups
The following is a sample code:
#!/bin/bash
display_usage() {
echo "
Usage:
$(basename "$0") [--help or -h] <prefix> <region> <sg_cidr>
Description:
Creates network assets for CDP env demployment
Arguments:
prefix: prefix of your assets
region: AWS region
sg_cidr: CIDR to open in your security group
--help or -h: displays this help"
}
# check whether user had supplied -h or --help . If yes display usage
if [[ ( $1 == "--help") || $1 == "-h" ]]
then
display_usage
exit 0
fi
# Check the numbers of arguments
if [ $# -lt 3 ]
then
echo "Not enough arguments!" >&2
display_usage
exit 1
fi
if [ $# -gt 3 ]
then
echo "Too many arguments!" >&2
display_usage
exit 1
fi
prefix=$1
region=$2
sg_cidr=$3
# 1. Creating VPC
vpc_id=$(aws ec2 create-vpc --cidr 10.10.0.0/16 | jq -r .Vpc.VpcId)
aws ec2 create-tags --resources $vpc_id --tags Key=Name,Value="$prefix-cdp-vpc" > /dev/null 2>&1
# 2. Creating public subnets
# 2.1. Subnets
public_sub_1=$(aws ec2 create-subnet --vpc-id $vpc_id --cidr-block 10.10.0.0/24 --availability-zone "$region"a | jq -r .Subnet.SubnetId)
public_sub_2=$(aws ec2 create-subnet --vpc-id $vpc_id --cidr-block 10.10.1.0/24 --availability-zone "$region"b | jq -r .Subnet.SubnetId)
public_sub_3=$(aws ec2 create-subnet --vpc-id $vpc_id --cidr-block 10.10.2.0/24 --availability-zone "$region"c | jq -r .Subnet.SubnetId)
aws ec2 create-tags --resources $public_sub_1 --tags Key=Name,Value="$prefix-pub-subnet-1" > /dev/null 2>&1
aws ec2 create-tags --resources $public_sub_2 --tags Key=Name,Value="$prefix-pub-subnet-2" > /dev/null 2>&1
aws ec2 create-tags --resources $public_sub_3 --tags Key=Name,Value="$prefix-pub-subnet-3" > /dev/null 2>&1
# 2.2. Internet gateway
igw_id=$(aws ec2 create-internet-gateway | jq -r .InternetGateway.InternetGatewayId)
aws ec2 create-tags --resources $igw_id --tags Key=Name,Value="$prefix-igw"
aws ec2 attach-internet-gateway --internet-gateway-id $igw_id --vpc-id $vpc_id > /dev/null 2>&1
aws ec2 modify-vpc-attribute --enable-dns-support "{\"Value\":true}" --vpc-id $vpc_id > /dev/null 2>&1
aws ec2 modify-vpc-attribute --enable-dns-hostnames "{\"Value\":true}" --vpc-id $vpc_id > /dev/null 2>&1
# 2.3. Route
route_pub=$(aws ec2 create-route-table --vpc-id $vpc_id | jq -r .RouteTable.RouteTableId)
aws ec2 create-tags --resources $route_pub --tags Key=Name,Value="$prefix-pub-route" > /dev/null 2>&1
aws ec2 create-route --route-table-id $route_pub --destination-cidr-block 0.0.0.0/0 --gateway-id $igw_id > /dev/null 2>&1
aws ec2 associate-route-table --subnet-id $public_sub_1 --route-table-id $route_pub > /dev/null 2>&1
aws ec2 associate-route-table --subnet-id $public_sub_2 --route-table-id $route_pub > /dev/null 2>&1
aws ec2 associate-route-table --subnet-id $public_sub_3 --route-table-id $route_pub > /dev/null 2>&1
# 3. Creating private subnets
# 3.1. Subnets
private_sub_1=$(aws ec2 create-subnet --vpc-id $vpc_id --cidr-block 10.10.160.0/19 --availability-zone "$region"a | jq -r .Subnet.SubnetId)
private_sub_2=$(aws ec2 create-subnet --vpc-id $vpc_id --cidr-block 10.10.192.0/19 --availability-zone "$region"b | jq -r .Subnet.SubnetId)
private_sub_3=$(aws ec2 create-subnet --vpc-id $vpc_id --cidr-block 10.10.224.0/19 --availability-zone "$region"c | jq -r .Subnet.SubnetId)
aws ec2 create-tags --resources $private_sub_1 --tags Key=Name,Value="$prefix-priv-subnet-1" > /dev/null 2>&1
aws ec2 create-tags --resources $private_sub_2 --tags Key=Name,Value="$prefix-priv-subnet-2" > /dev/null 2>&1
aws ec2 create-tags --resources $private_sub_3 --tags Key=Name,Value="$prefix-priv-subnet-3" > /dev/null 2>&1
# 3.1. NAT gateways
alloc_id_1=$(aws ec2 allocate-address --domain vpc | jq -r .AllocationId)
alloc_id_2=$(aws ec2 allocate-address --domain vpc | jq -r .AllocationId)
alloc_id_3=$(aws ec2 allocate-address --domain vpc | jq -r .AllocationId)
nat_1=$(aws ec2 create-nat-gateway --subnet-id $public_sub_1 --allocation-id $alloc_id_1 | jq -r .NatGateway.NatGatewayId)
sleep 30
nat_2=$(aws ec2 create-nat-gateway --subnet-id $public_sub_2 --allocation-id $alloc_id_2 | jq -r .NatGateway.NatGatewayId)
sleep 30
nat_3=$(aws ec2 create-nat-gateway --subnet-id $public_sub_3 --allocation-id $alloc_id_3 | jq -r .NatGateway.NatGatewayId)
sleep 30
# 3.2. Routes
route_priv_1=$(aws ec2 create-route-table --vpc-id $vpc_id | jq -r .RouteTable.RouteTableId)
route_priv_2=$(aws ec2 create-route-table --vpc-id $vpc_id | jq -r .RouteTable.RouteTableId)
route_priv_3=$(aws ec2 create-route-table --vpc-id $vpc_id | jq -r .RouteTable.RouteTableId)
aws ec2 create-tags --resources $route_priv_1 --tags Key=Name,Value="$prefix-priv-route-1" > /dev/null 2>&1
aws ec2 create-tags --resources $route_priv_2 --tags Key=Name,Value="$prefix-priv-route-2" > /dev/null 2>&1
aws ec2 create-tags --resources $route_priv_3 --tags Key=Name,Value="$prefix-priv-route-3" > /dev/null 2>&1
aws ec2 create-route --route-table-id $route_priv_1 --destination-cidr-block 0.0.0.0/0 --nat-gateway-id $nat_1 > /dev/null 2>&1
aws ec2 create-route --route-table-id $route_priv_2 --destination-cidr-block 0.0.0.0/0 --nat-gateway-id $nat_2 > /dev/null 2>&1
aws ec2 create-route --route-table-id $route_priv_3 --destination-cidr-block 0.0.0.0/0 --nat-gateway-id $nat_3 > /dev/null 2>&1
aws ec2 associate-route-table --subnet-id $private_sub_1 --route-table-id $route_priv_1 > /dev/null 2>&1
aws ec2 associate-route-table --subnet-id $private_sub_2 --route-table-id $route_priv_2 > /dev/null 2>&1
aws ec2 associate-route-table --subnet-id $private_sub_3 --route-table-id $route_priv_3 > /dev/null 2>&1
# 4. VPC endpoints
s3_endpoint=$(aws ec2 create-vpc-endpoint --vpc-id $vpc_id --service-name com.amazonaws.${region}.s3 | jq -r .VpcEndpoint.VpcEndpointId)
dynamo_endpoint=$(aws ec2 create-vpc-endpoint --vpc-id $vpc_id --service-name com.amazonaws.${region}.dynamodb | jq -r .VpcEndpoint.VpcEndpointId)
aws ec2 modify-vpc-endpoint --vpc-endpoint-id $s3_endpoint --add-route-table-ids $route_pub $route_priv_1 $route_priv_2 $route_priv_3 > /dev/null 2>&1
aws ec2 modify-vpc-endpoint --vpc-endpoint-id $dynamo_endpoint --add-route-table-ids $route_pub $route_priv_1 $route_priv_2 $route_priv_3 > /dev/null 2>&1
# 5. Security groups
knox_sg_id=$(aws ec2 create-security-group --description "AWS CDP Knox security group" --group-name "$prefix-knox-sg" --vpc-id $vpc_id | jq -r .GroupId)
aws ec2 create-tags --resources $knox_sg_id --tags Key=Name,Value="$prefix-knox-sg" > /dev/null 2>&1
aws ec2 authorize-security-group-ingress --group-id $knox_sg_id --protocol tcp --port 22 --cidr $sg_cidr
aws ec2 authorize-security-group-ingress --group-id $knox_sg_id --protocol tcp --port 0-65535 --cidr 10.10.0.0/16 > /dev/null 2>&1
aws ec2 authorize-security-group-ingress --group-id $knox_sg_id --protocol udp --port 0-65535 --cidr 10.10.0.0/16 > /dev/null 2>&1
aws ec2 authorize-security-group-ingress --group-id $knox_sg_id --protocol tcp --port 0-65535 --cidr 10.10.224.0/19 > /dev/null 2>&1
aws ec2 authorize-security-group-ingress --group-id $knox_sg_id --protocol udp --port 0-65535 --cidr 10.10.224.0/19 > /dev/null 2>&1
default_sg_id=$(aws ec2 create-security-group --description "AWS default security group" --group-name "$prefix-default-sg" --vpc-id $vpc_id | jq -r .GroupId)
aws ec2 create-tags --resources $default_sg_id --tags Key=Name,Value="$prefix-default-sg"
aws ec2 authorize-security-group-ingress --group-id $default_sg_id --protocol tcp --port 0-65535 --cidr 10.10.0.0/16 > /dev/null 2>&1
aws ec2 authorize-security-group-ingress --group-id $default_sg_id --protocol udp --port 0-65535 --cidr 10.10.0.0/16 > /dev/null 2>&1
aws ec2 authorize-security-group-ingress --group-id $default_sg_id --protocol tcp --port 0-65535 --cidr 10.10.224.0/19 > /dev/null 2>&1
aws ec2 authorize-security-group-ingress --group-id $default_sg_id --protocol udp --port 0-65535 --cidr 10.10.224.0/19 > /dev/null 2>&1
echo "{\"VpcId\": \"$vpc_id\",
\"InternetGatewayId\": \"$igw_id\",
\"PublicSubnets\": [\"$public_sub_1\", \"$public_sub_2\", \"$public_sub_3\"],
\"PublicRouteTableId\": \"$route_pub\",
\"PublicNatGatewayIds\": [\"$nat_1\", \"$nat_2\", \"$nat_3\"],
\"PrivateSubnets\": [\"$private_sub_1\", \"$private_sub_2\", \"$private_sub_3\"],
\"PrivateRouteTableIds\": [\"$route_priv_1\", \"$route_priv_2\", \"$route_priv_3\"],
\"VPCEndpoints\": [\"$s3_endpoint\", \"$dynamo_endpoint\"],
\"KnoxGroupId\": \"$knox_sg_id\" ,
\"DefaultGroupId\": \"$default_sg_id\"}"
Azure
For Azure, we will need the following setup:
One vnet
3 private subnets
2 security groups
The following is a sample script:
#!/bin/bash
display_usage() {
echo "
Usage:
$(basename "$0") [--help or -h] <prefix> <sg_cidr>
Description:
Creates network assets for CDP env deployment
Arguments:
prefix: prefix of your assets
sg_cidr: CIDR to open in your security group
--help or -h: displays this help"
}
# Check whether the user supplied -h or --help; if yes, display usage
if [[ ( $1 == "--help") || $1 == "-h" ]]
then
display_usage
exit 0
fi
# Check the number of arguments
if [ $# -lt 2 ]
then
echo "Not enough arguments!" >&2
display_usage
exit 1
fi
if [ $# -gt 2 ]
then
echo "Too many arguments!" >&2
display_usage
exit 1
fi
prefix=$1
sg_cidr=$2
# 1. Vnets and subnets
az network vnet create -g $prefix-cdp-rg --name $prefix-cdp-vnet --address-prefix 10.10.0.0/16
az network vnet subnet create -g $prefix-cdp-rg --vnet-name $prefix-cdp-vnet -n $prefix-priv-subnet-1 --address-prefixes 10.10.160.0/19
az network vnet subnet create -g $prefix-cdp-rg --vnet-name $prefix-cdp-vnet -n $prefix-priv-subnet-2 --address-prefixes 10.10.192.0/19
az network vnet subnet create -g $prefix-cdp-rg --vnet-name $prefix-cdp-vnet -n $prefix-priv-subnet-3 --address-prefixes 10.10.224.0/19
az network vnet subnet update -n $prefix-priv-subnet-1 --vnet-name $prefix-cdp-vnet -g $prefix-cdp-rg --service-endpoints "Microsoft.Sql" "Microsoft.Storage"
az network vnet subnet update -n $prefix-priv-subnet-2 --vnet-name $prefix-cdp-vnet -g $prefix-cdp-rg --service-endpoints "Microsoft.Sql" "Microsoft.Storage"
az network vnet subnet update -n $prefix-priv-subnet-3 --vnet-name $prefix-cdp-vnet -g $prefix-cdp-rg --service-endpoints "Microsoft.Sql" "Microsoft.Storage"
# 2. NSG
az network nsg create -g $prefix-cdp-rg -n $prefix-knox-nsg
az network nsg create -g $prefix-cdp-rg -n $prefix-default-nsg
az network nsg rule create -g $prefix-cdp-rg --nsg-name $prefix-knox-nsg -n ssh_cidr --priority 102 --source-address-prefixes "$sg_cidr" --destination-address-prefixes '*' --destination-port-ranges 22 --direction Inbound --access Allow --protocol Tcp --description "Allow SSH to boxes from CIDR."
az network nsg rule create -g $prefix-cdp-rg --nsg-name $prefix-knox-nsg -n outbound --priority 107 --source-address-prefixes '*' --destination-address-prefixes '*' --destination-port-ranges '*' --direction Outbound --access Allow --protocol '*' --description "Allow outbound access."
az network nsg rule create -g $prefix-cdp-rg --nsg-name $prefix-default-nsg -n outbound --priority 107 --source-address-prefixes '*' --destination-address-prefixes '*' --destination-port-ranges '*' --direction Outbound --access Allow --protocol '*' --description "Allow outbound access."
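As with the AWS script, here is a minimal usage sketch (the script name, resource group location, and CIDR are examples; the $prefix-cdp-rg resource group must exist before the script runs):
# Create the resource group if it does not exist yet (the location is just an example)
az group create --name fod-cdp-rg --location westus2
# Create the vnet, subnets, and NSGs with prefix "fod"
chmod +x azure-cdp-network.sh
./azure-cdp-network.sh fod [your_ip]/32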
CDP setup
Note: This setup uses the CDP CLI; you could also select these networks directly from the UI. It also assumes that a proper IAM setup and storage have already been created, which can also be automated with the scripts in my GitHub repository.
AWS
cdp environments create-aws-environment --environment-name ${prefix}-cdp-env \
--credential-name ${credential} \
--region ${region} \
--security-access securityGroupIdForKnox="${knox_sg_id}",defaultSecurityGroupId="${default_sg_id}" \
--authentication publicKeyId="${key}" \
--log-storage storageLocationBase="s3a://${prefix}-cdp-bucket",instanceProfile="arn:aws:iam::$AWS_ACCOUNT_ID:instance-profile/${prefix}-log-role" \
--subnet-ids "${pub_sub_1}" "${pub_sub_2}" "${pub_sub_3}" "${priv_sub_1}" "${priv_sub_2}" "${priv_sub_3}" \
--vpc-id "${vpc}" \
--s3-guard-table-name ${prefix}-cdp-table \
--enable-tunnel \
--tags key="enddate",value="${END_DATE}" key="project",value="${PROJECT}"
Azure
cdp environments create-azure-environment --environment-name ${prefix}-cdp-env \
--credential-name ${credential} \
--region "${region}" \
--public-key "${key}" \
--security-access securityGroupIdForKnox="$knox_nsg",defaultSecurityGroupId="$default_nsg" \
--log-storage storageLocationBase="abfs://logs@${prefix}cdpsa.dfs.core.windows.net",managedIdentity="/subscriptions/${SUBSCRIPTION_ID}/resourceGroups/${prefix}-cdp-rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/loggerIdentity" \
--existing-network-params networkId="$network_id",resourceGroupName="$prefix-cdp-rg",subnetIds="$subnet_1","$subnet_2","$subnet_3" \
--tags key="enddate",value="${END_DATE}" key="project",value="${PROJECT}" \
--no-use-public-ip \
--enable-tunnel
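Both commands return quickly while the environment is then created asynchronously. One way to keep an eye on progress from the CLI (a sketch assuming jq is available and the standard CDP CLI JSON output layout) is:
# Poll the environment status until it reaches AVAILABLE (or a failure state)
cdp environments describe-environment --environment-name ${prefix}-cdp-env \
  | jq -r .environment.status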
... View more
05-21-2020
06:24 AM
1 Kudo
Continuing my series of how-to articles for CDP, today we explore how to connect to Impala via JDBC from Python. In my example, I will use a Jupyter notebook running in CML, but this can be generalized.
This process is actually fairly easy, so let's dive in.
Step 1: Setup Impala JDBC drivers
First, download the latest Impala JDBC drivers from Cloudera JDBC Driver 2.6.17 for Impala.
Then, upload them to your machine. Here is an example of a CML Jupyter session with the jars uploaded:
Finally, make sure that you set up your CLASSPATH properly by opening a terminal session and typing the following: CLASSPATH=.:/home/cdsw/ImpalaJDBC4.jar:/home/cdsw/ImpalaJDBC41.jar:/home/cdsw/ImpalaJDBC42.jar
export CLASSPATH
Step 2: Install JayDeBeApi
To install JayDeBeApi, run the following: pip3 install JayDeBeApi
To avoid an error along the lines of "AttributeError: type object 'java.sql.Types' has no attribute '__javaclass__'", it is recommended to downgrade jpype by running the following: pip3 install --upgrade jpype1==0.6.3 --user
Restart your kernel after performing the downgrade.
Step 3: Connect to Impala
Finally, connect to your impala, using the following sample code: import jaydebeapi
conn = jaydebeapi.connect("com.cloudera.impala.jdbc.DataSource",
"jdbc:impala://[your_host]:443/;ssl=1;transportMode=http;httpPath=icml-data-mart/cdp-proxy-api/impala;AuthMech=3;",
{'UID': "[your_cdp_user]", 'PWD': "[your_workload_pwd]"},
'/home/cdsw/ImpalaJDBC41.jar')
curs = conn.cursor()
curs.execute("select * from default.locations")
curs.fetchall()
curs.close()
conn.close()
Note: You can get your Impala JDBC string either from the Datahub endpoint path or from the JDBC URL in CDW.
The following is a screenshot of my code in action:
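If you prefer working with pandas, here is a minimal sketch (not part of the original example; it assumes pandas is installed in the session and is run before conn.close()):
import pandas as pd

# Re-run the query and build a DataFrame, taking column names
# from the DB-API cursor description
curs = conn.cursor()
curs.execute("select * from default.locations")
columns = [desc[0] for desc in curs.description]
df = pd.DataFrame(curs.fetchall(), columns=columns)
curs.close()
print(df.head())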
... View more
05-11-2020
06:55 AM
Hi Abdul, that's a very good question. To answer it, you need to broaden the horizon a bit beyond just Navigator and Atlas, and consider Atlas, Ranger, Sentry, and Navigator together. In short, all the functionality you have in CDH (using Navigator and Sentry) is available to you in CDP (using Atlas and Ranger). Here is a high-level mapping of the different tools across the two platforms:
Data Lineage: Navigator (CDH) -> Atlas (CDP)
Audits: Navigator (CDH) -> Ranger (CDP)
RBAC: Sentry (CDH) -> Ranger (CDP)
ABAC: not supported in CDH -> Ranger (CDP)
Of course, Atlas also gives you extra functionality that was not available to you before, such as a more scalable back end, open APIs for integration, and broad Apache community support. I recommend exploring our product pages to get a better understanding: https://www.cloudera.com/products/open-source/apache-hadoop/apache-atlas.html
... View more
05-08-2020
10:36 AM
2 Kudos
Here is a fun one: how do you connect from Python in Cloudera Machine Learning to our Kafka Datahub cluster? The documentation is pretty thorough, but it does not have an example of a Python client. That's what I'm going to highlight in this article. The good news is that since CML and Datahub run in the same network, you don't need to worry about opening the broker ports; you just need to follow these steps:
Step 1: Get and upload your FreeIPA certificate
Step 2: Find your broker hostnames
Step 3: Set up your client
Step 1: Get and upload your FreeIPA certificate
Go to your management console > your environment > Actions > Get FreeIPA Certificate:
Once downloaded, go to your CML workspace and upload your file (e.g. /home/cdsw/ca.crt).
Step 2: Find your broker hostnames
For this, go to your Kafka Datahub Cluster > CM UI > Kafka > Instances; you can find the broker hosts here:
Step 3: Set up your client
Then, open a session in CML, and use the following parameters: from kafka import KafkaProducer
producer = KafkaProducer(bootstrap_servers=['<YOUR_BROKER_URL>:9093','<YOUR_BROKER_URL>:9093','<YOUR_BROKER_URL>:9093'],
security_protocol="SASL_SSL",
sasl_mechanism="PLAIN",
ssl_check_hostname=True,
ssl_cafile='/home/cdsw/ca.crt',
sasl_plain_username="<YOUR_WORKLOAD_USER>",
sasl_plain_password="<YOUR_WORKLOAD_PASSWORD>",
api_version_auto_timeout_ms=30000)
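The article stops at building the producer; as a quick smoke test, here is a minimal sketch for publishing a message (it assumes a topic named test-topic already exists and that your workload user is authorized to produce to it in Ranger):
# Send one message and block until the broker acknowledges it
future = producer.send("test-topic", value=b"hello from CML")
metadata = future.get(timeout=30)
print(metadata.topic, metadata.partition, metadata.offset)
producer.flush()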
... View more
05-05-2020
06:24 PM
5 Kudos
Recently I came across an interesting problem: how to use boto to get data from a secure bucket in a Jupyter notebook in Cloudera Machine Learning.
The missing piece: I needed my code to use the AWS permissions granted to me by IDBroker.
Since CML had already authenticated me with Kerberos, all I needed was to retrieve the credentials from IDBroker.
In this article, I will show you pseudo-code on how to get these access keys, both in bash and Python.
Note: Special thanks to @Kevin Risden to whom I owe this article and many more things.
Find your IDBroker URL
Regardless of the method, you will need to get the URL for your IDBroker host. This is done simply in the management console of your datalake. The following is an example:
Getting Access Keys in bash
After you connect to one of your cluster's nodes and kinit, run the following:
IDBROKER_DT="$(curl -s --negotiate -u: "https://[IDBROKER_HOST]:8444/gateway/dt/knoxtoken/api/v1/token")"
IDBROKER_ACCESS_TOKEN="$(echo "$IDBROKER_DT" | python -c "import json,sys; print(json.load(sys.stdin)['access_token'])")"
IDBROKER_CREDENTIAL_OUTPUT="$(curl -s -H "Authorization: Bearer $IDBROKER_ACCESS_TOKEN" "https://[IDBROKER_HOST]:8444/gateway/aws-cab/cab/api/v1/credentials")"
The credentials can be found in the $IDBROKER_CREDENTIAL_OUTPUT variable.
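For example, here is a minimal sketch of how you could export them for the AWS CLI (assuming jq is installed on the node; the JSON layout is the same one the Python example below parses):
# Extract the temporary credentials from the IDBroker response and export them
export AWS_ACCESS_KEY_ID="$(echo "$IDBROKER_CREDENTIAL_OUTPUT" | jq -r .Credentials.AccessKeyId)"
export AWS_SECRET_ACCESS_KEY="$(echo "$IDBROKER_CREDENTIAL_OUTPUT" | jq -r .Credentials.SecretAccessKey)"
export AWS_SESSION_TOKEN="$(echo "$IDBROKER_CREDENTIAL_OUTPUT" | jq -r .Credentials.SessionToken)"
# Quick test against a bucket your IDBroker mappings allow ([YOUR_BUCKET] is a placeholder)
aws s3 ls s3://[YOUR_BUCKET]/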
Getting Access Keys in Python
Before getting started, install the following libraries:
pip3 install requests requests-kerberos boto3
Then, run the following code:
import requests
from requests_kerberos import HTTPKerberosAuth

r = requests.get("https://[IDBROKER_URL]:8444/gateway/dt/knoxtoken/api/v1/token", auth=HTTPKerberosAuth())

url = "https://[IDBROKER_URL]:8444/gateway/aws-cab/cab/api/v1/credentials"
headers = {
    'Authorization': "Bearer " + r.json()['access_token'],
    'cache-control': "no-cache"
}
response = requests.request("GET", url, headers=headers)

ACCESS_KEY = response.json()['Credentials']['AccessKeyId']
SECRET_KEY = response.json()['Credentials']['SecretAccessKey']
SESSION_TOKEN = response.json()['Credentials']['SessionToken']

import boto3
client = boto3.client(
    's3',
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    aws_session_token=SESSION_TOKEN,
)
You can then access your buckets via the following:
data = client.get_object(Bucket='[YOUR_BUCKET]', Key='[FILE_PATH]')
contents = data['Body'].read()
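If the object happens to be a CSV, one possible continuation (assuming pandas is available in the session and the file is UTF-8 encoded):
import io
import pandas as pd

# Parse the downloaded bytes directly into a DataFrame
df = pd.read_csv(io.BytesIO(contents))
print(df.head())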
... View more