Created on 10-20-2020 02:20 PM - edited on 10-23-2020 01:18 AM by VidyaSargur
Cloudera Data Platform Public Cloud recently introduced the ability to back up and restore a datalake from a saved location. Specifically, the backup operation saves a full snapshot of data from all SDX services.
In this article, I will detail how to run a backup and a restore in CDP Public Cloud on AWS, using the CDP CLI.
Make sure that no HMS-affecting operations are running (e.g. creating a table from CDW or a datahub).
Go to your Datalake Cloudera Manager, and shut down:
Datalake backup uses both the Ranger Audit Role and the Datalake Admin Role to write the backups (more details on these roles here).
Therefore, the policies attached to both IAM roles must grant write permissions to the location of your backup.
Here is an example of a policy attached to the Ranger Audit Role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "FullObjectAccessUnderAuditDir",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::bckp-cdp-bucket/ranger/audit/*"
    },
    {
      "Sid": "FullObjectAccessUnderBackupDir",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::bckp-cdp-bucket/backups/*"
    },
    {
      "Sid": "LimitedAccessToDataLakeBucket",
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads"
      ],
      "Resource": "arn:aws:s3:::bckp-cdp-bucket"
    }
  ]
}
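Before starting a backup, it can help to sanity-check that a policy document really grants write access to the backup prefix. The following is a minimal Python sketch, not an official Cloudera or AWS tool: the helper name `allows_put` and the object key `snapshot-1` are hypothetical, and the matching is deliberately simplified (only `Allow` statements are considered; `Deny` statements, `Condition` blocks, `NotAction` and `NotResource` are ignored).

```python
from fnmatch import fnmatchcase

# Trimmed copy of the example policy above (backup statement only).
BACKUP_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "FullObjectAccessUnderBackupDir",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::bckp-cdp-bucket/backups/*",
        }
    ],
}


def _as_list(value):
    """IAM accepts a single value or a list for Statement, Action and Resource."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def allows_put(policy, object_arn):
    """Return True if some Allow statement grants s3:PutObject on object_arn.

    Simplified on purpose: Deny, Condition, NotAction and NotResource are ignored.
    """
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        # Action names are case-insensitive and may contain wildcards.
        actions = [a.lower() for a in _as_list(stmt.get("Action", []))]
        resources = _as_list(stmt.get("Resource", []))
        action_ok = any(fnmatchcase("s3:putobject", a) for a in actions)
        resource_ok = any(fnmatchcase(object_arn, r) for r in resources)
        if action_ok and resource_ok:
            return True
    return False
```

For example, `allows_put(BACKUP_POLICY, "arn:aws:s3:::bckp-cdp-bucket/backups/snapshot-1")` evaluates to `True`, while an object ARN outside `backups/` evaluates to `False`. A check like this only inspects the policy text; it does not confirm which policies are actually attached to the role.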
Installing the CDP CLI is fairly straightforward, and is documented in your management console, under Help > Download CLI.
$ cdp datalake backup-datalake --datalake-name bckp-cdp-dl --backup-location s3a://bckp-cdp-bucket/backups/
{
  "accountId": "558bc1d2-8867-4357-8524-311d51259233",
  "backupId": "6c59a259-51ac-4db4-80d6-22f71f84cc4e",
  "internalState": "{ATLAS_ENTITY_AUDIT_EVENTS_TABLE=IN_PROGRESS, EDGE_INDEX_COLLECTION=IN_PROGRESS, DATABASE=IN_PROGRESS, FULLTEXT_INDEX_COLLECTION=IN_PROGRESS, ATLAS_JANUS_TABLE=IN_PROGRESS, RANGER_AUDITS_COLLECTION=IN_PROGRESS, VERTEX_INDEX_COLLECITON=IN_PROGRESS}",
  "status": "IN_PROGRESS",
  "startTime": "2020-10-20 21:11:27.821",
  "endTime": "",
  "backupLocation": "s3a://bckp-cdp-bucket/backups/",
  "failureReason": "null"
}
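The `backupId` returned here is the value that the restore command takes through `--backup-id`. As a small illustration, the Python sketch below pulls it out of a saved backup response and assembles the corresponding restore command line. The function name `restore_command` is hypothetical; only the sub-command and flags that appear in this article are used, and nothing is executed.

```python
import json
from typing import List


def restore_command(backup_response: str, datalake_name: str) -> List[str]:
    """Build the restore-datalake argument list from a backup-datalake JSON response."""
    backup_id = json.loads(backup_response)["backupId"]
    return [
        "cdp", "datalake", "restore-datalake",
        "--datalake-name", datalake_name,
        "--backup-id", backup_id,
    ]


# Usage with a response shaped like the one above (other fields omitted):
sample = '{"backupId": "6c59a259-51ac-4db4-80d6-22f71f84cc4e", "status": "IN_PROGRESS"}'
print(" ".join(restore_command(sample, "bckp-cdp-dl")))
```

Returning a list of arguments rather than a single string keeps the result safe to pass to `subprocess.run` without shell quoting.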
$ cdp datalake backup-datalake-status --datalake-name bckp-cdp-dl
{
  "accountId": "558bc1d2-8867-4357-8524-311d51259233",
  "backupId": "6c59a259-51ac-4db4-80d6-22f71f84cc4e",
  "userCrn": "crn:altus:iam:us-west-1:558bc1d2-8867-4357-8524-311d51259233:user:86c4e7d9-1560-4afa-ac14-794bdeec0896",
  "internalState": "{ATLAS_ENTITY_AUDIT_EVENTS_TABLE=IN_PROGRESS, EDGE_INDEX_COLLECTION=IN_PROGRESS, DATABASE=IN_PROGRESS, FULLTEXT_INDEX_COLLECTION=IN_PROGRESS, ATLAS_JANUS_TABLE=IN_PROGRESS, RANGER_AUDITS_COLLECTION=IN_PROGRESS, VERTEX_INDEX_COLLECITON=IN_PROGRESS}",
  "status": "IN_PROGRESS",
  "startTime": "2020-10-20 21:11:27.821",
  "endTime": "",
  "backupLocation": "s3a://bckp-cdp-bucket/backups/",
  "backupName": "",
  "failureReason": "null"
}
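The `internalState` field in these responses is a single string of `NAME=STATE` pairs rather than nested JSON, which makes it awkward to read at a glance. Here is a minimal Python sketch that splits it into a dictionary and lists the components that have not finished yet. The function names are hypothetical, and the string format is inferred only from the sample output in this article, so it may differ between releases.

```python
def parse_internal_state(raw: str) -> dict:
    """Turn '{A=X, B=Y}' into {'A': 'X', 'B': 'Y'}."""
    body = raw.strip()
    if body.startswith("{") and body.endswith("}"):
        body = body[1:-1]
    state = {}
    for item in body.split(","):
        if "=" not in item:
            continue  # skip empty or malformed fragments
        name, value = item.split("=", 1)
        state[name.strip()] = value.strip()
    return state


def pending_components(raw: str) -> list:
    """Names of components whose state is anything other than SUCCESSFUL, sorted."""
    return sorted(
        name for name, value in parse_internal_state(raw).items()
        if value != "SUCCESSFUL"
    )


# Usage with a shortened state string:
example = "{DATABASE=SUCCESSFUL, ATLAS_JANUS_TABLE=IN_PROGRESS, RANGER_AUDITS_COLLECTION=IN_PROGRESS}"
print(pending_components(example))
```

Component names are passed through exactly as the service reports them, so no attempt is made to normalise their spelling.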
$ cdp datalake restore-datalake --datalake-name bckp-cdp-dl --backup-id 6c59a259-51ac-4db4-80d6-22f71f84cc4e
{
  "accountId": "558bc1d2-8867-4357-8524-311d51259233",
  "restoreId": "06c0bde4-cfc7-4b9e-a8e0-d9f2ddfcb5c5",
  "backupId": "6c59a259-51ac-4db4-80d6-22f71f84cc4e",
  "internalState": "{ATLAS_ENTITY_AUDIT_EVENTS_TABLE=IN_PROGRESS, DATABASE=IN_PROGRESS, EDGE_INDEX_COLLECTION_DELETE=IN_PROGRESS, RANGER_AUDITS_COLLECTION_DELETE=IN_PROGRESS, VERTEX_INDEX_COLLECITON_DELETE=IN_PROGRESS, ATLAS_JANUS_TABLE=IN_PROGRESS, FULLTEXT_INDEX_COLLECTION_DELETE=IN_PROGRESS}",
  "status": "IN_PROGRESS",
  "startTime": "2020-10-20 21:15:11.757",
  "endTime": "",
  "backupLocation": "s3a://bckp-cdp-bucket/backups/",
  "failureReason": "null"
}
$ cdp datalake restore-datalake-status --datalake-name bckp-cdp-dl
{
  "accountId": "558bc1d2-8867-4357-8524-311d51259233",
  "restoreId": "06c0bde4-cfc7-4b9e-a8e0-d9f2ddfcb5c5",
  "backupId": "6c59a259-51ac-4db4-80d6-22f71f84cc4e",
  "userCrn": "crn:altus:iam:us-west-1:558bc1d2-8867-4357-8524-311d51259233:user:86c4e7d9-1560-4afa-ac14-794bdeec0896",
  "internalState": "{ATLAS_ENTITY_AUDIT_EVENTS_TABLE=IN_PROGRESS, EDGE_INDEX_COLLECTION=SUCCESSFUL, DATABASE=SUCCESSFUL, FULLTEXT_INDEX_COLLECTION=SUCCESSFUL, EDGE_INDEX_COLLECTION_DELETE=SUCCESSFUL, VERTEX_INDEX_COLLECITON_DELETE=SUCCESSFUL, RANGER_AUDITS_COLLECTION_DELETE=SUCCESSFUL, ATLAS_JANUS_TABLE=IN_PROGRESS, RANGER_AUDITS_COLLECTION=IN_PROGRESS, VERTEX_INDEX_COLLECITON=IN_PROGRESS, FULLTEXT_INDEX_COLLECTION_DELETE=SUCCESSFUL}",
  "status": "IN_PROGRESS",
  "startTime": "2020-10-20 21:15:11.757",
  "endTime": "",
  "backupLocation": "s3a://bckp-cdp-bucket/backups/",
  "failureReason": "null"
}
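Because both backup and restore run asynchronously, the status command typically has to be re-run until the operation leaves `IN_PROGRESS`. The Python sketch below wraps that in a polling loop. It is only an illustration under a few assumptions: the function names, poll interval and poll limit are hypothetical, the CLI is assumed to be installed, configured and printing JSON as in the samples above, and any `status` other than `IN_PROGRESS` is treated as final, since the terminal values are not shown in this article.

```python
import json
import subprocess
import time


def fetch_restore_status(datalake_name):
    """Run the restore status command from this article and return the parsed JSON."""
    completed = subprocess.run(
        ["cdp", "datalake", "restore-datalake-status", "--datalake-name", datalake_name],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)


def wait_until_done(fetch, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call fetch() until 'status' is no longer IN_PROGRESS, then return that response.

    Raises TimeoutError if the operation is still running after max_polls calls.
    """
    for _ in range(max_polls):
        response = fetch()
        if response.get("status") != "IN_PROGRESS":
            return response
        sleep(poll_seconds)
    raise TimeoutError("still IN_PROGRESS after %d polls" % max_polls)


# Usage (requires a configured CDP CLI, so it is left commented out):
# final = wait_until_done(lambda: fetch_restore_status("bckp-cdp-dl"))
# print(final["status"], final["failureReason"])
```

Passing the fetch function in as a parameter keeps the loop independent of the CLI, so the same `wait_until_done` can be reused for `backup-datalake-status` or exercised with canned responses.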
Note: you can also monitor these events in the CDP Control Plane.