Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Using Distcp on Encryption Zones

Hi,

In the HDFS admin guide, for copying data across encryption zones (inter-cluster or intra-cluster), the recommendation is to run distcp on /.reserved/raw/source_data_dir rather than on source_data_dir directly. I believe the reason is to avoid needlessly decrypting the data on the source and re-encrypting it on the destination.
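The raw-path copy described above can be sketched as a small Python helper that moves both endpoints under /.reserved/raw and builds the DistCp argument list. This is an illustrative sketch, not an official tool: the function names, NameNode addresses (`nn1`, `nn2`) and zone path are hypothetical, and the code only builds the command rather than running it. It passes `-px` because the Apache Hadoop transparent encryption documentation notes that each file's encryption metadata (the EDEK) is exposed as extended attributes under /.reserved/raw and must be preserved for the copy to be decryptable.

```python
from urllib.parse import urlsplit, urlunsplit

# Reserved HDFS prefix that exposes the raw (still-encrypted) bytes of a file.
RAW_PREFIX = "/.reserved/raw"


def to_raw(uri: str) -> str:
    """Rewrite the path part of an HDFS path or URI to sit under /.reserved/raw.

    Accepts plain absolute paths ("/data/ez1") and full URIs
    ("hdfs://nn1:8020/data/ez1"); scheme and authority are left unchanged.
    """
    parts = urlsplit(uri)
    path = parts.path
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute path, got {uri!r}")
    if path == RAW_PREFIX or path.startswith(RAW_PREFIX + "/"):
        return uri  # already a raw path, leave it alone
    return urlunsplit(parts._replace(path=RAW_PREFIX + path.rstrip("/")))


def raw_distcp_command(src: str, dst: str) -> list[str]:
    """Build (not run) a DistCp command that copies the encrypted bytes as-is.

    -px preserves extended attributes; under /.reserved/raw these carry each
    file's encryption metadata (the EDEK), which the target needs in order to
    decrypt. Reading /.reserved/raw requires the HDFS superuser.
    """
    return ["hadoop", "distcp", "-px", to_raw(src), to_raw(dst)]


if __name__ == "__main__":
    # Hypothetical NameNode addresses and zone path.
    cmd = raw_distcp_command("hdfs://nn1:8020/data/ez1", "hdfs://nn2:8020/data/ez1")
    print(" ".join(cmd))
```

Running it prints `hadoop distcp -px hdfs://nn1:8020/.reserved/raw/data/ez1 hdfs://nn2:8020/.reserved/raw/data/ez1`. Building the argument list separately from executing it keeps the raw-path rewriting easy to check before anything touches a cluster.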

My question: if we copy from the /.reserved/raw directory, the data arrives at the destination still encrypted, which means the KMS keys must also be copied over separately, e.g. via a database dump. Is that right, and are there any pointers on the best strategy in this case?

Accepted solution

Take a look at the link below for a detailed explanation. In short, yes: the keys have to be dumped from the source KMS database and loaded into the target one, using the provided "exportKeysToJCEKS.sh" and "importKeysToJCEKS.sh" scripts.

https://community.hortonworks.com/articles/51909/how-to-copy-encrypted-data-between-two-hdp-cluster....

Replies

The Ranger KMS has import/export scripts that you can use on both the source and target clusters.

So you can export the keys from the source cluster, copy them over to the target cluster, import them into the target KMS, create your encryption zones on the target using the imported keys, and use distcp as described in the guide.
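The sequence in this reply can be sketched, for one zone, as a Python function that only builds the commands to run on the target cluster; it does not execute anything. The key name `ezkey1`, the NameNode URIs `hdfs://nn1:8020` and `hdfs://nn2:8020`, and the path `/data/ez1` are hypothetical examples. The key export and import with the Ranger KMS scripts is assumed to have happened already and is left out, since their exact arguments are not given in this thread. Creating the target zone before copying follows this reply and the Apache Hadoop transparent encryption documentation, which recommends creating identical zones on the destination first.

```python
def migration_plan(key_name: str, zone_path: str, src_nn: str, dst_nn: str) -> list[list[str]]:
    """Return, in order, the commands to run on the target cluster for one zone.

    Assumes the zone key was already exported from the source Ranger KMS and
    imported into the target KMS (exportKeysToJCEKS.sh / importKeysToJCEKS.sh),
    and that the commands run as the HDFS superuser with the target cluster as
    the default filesystem.
    """
    if not zone_path.startswith("/") or zone_path.rstrip("/") == "":
        raise ValueError(f"expected an absolute, non-root zone path, got {zone_path!r}")
    zone_path = zone_path.rstrip("/")
    raw_path = "/.reserved/raw" + zone_path
    return [
        # 1. Check that the imported key is visible to the target KMS.
        ["hadoop", "key", "list"],
        # 2. createZone needs an existing, empty directory.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # 3. Make it an encryption zone backed by the imported key (same name as on the source).
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        # 4. Copy the raw encrypted bytes; -px keeps the xattrs holding each file's EDEK.
        ["hadoop", "distcp", "-px", src_nn + raw_path, dst_nn + raw_path],
    ]


if __name__ == "__main__":
    # Hypothetical key name, zone path, and NameNode URIs.
    for cmd in migration_plan("ezkey1", "/data/ez1", "hdfs://nn1:8020", "hdfs://nn2:8020"):
        print(" ".join(cmd))
```

Listing the keys first is a cheap guard: if the imported key is missing on the target, the zone cannot be created and the copied files could not be decrypted anyway.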
