Using Distcp on Encryption Zones

Hi,

The HDFS Admin Guide recommends that, when copying data across encryption zones (inter-cluster or intra-cluster), distcp be run against /.reserved/raw/source_data_dir rather than against source_data_dir directly. I believe the reason is to avoid needlessly decrypting the data on the source and re-encrypting it on the destination.

My question: if we copy from the /.reserved/raw directory, the data arrives at the destination still encrypted, which means the KMS keys also need to be copied over separately, for example with a database dump. Is that right? Any pointers on the best strategy in this case?
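For reference, the raw-path copy described in the question can be sketched as a small helper that only builds the distcp command line. This is an illustration, not the guide's own example: the NameNode URIs are hypothetical, and the -px flag (preserve extended attributes) is included because the per-file encryption metadata is exposed as extended attributes under /.reserved/raw. Access to /.reserved/raw normally requires HDFS superuser privileges.

```python
import shlex

RAW_PREFIX = "/.reserved/raw"


def raw_distcp_command(src_fs, dst_fs, directory):
    """Build a distcp argv that copies `directory` through /.reserved/raw on both sides."""
    if not directory.startswith("/"):
        raise ValueError("directory must be an absolute HDFS path")
    src = f"{src_fs}{RAW_PREFIX}{directory}"
    dst = f"{dst_fs}{RAW_PREFIX}{directory}"
    # -px preserves extended attributes, which carry each file's encryption metadata.
    return ["hadoop", "distcp", "-px", src, dst]


if __name__ == "__main__":
    # Hypothetical NameNode addresses; substitute the real ones.
    cmd = raw_distcp_command(
        "hdfs://source-nn:8020", "hdfs://target-nn:8020", "/source_data_dir"
    )
    print(shlex.join(cmd))
```

The helper prints the command instead of running it, so it can be reviewed before being executed as a suitably privileged user.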

1 ACCEPTED SOLUTION

Take a look at the link below for a detailed explanation. In short though, yes, a dump and load of the keys is necessary, using the "exportKeysToJCEKS.sh" and "importJCEKSKeys.sh" scripts provided with Ranger KMS.

https://community.hortonworks.com/articles/51909/how-to-copy-encrypted-data-between-two-hdp-cluster....

REPLIES

The Ranger KMS has import/export scripts that you can use on both the source and target clusters.

So you can export the keys from the source cluster, copy them over to the target cluster, import them into the target KMS, create your encryption zones on the target using the imported keys, and use distcp as described in the guide.
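The sequence above can be laid out as an ordered plan. The sketch below only assembles the commands for review and does not run anything. The script paths, their argument form (a single keystore file), the scp step, and the host name are all placeholders or assumptions; check the Ranger KMS installation for the actual script names and arguments. The `hdfs crypto -createZone` and `hadoop distcp` invocations are standard Hadoop commands.

```python
def migration_plan(key_name, zone_path, src_fs, dst_fs,
                   export_script, import_script, keystore_file, target_host):
    """Return (where, argv) steps for moving keys and data to the target cluster."""
    raw = "/.reserved/raw"
    return [
        # 1. Export the keys from the source KMS (argument form is an assumption).
        ("source", [export_script, keystore_file]),
        # 2. Copy the exported keystore to the target KMS host (hypothetical host).
        ("source", ["scp", keystore_file, f"{target_host}:{keystore_file}"]),
        # 3. Import the keys into the target KMS (argument form is an assumption).
        ("target", [import_script, keystore_file]),
        # 4. Create an empty directory and turn it into a zone using the imported key.
        ("target", ["hdfs", "dfs", "-mkdir", "-p", zone_path]),
        ("target", ["hdfs", "crypto", "-createZone",
                    "-keyName", key_name, "-path", zone_path]),
        # 5. Copy the still-encrypted bytes, preserving extended attributes.
        ("target", ["hadoop", "distcp", "-px",
                    f"{src_fs}{raw}{zone_path}", f"{dst_fs}{raw}{zone_path}"]),
    ]


if __name__ == "__main__":
    plan = migration_plan(
        key_name="mykey",                          # hypothetical key name
        zone_path="/secure/data",                  # hypothetical zone path
        src_fs="hdfs://source-nn:8020",            # hypothetical NameNodes
        dst_fs="hdfs://target-nn:8020",
        export_script="/path/to/kms/export-script.sh",  # placeholder path
        import_script="/path/to/kms/import-script.sh",  # placeholder path
        keystore_file="/tmp/keys.jceks",
        target_host="target-kms-host",
    )
    for where, argv in plan:
        print(f"[{where}] {' '.join(argv)}")
```

Keeping the key import and zone creation ahead of the distcp step mirrors the order given in the reply: the target zone has to exist, with the same key, before the raw bytes land in it.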
