Created 10-25-2016 11:28 AM
Hi,
The HDFS Admin Guide recommends running distcp against /.reserved/raw/source_data_dir rather than source_data_dir directly when copying data between encryption zones (inter-cluster or intra-cluster). I believe the reason is to avoid needlessly decrypting the data on the source and re-encrypting it on the destination.
My question: if we copy from the /.reserved/raw directory, the data on the destination will still be in encrypted form, which means the KMS keys also need to be copied over separately, for example with a database dump. Any pointers on the best strategy in this case?
Created 10-25-2016 05:18 PM
Take a look at the link below for a detailed explanation. In short though, yes, a dump and load of the keys is necessary, using the "exportKeysToJCEKS.sh" and "importJCEKSKeys.sh" scripts provided with Ranger KMS.
Created 10-25-2016 02:06 PM
The Ranger KMS has import/export scripts that you can use on both the source and target clusters.
So you can export the keys from the source cluster, copy them over to the target cluster, import them into the target KMS, create your encryption zones on the target using the imported keys, and use distcp as described in the guide.
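As a rough sketch of that sequence, the script below only prints the commands in order (a dry run) so you can review them before running anything. The host names, zone path, key name, and keystore file are made-up placeholders, and the arguments passed to the Ranger KMS scripts are my assumption, so check the usage text of the scripts on your version first.

```shell
#!/bin/sh
# Dry-run sketch: prints the migration commands instead of executing them.
# All values below are hypothetical placeholders, not taken from a real cluster.
SRC_NN="hdfs://source-nn:8020"      # assumed source NameNode URI
DST_NN="hdfs://target-nn:8020"      # assumed target NameNode URI
DST_KMS_HOST="target-kms-host"      # assumed target Ranger KMS host
ZONE_PATH="/data/secure"            # assumed encryption zone path
KEY_NAME="zone_key"                 # assumed encryption zone key name
KEYSTORE="/tmp/kms_keys.jceks"      # assumed keystore file for the export

plan() {
  # 1. On the source Ranger KMS host: export the keys to a JCEKS file
  #    (argument form is an assumption).
  echo "./exportKeysToJCEKS.sh ${KEYSTORE}"
  # 2. Copy the keystore file to the target KMS host.
  echo "scp ${KEYSTORE} ${DST_KMS_HOST}:${KEYSTORE}"
  # 3. On the target Ranger KMS host: import the keys
  #    (argument form is an assumption).
  echo "./importJCEKSKeys.sh ${KEYSTORE} jceks"
  # 4. On the target cluster: create the zone on an empty directory
  #    using the imported key.
  echo "hdfs crypto -createZone -keyName ${KEY_NAME} -path ${ZONE_PATH}"
  # 5. Copy the raw (still encrypted) bytes, preserving extended attributes.
  echo "hadoop distcp -px ${SRC_NN}/.reserved/raw${ZONE_PATH} ${DST_NN}/.reserved/raw${ZONE_PATH}"
}

plan
```

The order matters: the key has to exist in the target KMS before the zone is created, and the zone has to exist before the raw copy, otherwise the copied files cannot be decrypted on the target.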