Distcp data backup to AWS S3 with AWS KMS encryption

New Contributor

We are currently backing up data from a CDH cluster to S3, and that works fine.
However, we now want to use AWS KMS encryption to encrypt the data on the AWS side.

Typically this should just be a matter of switching encryption on with a command like the one below:


hadoop distcp \
-Dfs.s3a.access.key=<Access Key> \
-Dfs.s3a.secret.key=<Secret Key> \
-Dfs.s3a.server-side-encryption-algorithm=aws:kms \
-Dfs.s3a.server-side-encryption-key=<encryption-key> \
-Dcom.amazonaws.services.s3.disablePutObjectMD5Validation=true \
hdfs://<name-node>:8020/tmp/ \
s3a://<bucket-name>/temp1/

 

However, I keep getting an error related to a hash code mismatch.

Has anybody had any luck with this?

2 REPLIES

New Contributor

Hi Ankush,

 

Have you found a solution for this? I am looking at a similar case where I want to migrate data from Hadoop to AWS S3 using s3-dist-cp with AWS KMS keys.

 

Please let me know if you have any solution for this.

 

Thanks in advance.

Krishna

Expert Contributor

It looks like this is resolved in Hadoop 2.8.0 (not sure, though).

Check this: https://github.com/minio/minio/issues/2965

 

The only workaround I found is to first load the data without encryption and then (manually) enable encryption on the files copied to S3.
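
A rough sketch of that two-step workaround, in case it helps. The host name, bucket, prefix, and KMS key ID below are all made-up placeholders, and it assumes the S3 credentials are already configured for both Hadoop and the AWS CLI. The second step uses the AWS CLI to copy each object onto itself with SSE-KMS requested, which is one possible way to do the "manual" part, not necessarily the only one. By default the script only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Two-step workaround sketch: plain distcp first, then re-encrypt in place.
# Every value here is a hypothetical placeholder, not taken from the thread.
set -eu

SRC="${SRC:-hdfs://namenode.example.com:8020/tmp/}"
BUCKET="${BUCKET:-example-bucket}"
PREFIX="${PREFIX:-temp1/}"
KMS_KEY_ID="${KMS_KEY_ID:-1234abcd-12ab-34cd-56ef-1234567890ab}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: copy to S3 without any server-side-encryption options.
copy_plain() {
  run hadoop distcp "$SRC" "s3a://${BUCKET}/${PREFIX}"
}

# Step 2: copy every object under the prefix onto itself, asking S3 to
# store the new copy with SSE-KMS under the given key.
reencrypt_in_place() {
  run aws s3 cp "s3://${BUCKET}/${PREFIX}" "s3://${BUCKET}/${PREFIX}" \
    --recursive --sse aws:kms --sse-kms-key-id "$KMS_KEY_ID"
}

copy_plain
reencrypt_in_place
```

Run it once as is to review the printed commands, then set DRY_RUN=0 together with your own SRC, BUCKET, PREFIX, and KMS_KEY_ID to execute them.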

 

By the way, I have a general question. This SSE only protects the data while it is stored in S3. What if someone with an AWS admin role downloads the data to a local disk? It is no longer encrypted at that point.
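
To see the difference for a single object, something like the sketch below can be used. The bucket and key names are hypothetical placeholders. head-object reports how S3 stores the object (for a KMS-encrypted object the ServerSideEncryption field should read aws:kms), while a plain download returns the decrypted content to any caller who is allowed to read the object and use the key. By default it only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Inspect the at-rest encryption of one object, then download it.
# BUCKET and KEY are hypothetical placeholders.
set -eu

BUCKET="${BUCKET:-example-bucket}"
KEY="${KEY:-temp1/part-00000}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Ask S3 which server-side encryption the stored object uses.
show_encryption() {
  run aws s3api head-object --bucket "$BUCKET" --key "$KEY" \
    --query ServerSideEncryption --output text
}

# Download the object; S3 decrypts it on the way out for authorized callers,
# so the local copy is plain data.
download_object() {
  run aws s3 cp "s3://${BUCKET}/${KEY}" "./$(basename "$KEY")"
}

show_encryption
download_object
```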