
DistCp for classified/health-care data

Expert Contributor

Hi Folks,

I have a nightly job that copies data from Cluster-1 to Cluster-2 using DistCp. My concern is the secured, classified data, which is stored on the source cluster (Cluster-1) using TDE and various other techniques. From the DistCp documentation, it looks like the data is first written to /tmp. Where does DistCp create this /tmp directory?

On the source cluster's HDFS, at <root>/tmp, or

at <HDFS_ROOT>/<Very_secured_Data_Dir>/tmp?

Thanks,

SS

1 ACCEPTED SOLUTION


@Smart Solutions

I'm not sure of the answer to that, but if you're concerned about temporary data being left unencrypted or intercepted, you may want to consider copying it over in its raw, still-encrypted form, so that it is never decrypted during the copy. This also avoids the decryption/re-encryption overhead. The link below describes the different options for doing this.

https://community.hortonworks.com/articles/51909/how-to-copy-encrypted-data-between-two-hdp-cluster....
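
For illustration only, here is a minimal sketch of one such option: pointing DistCp at the /.reserved/raw path on both clusters so the bytes are copied exactly as stored, together with -px to preserve extended attributes. The helper name, NameNode addresses, and directory below are hypothetical placeholders, not values from this thread. The sketch only builds the command line and does not run it. It also assumes that both clusters can reach the same encryption keys for the zone and that the job runs as an HDFS superuser, so please verify it against your own setup.

```python
import shlex

# Virtual HDFS path prefix that exposes file contents as stored on disk.
RAW_PREFIX = "/.reserved/raw"


def build_raw_distcp_command(src_namenode, dst_namenode, zone_path):
    """Build (but do not run) a DistCp command that copies a directory raw.

    All three arguments are hypothetical placeholders chosen by the caller.
    """
    if not zone_path.startswith("/"):
        raise ValueError("zone_path must be an absolute HDFS path")
    src = f"hdfs://{src_namenode}{RAW_PREFIX}{zone_path}"
    dst = f"hdfs://{dst_namenode}{RAW_PREFIX}{zone_path}"
    # -px asks DistCp to preserve extended attributes, which carry the
    # per-file encryption metadata needed to decrypt on the target side.
    return ["hadoop", "distcp", "-px", src, dst]


if __name__ == "__main__":
    # Example placeholder values; replace with your own cluster details.
    cmd = build_raw_distcp_command("cluster1-nn:8020", "cluster2-nn:8020", "/secure/data")
    print(shlex.join(cmd))
```

Building the argument list separately from running it makes it easy to review the exact command in a nightly job's logs before it touches sensitive data.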

