Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Distcp for classified/health-care data

Hi Folks,

I have a nightly job that copies data from Cluster-1 to Cluster-2 using DistCp. The issue is with secured, classified data, which is stored on the source Cluster-1 using TDE and various other techniques. I was reading the DistCp documentation, and it looks like DistCp first puts the data in a /tmp directory. I wanted to know where it creates this /tmp directory:

On the source cluster, under HDFS <root>/tmp, OR

under <HDFS_ROOT>/<Very_secured_Data_Dir>/tmp?

Thanks,

SS

1 ACCEPTED SOLUTION

@Smart Solutions

Not sure of the answer to that, but if you're concerned about tmp data being left unencrypted or intercepted, you may consider copying it over in its encrypted (raw) form, without decrypting it first. This also avoids the decryption/re-encryption overhead. The link below talks about the different options for doing this.

https://community.hortonworks.com/articles/51909/how-to-copy-encrypted-data-between-two-hdp-cluster....
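
For illustration only, here is a rough sketch of what a raw copy between two encryption zones could look like. It assumes the HDFS `/.reserved/raw` virtual path prefix and DistCp's `-px` option (preserve extended attributes), that the job runs as an HDFS superuser, and that Cluster-2 can reach the same encryption zone keys. The NameNode addresses and the zone path are hypothetical placeholders, and the helper function is not part of any Hadoop API; it only assembles the command line.

```python
import shlex


def build_raw_distcp_command(src_nn, dst_nn, zone_path):
    """Assemble a DistCp argv that copies an encryption zone via /.reserved/raw.

    src_nn and dst_nn are NameNode host:port strings (hypothetical values below).
    zone_path is the absolute HDFS path of the encryption zone on both clusters.
    """
    if not zone_path.startswith("/"):
        raise ValueError("zone_path must be an absolute HDFS path")
    # Reading and writing through /.reserved/raw moves the encrypted bytes as-is,
    # so the data is not decrypted on the source or re-encrypted on the target.
    raw_path = "/.reserved/raw" + zone_path
    return [
        "hadoop", "distcp",
        "-px",  # preserve extended attributes, which carry the encryption metadata
        f"hdfs://{src_nn}{raw_path}",
        f"hdfs://{dst_nn}{raw_path}",
    ]


# Hypothetical cluster addresses and path, for illustration only.
cmd = build_raw_distcp_command("cluster1-nn:8020", "cluster2-nn:8020", "/secure/health_data")
print(shlex.join(cmd))
```

The printed command can be reviewed and then run from the nightly job. Both the source and the target need to use the `/.reserved/raw` prefix, otherwise the copy falls back to the normal decrypt-and-re-encrypt path.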
