distcp data from one secure zone to another in two diff cluster

Guru

A distcp copy from a secure zone in one cluster to a secure zone in a different cluster is failing with the following error.

at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)

Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://nn1//data/raw//cust_sec_vm/000017_0 to hdfs://nn2:8020/user/000017_0

at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)

at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:283)

... 10 more

Caused by: java.io.IOException: Check-sum mismatch between hdfs://nn1//data/raw//cust_sec_vm/000017_0 to hdfs://nn2:8020/user/000017_0

Re: distcp data from one secure zone to another in two diff cluster

Cloudera Employee

You can preserve the block size and checksum type during the distcp copy with -pbc.

hadoop distcp -pbc <SRC> <DEST>

Is there a version mismatch between the SRC and DEST?

Re: distcp data from one secure zone to another in two diff cluster

New Contributor

hadoop distcp -pbc <SRC> <DEST> does not work in the given case.

Re: distcp data from one secure zone to another in two diff cluster

Currently, distcp from a secure zone in one cluster to a secure zone in another cluster is not possible unless you copy the encryption keys from the source cluster to the target cluster. If both clusters have the same keys, specify the -skipcrccheck and -update flags to avoid verifying checksums.
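As a rough sketch of that invocation, assuming the keys are already present on both clusters: the source and target paths below are taken from the error message in the question and are only placeholders, and the RUN_DISTCP variable is a hypothetical guard added here so the command can be reviewed before it is run.

```shell
# Placeholder paths taken from the error message in the question;
# substitute your own NameNode addresses and directories.
SRC="hdfs://nn1/data/raw/cust_sec_vm"
DEST="hdfs://nn2:8020/user"

# -update copies only files that are missing or differ at the target.
# -skipcrccheck skips the checksum comparison between source and target
# and is intended to be used together with -update.
DISTCP_CMD="hadoop distcp -update -skipcrccheck $SRC $DEST"

# Print the command for review. RUN_DISTCP is a hypothetical guard
# variable for this sketch; set RUN_DISTCP=1 to actually launch the job.
echo "$DISTCP_CMD"
if [ "${RUN_DISTCP:-0}" = "1" ]; then
  $DISTCP_CMD
fi
```

With checksum verification skipped, DistCp no longer confirms that the copied bytes match, so it may be worth spot-checking a few files on the target afterwards.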

AFAIK, an HDFS client (DistCp) patch will be required, with which the HDFS client decrypts the source data in folder-1 using key-1, passes it over the wire, and encrypts the target data in folder-1 using key-2. It's currently a work in progress.

Re: distcp data from one secure zone to another in two diff cluster

New Contributor

@Pardeep

Can you help me understand why -skipcrccheck with -update is required when both clusters have the same keys for the secure zone?

Is it because, while distcp-ing the data from one secure zone to another, the files are decrypted on the source first and encrypted again on the target with different EDEKs than those of the source files?
