DistCp of data from a secure zone in one cluster to a secure zone in a different cluster is failing with the following error:
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://nn1//data/raw//cust_sec_vm/000017_0 to hdfs://nn2:8020/user/000017_0
... 10 more
Caused by: java.io.IOException: Check-sum mismatch between hdfs://nn1//data/raw//cust_sec_vm/000017_0 to hdfs://nn2:8020/user/000017_0
You can preserve the block size and checksum type during the DistCp copy by using -pbc:
hadoop distcp -pbc <SRC> <DEST>
Is there a version mismatch between the SRC and DEST?
Currently, DistCp from a secure zone in one cluster to a secure zone in another cluster is not possible unless you copy the encryption keys from the source cluster to the target cluster. If both clusters have the same keys, specify the -skipcrccheck and -update flags to avoid verifying checksums.
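As a rough sketch of the suggestion above, the invocation could look like the following. The source and target paths are hypothetical placeholders, not the paths from the original question, and the script only prints the command so it can be reviewed before it is run on a real cluster.

```shell
#!/bin/sh
# Hypothetical placeholder paths; replace with your own secure-zone directories.
SRC="hdfs://nn1/path/to/source"
DEST="hdfs://nn2:8020/path/to/target"

# -update copies only files that are missing or differ at the target;
# -skipcrccheck skips the checksum comparison between source and target.
DISTCP_CMD="hadoop distcp -update -skipcrccheck $SRC $DEST"

# Dry run: print the command for review rather than executing it.
echo "$DISTCP_CMD"
```

Once the printed command looks right, it can be run directly on a node that has the Hadoop client configured for both clusters.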
AFAIK, a patch to the HDFS client (DistCp) will be required, with which the client decrypts the source data in folder-1 using key-1, passes it over the wire, and encrypts the target data in folder-1 using key-2. It is currently a work in progress.
Can you help me understand why -skipcrccheck with -update is required when both clusters have the same keys for the secure zone?
Is it because, while DistCp copies the data from one secure zone to another, the files are first decrypted on the source and then encrypted again on the target with EDEKs different from those of the source files?