I tried copying data from a non-kerberized cluster to a kerberized cluster using snapshot diff, but it failed with the following error on HDP 2.5.3.
Currently, distcp supports only the hdfs:// RPC protocol for snapshot-based diff copies. If you use webhdfs for either the source or the target, you will encounter this error.
$ hadoop distcp -diff s1 s2 -update webhdfs://nonsecure_cluster:50070/source hdfs://secure_cluster:8020/target
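A quick pre-check can catch this before the job is submitted. The following is only a minimal sketch: the helper name `unsupported_diff_uris` is hypothetical, and the rule that only hdfs:// works with `-diff` is taken from this thread rather than verified against every Hadoop version.

```python
from urllib.parse import urlparse

# Assumption taken from this thread: snapshot-diff copies only work over hdfs://.
SUPPORTED_SCHEMES = {"hdfs"}


def unsupported_diff_uris(*uris):
    """Return the URIs whose scheme is not expected to work with `distcp -diff`."""
    return [u for u in uris if urlparse(u).scheme.lower() not in SUPPORTED_SCHEMES]


if __name__ == "__main__":
    # The source/target pair from the failing command above.
    bad = unsupported_diff_uris(
        "webhdfs://nonsecure_cluster:50070/source",
        "hdfs://secure_cluster:8020/target",
    )
    for uri in bad:
        print(f"not usable with -diff: {uri}")
```

With the command above, this would flag the webhdfs:// source and leave the hdfs:// target alone.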
Are you able to update the target with the snapshot difference?
hadoop distcp -diff s1 s2 -update hdfs://secure_cluster:8020/source hdfs://secure_cluster:8020/target
Although it is syntactically correct, it does not work for me; I get the error below.
17/04/18 14:29:53 ERROR tools.DistCp: Invalid arguments:
java.lang.IllegalArgumentException: Diff is valid only with update and delete options
Invalid arguments: Diff is valid only with update and delete options
usage: distcp OPTIONS [source_path...] <target_path>
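The message says that `-diff` is valid only with the update and delete options, and the command above passes `-update` but not `-delete`. Below is a minimal sketch of a helper that builds the argument list with both flags. The function name `build_snapshot_diff_command` is hypothetical, and whether `-delete` is required may depend on the Hadoop version, so treat that as an assumption based on this error text.

```python
import shlex


def build_snapshot_diff_command(from_snapshot, to_snapshot, source, target):
    """Build the argument list for a snapshot-diff distcp run.

    -delete is passed next to -update because the error in this thread says
    -diff is only valid with both options (an assumption for other versions).
    """
    return [
        "hadoop", "distcp",
        "-update", "-delete",
        "-diff", from_snapshot, to_snapshot,
        source, target,
    ]


if __name__ == "__main__":
    cmd = build_snapshot_diff_command(
        "s1", "s2",
        "hdfs://secure_cluster:8020/source",
        "hdfs://secure_cluster:8020/target",
    )
    # Print the command for review instead of running it directly.
    print(shlex.join(cmd))
```

Since `-delete` removes files from the target that are missing from the source, it is probably worth trying the printed command against a test directory first.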
Following the lovely request of a customer here, there is a Hortonworks task to support this in HDP 3.0.
If there is some way to backport it, I will try to update this thread.
With kind regards
Thanks for the update! I ran into this myself when designing a Python HDFS snapshot manager for two of our clusters.
Has anyone found a solution to this issue?