Expert Contributor

I tried copying data from a non-Kerberized cluster to a Kerberized cluster using snapshot-diff-based distcp, but on HDP 2.5.3 it failed with the error shown below.

Currently, distcp supports only the hdfs:// RPC protocol for snapshot-diff-based copy. If you use webhdfs:// for either the source or the target, you will encounter this error.

Command:

$ hadoop distcp -diff s1 s2 -update webhdfs://nonsecure_cluster:50070/source hdfs://secure_cluster:8020/target

Error:

    java.lang.IllegalArgumentException: The FileSystems needs to be DistributedFileSystem for using snapshot-diff-based distcp
        at org.apache.hadoop.tools.DistCpSync.preSyncCheck(DistCpSync.java:86)
        at org.apache.hadoop.tools.DistCpSync.sync(DistCpSync.java:124)
        at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:180)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:155)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:128)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:462)
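Since the check fails only after distcp has started, it can help to validate the URI schemes before launching the job. The following is a minimal sketch, not part of distcp itself: `uri_scheme` and `check_diff_uris` are hypothetical helper names, and the sketch only tests that both URIs start with hdfs://. It does not address any authentication differences between the two clusters.

```shell
# Sketch: refuse to start a snapshot-diff distcp unless both URIs use hdfs://.
# uri_scheme and check_diff_uris are hypothetical helpers, not Hadoop commands.

# Print the scheme of a URI (the text before "://"), or an empty line if none.
uri_scheme() {
  case "$1" in
    *://*) printf '%s\n' "${1%%://*}" ;;
    *)     printf '\n' ;;
  esac
}

# Return 0 only if both the source and the target use the hdfs scheme.
# Scheme-less paths are rejected here for simplicity, although Hadoop would
# resolve them against fs.defaultFS.
check_diff_uris() {
  for uri in "$1" "$2"; do
    if [ "$(uri_scheme "$uri")" != "hdfs" ]; then
      echo "snapshot-diff distcp needs hdfs://, got: $uri" >&2
      return 1
    fi
  done
  return 0
}

# Possible usage (SRC and DST are placeholders for your own URIs):
#   check_diff_uris "$SRC" "$DST" && hadoop distcp -diff s1 s2 -update "$SRC" "$DST"
```

With the command from this article, the check would stop at the webhdfs:// source before any job is submitted.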
    Comments
    Contributor

    @Rajesh

    Are you able to update the target using the snapshot difference?

    hadoop distcp -diff s1 s2 -update hdfs://secure_cluster:8020/source hdfs://secure_cluster:8020/target

    Although the command is syntactically correct, it does not work for me. I get the error below.

    17/04/18 14:29:53 ERROR tools.DistCp: Invalid arguments:
    java.lang.IllegalArgumentException: Diff is valid only with update and delete options
        at org.apache.hadoop.tools.DistCpOptions.validate(DistCpOptions.java:568)
        at org.apache.hadoop.tools.DistCpOptions.setUseDiff(DistCpOptions.java:284)
        at org.apache.hadoop.tools.OptionsParser.parse(OptionsParser.java:223)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:115)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)

    Invalid arguments: Diff is valid only with update and delete options
    usage: distcp OPTIONS [source_path...] <target_path>
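The message itself says that this release accepts -diff only together with both the update and delete options, while the command above passes only -update. A minimal sketch of a pre-flight check follows, assuming the rule is exactly what the message states (other Hadoop releases may validate differently). `has_arg` and `check_diff_options` are hypothetical helper names.

```shell
# Sketch: check a distcp argument list against the rule quoted in the error
# message ("Diff is valid only with update and delete options").
# has_arg and check_diff_options are hypothetical helpers, not Hadoop commands.

# Return 0 if the first argument appears among the remaining arguments.
has_arg() {
  needle="$1"
  shift
  for arg in "$@"; do
    if [ "$arg" = "$needle" ]; then
      return 0
    fi
  done
  return 1
}

# Return 0 if -diff is absent, or if it is accompanied by -update and -delete.
check_diff_options() {
  if has_arg -diff "$@"; then
    if has_arg -update "$@" && has_arg -delete "$@"; then
      return 0
    fi
    echo "-diff was given without both -update and -delete" >&2
    return 1
  fi
  return 0
}

# Possible usage:
#   check_diff_options -diff s1 s2 -update -delete "$SRC" "$DST"
```

Note that -delete removes files on the target that are missing from the source, so confirm that this is the intended behavior before adding it.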

    Rising Star

    Hello @Rajesh,

    Following a kind request from a customer here, there is a Hortonworks task to support this in HDP 3.0.

    If there is some way to backport it, I will try to update this thread.

    With kind regards

    Contributor

    Thanks for the update! I ran into this myself when designing a Python HDFS snapshot manager for two of our clusters.

    New Contributor

    Has anyone found a solution to this issue?

    Thanks