Created 01-15-2018 03:30 AM
I am trying to understand the distcp -delete option. Basically, each time I run distcp I would like to overwrite the destination directories.
The -overwrite option only works at the file level, so a file in the destination that has the same content as a file in the source but a different name would not be overwritten. I would like this to happen at the directory level as well.
From the Hadoop docs, on -delete:
"Delete the files existing in the dst but not in src"
https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html
Thank you
Created 01-16-2018 08:44 PM
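I think that's expected behaviour. For your scenario I would suggest using DistCp with a snapshot difference instead:
hadoop distcp -update -diff s0 s1 /source /destination
Here s0 and s1 are the names of two snapshots of the source directory. -diff is only valid together with -update, and -delete is not combined with it, because the snapshot diff already carries the deletions and renames.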
How to Use This Feature
To use this feature, you should first make sure all assumptions are met. Typical steps are described as follows:
Run a distcp command that copies everything from s0 to the target directory (the command line is like distcp -update <sourceDir>/.snapshot/s0 <targetDir>).
Then run a distcp command like distcp -update -diff s0 s1 <sourceDir> <targetDir> to copy all changes between s0 and s1 to the target directory.
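The steps above can be sketched as a small dry-run script. It only prints the commands for one full copy and one incremental sync, so it can be read without a cluster. The paths /source and /destination are placeholders, and the allowSnapshot and createSnapshot lines are my assumption about the surrounding setup, based on the DistCp requirement that the target holds a snapshot with the same name as the older source snapshot and has not changed since that snapshot was taken.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for a snapshot-diff based sync.
# SRC and DST are placeholder paths, not values confirmed by the thread.
SRC="${SRC:-/source}"
DST="${DST:-/destination}"

# Initial full copy: snapshot the source, copy that snapshot,
# then snapshot the target under the same name (s0).
initial_copy() {
  echo "hdfs dfsadmin -allowSnapshot $SRC"
  echo "hdfs dfs -createSnapshot $SRC s0"
  echo "hadoop distcp -update $SRC/.snapshot/s0 $DST"
  echo "hdfs dfsadmin -allowSnapshot $DST"
  echo "hdfs dfs -createSnapshot $DST s0"
}

# Incremental sync: take a newer source snapshot, apply the diff
# between the two snapshots, then give the target the same new snapshot.
incremental_sync() {
  from="$1"
  to="$2"
  echo "hdfs dfs -createSnapshot $SRC $to"
  echo "hadoop distcp -update -diff $from $to $SRC $DST"
  echo "hdfs dfs -createSnapshot $DST $to"
}

initial_copy
incremental_sync s0 s1
```

Review the printed lines and run them by hand. For the next cycle, call incremental_sync s1 s2 and so on. The target should not be modified between syncs, otherwise the -diff run is expected to fail.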