Hadoop Distcp -update skips file

New Contributor


I am using distcp to copy data from Hadoop HDFS to S3. Below is a shortened version of the command I use:


hadoop distcp -pu -update -delete hdfs_path s3a://bucket


I recently ran into an issue with the following case:


I have a file in HDFS, temp_file, containing the data 1234567890, with a size of 27 KB.

The first time I run distcp, it pushes the file to the S3 bucket without any issue.


I then update the same file, temp_file, with different content, abcdefghij, but with the same size of 27 KB.

Now when I run distcp again, instead of comparing the checksums of the source and target, it skips the file and does not copy the updated file from HDFS to S3.


Am I missing an option in the distcp command to make this scenario work?
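
For readers following along, the skip described above can be sketched as a small decision function. This is a simplified model, not DistCp's actual code: the function name would_copy and the byte count 27648 (27 KB) are made up for illustration. It rests on the assumption that, with -update, a file is skipped when the sizes match and no comparable checksum is available from both sides, which as far as I know is the default situation between HDFS and s3a.

```shell
#!/bin/sh
# Simplified model of an -update style skip decision (NOT DistCp's actual code).
# Arguments: src_size dst_size src_checksum dst_checksum
# An empty checksum means the filesystem did not provide a comparable one.
would_copy() {
    src_size=$1; dst_size=$2; src_sum=$3; dst_sum=$4

    # Different sizes: the file clearly changed, so copy it.
    if [ "$src_size" -ne "$dst_size" ]; then
        echo copy
        return
    fi

    # Same size and both checksums available: compare them.
    if [ -n "$src_sum" ] && [ -n "$dst_sum" ]; then
        if [ "$src_sum" = "$dst_sum" ]; then echo skip; else echo copy; fi
        return
    fi

    # Same size, no comparable checksum: nothing left to tell the files apart.
    echo skip
}

# The case from the question: same size, new content, no comparable checksum.
would_copy 27648 27648 "" ""
# Hypothetical case where both sides expose checksums and they differ.
would_copy 27648 27648 "aaa" "bbb"
```

Under this model the first call reports a skip and the second a copy, which matches the behavior described in the question: a same-size edit is invisible when size is the only signal left.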



Rising Star

Hi @rajilion, thanks for reaching out to the Cloudera community. Can you please test the update and overwrite options mentioned in the article below and let us know how it goes?
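
The linked article is not reproduced here. Assuming the suggestion refers to DistCp's -overwrite option, one way to try it might look like the dry run below. The script only prints the command; hdfs_path and s3a://bucket are the placeholders from the question, and the variable names SRC, DST and CMD are illustrative.

```shell
#!/bin/sh
# Dry run: print the command instead of executing it.
# SRC and DST are the placeholders from the question, not real paths.
SRC="hdfs_path"
DST="s3a://bucket"

# -overwrite replaces -update, so files are copied even when sizes match.
# -delete is kept; DistCp accepts it together with either -update or -overwrite.
CMD="hadoop distcp -pu -overwrite -delete $SRC $DST"
echo "$CMD"
```

Note the trade-off: -overwrite copies every file on each run, so it can be much slower and more expensive than -update on large datasets. It may be worth limiting it to paths where same-size edits are expected.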