Hi,
I am using distcp to copy data from Hadoop HDFS to S3. Below is a shortened version of the command I use:
hadoop distcp -pu -update -delete hdfs_path s3a://bucket
I recently ran into an issue with the following case.
I have a file in HDFS, temp_file, with the data 1234567890 and a size of 27 KB.
The first time I run distcp, it pushes the file to the S3 bucket without any issue.
Then I update the same file, temp_file, with different content, abcdefghij, but with the same size of 27 KB.
When I run distcp again, instead of comparing the checksums of the source and target, it skips the file and does not copy the updated file from HDFS to S3.
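To illustrate what I mean, here is a rough local sketch with no Hadoop or S3 involved. It assumes, as the behaviour I see suggests, that the skip decision is made on file size alone; that is my guess, not something I have confirmed. The file names v1 and v2 are hypothetical stand-ins for the two versions of temp_file, and the contents are shortened to the sample strings above.

```shell
#!/bin/sh
# Local illustration only: this does not call distcp, HDFS, or S3.
# v1 and v2 are hypothetical stand-ins for the two versions of temp_file.
workdir=$(mktemp -d)
printf '1234567890' > "$workdir/v1"   # first version of the file
printf 'abcdefghij' > "$workdir/v2"   # updated version, same length

# Byte sizes of both versions (tr strips padding some wc builds add).
size1=$(wc -c < "$workdir/v1" | tr -d ' ')
size2=$(wc -c < "$workdir/v2" | tr -d ' ')

# POSIX cksum CRC of both versions (first field of the output).
sum1=$(cksum < "$workdir/v1" | cut -d ' ' -f 1)
sum2=$(cksum < "$workdir/v2" | cut -d ' ' -f 1)

# A size-only comparison sees no change between the versions.
if [ "$size1" -eq "$size2" ]; then
  size_verdict=same
else
  size_verdict=different
fi

# A checksum comparison does see the change.
if [ "$sum1" = "$sum2" ]; then
  sum_verdict=same
else
  sum_verdict=different
fi

echo "size comparison:     $size_verdict"
echo "checksum comparison: $sum_verdict"

rm -rf "$workdir"
```

A check based only on size treats the two versions as identical, while a checksum check tells them apart, which is the comparison I expected distcp to make.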
Am I missing an option in the distcp command to make this scenario work?