
DistCp

Explorer

If you use the distcp command to transfer data from one cluster to another on a regular basis, and only new data should be copied each day, how does distcp keep track of what it has already copied?

1 ACCEPTED SOLUTION

Master Mentor

@kiranpune 

DistCp (distributed copy) is a tool for large inter- and intra-cluster copies. It uses MapReduce for distribution, error handling, recovery, and reporting. It expands a list of files and directories into the input for map tasks, each of which copies a partition of the files in the source list. That is the basic description.

 

You can change this behavior with command-line options when running DistCp; see the official DistCp documentation for the full list. Below are a few options relevant to your use case.

OPTIONS

-append: Incremental copy of a file with the same name but a different length (used together with -update)
-update: Overwrite if source and destination differ in size, block size, or checksum
-overwrite: Overwrite the destination
-delete: Delete files that exist in the destination but not in the source (used together with -update or -overwrite)

 

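To answer the question directly: DistCp does not keep a record of previous runs. With -update, each run compares the source and destination and copies only files that are missing or that differ, so I think you can schedule or script a daily copy.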
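A daily job along those lines might look like the sketch below. The NameNode hosts, port, and paths are hypothetical placeholders, and `build_distcp_cmd` and `DRY_RUN` are names made up for this illustration, not part of DistCp. By default the script only prints the command so you can check it before letting it run.

```shell
#!/bin/sh
# Minimal sketch of a daily incremental copy with DistCp.
# ASSUMPTION: the URIs below are hypothetical; replace them with your clusters.
SRC="${SRC:-hdfs://nn1.example.com:8020/data/events}"
DST="${DST:-hdfs://nn2.example.com:8020/data/events}"

# Hypothetical helper: prints the DistCp command for a source and a destination.
# -update makes DistCp skip files that already match on the destination.
build_distcp_cmd() {
  printf 'hadoop distcp -update %s %s\n' "$1" "$2"
}

# DRY_RUN=1 (the default in this sketch) only prints the command.
# Set DRY_RUN=0 to actually launch the copy.
if [ "${DRY_RUN:-1}" = "1" ]; then
  build_distcp_cmd "$SRC" "$DST"
else
  hadoop distcp -update "$SRC" "$DST"
fi
```

You could then call the script from cron or an Oozie workflow once a day. Add -delete after -update only if files removed from the source should also be removed from the destination.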

 

