DistCp

Explorer

If you use the distcp command to transfer data from one cluster to another on a regular basis, only the new data should be copied each day. How does distcp keep track of what has already been copied?

1 ACCEPTED SOLUTION

Re: DistCp

Mentor

@kiranpune 

DistCp (distributed copy) is a tool for large inter- and intra-cluster copying. It uses MapReduce for its distribution, error handling, recovery, and reporting. It expands a list of files and directories into the input to map tasks, each of which copies a partition of the files specified in the source list. That is the basic description.

 

You can, however, pass different command-line options when running DistCp; see the official DistCp documentation. Below are a few options for your different use cases.

OPTIONS

-append: Incremental copy of a file with the same name but a different length (valid only together with -update)
-update: Overwrite if source and destination differ in size, block size, or checksum
-overwrite: Overwrite destination
-delete: Delete files that exist in the destination but not in the source (requires -update or -overwrite)

 

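I think you can schedule or script a daily copy with -update. DistCp does not keep a record of earlier runs; on each run, -update compares source and destination and copies only the files that are missing from the destination or differ from it.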
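A scripted daily copy might look roughly like the sketch below. It only assembles and (optionally) launches a `hadoop distcp -update` command; the NameNode addresses `nn1`/`nn2`, the paths, and the helper names `build_distcp_command` and `run_daily_copy` are hypothetical placeholders, not anything from your clusters. It assumes the `hadoop` client is on the PATH of the machine that runs it.

```python
import shlex
import subprocess


def build_distcp_command(source, target, delete_extraneous=False):
    """Return the argument list for an incremental DistCp run.

    -update makes DistCp skip files that already match at the target.
    -delete (optional) removes target files that no longer exist at the source.
    """
    cmd = ["hadoop", "distcp", "-update"]
    if delete_extraneous:
        cmd.append("-delete")
    cmd += [source, target]
    return cmd


def run_daily_copy(source, target, delete_extraneous=False, dry_run=True):
    """Print the command (dry run) or execute it and return its exit code."""
    cmd = build_distcp_command(source, target, delete_extraneous)
    if dry_run:
        print(shlex.join(cmd))
        return 0
    # Hands the copy over to the hadoop client; 0 means the job succeeded.
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Hypothetical source and target URIs; replace with your own clusters.
    run_daily_copy("hdfs://nn1:8020/data/events",
                   "hdfs://nn2:8020/data/events",
                   dry_run=True)
```

Running the script once a day from cron or a workflow scheduler such as Oozie, with `dry_run=False`, would give the daily incremental copy; keeping the dry run as the default makes it easy to check the command before it touches either cluster.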

 


