
DistCp of compacted files takes a lot of time




I'm using a MapReduce compaction job that merges the small files my other MR jobs may create.


When I compact the files, I end up with files larger than 100 GB. Copying one such file with DistCp between the two farms uses a single mapper and may take more than 5 hours, and I have a lot of these compacted files.

That means copying one day's worth of files with DistCp may take days, so I always have a huge backlog between the active farm and the backup DR farm.
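To put a number on the figures above, here is a rough back-of-the-envelope sketch in Python. It assumes a file of exactly 100 GB copied in exactly 5 hours by one mapper (the post only gives these as lower bounds), and the function names are made up for illustration.

```python
def effective_throughput_mb_s(size_gb: float, hours: float) -> float:
    """Average copy rate in MB/s for a file of size_gb copied in `hours`."""
    size_mb = size_gb * 1024      # GB -> MB
    seconds = hours * 3600        # hours -> seconds
    return size_mb / seconds


def copy_hours(size_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to copy size_gb at a sustained rate of throughput_mb_s."""
    return size_gb * 1024 / throughput_mb_s / 3600


# Assumed figures taken from the post: 100 GB, 5 hours, one mapper.
rate = effective_throughput_mb_s(100, 5)
print(f"{rate:.1f} MB/s")  # prints "5.7 MB/s"
```

A sustained rate of only a few MB/s for a single copy stream suggests that the limit is either a throttle or the inter-farm link, which is worth measuring before changing anything else.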


Any suggested solution?


Thanks in advance.




Re: DistCp of compacted files takes a lot of time


Did you check the network bandwidth between your clusters?

Did you check the bandwidth allocated to DistCp? I think the allocated bandwidth is very low by default. Try increasing that parameter first.


But definitely check that there is enough network bandwidth.
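As a minimal sketch of what raising that parameter could look like, the Python helper below assembles a DistCp command line using the `-bandwidth` option (bandwidth per map, in MB/second) and the `-m` option (maximum number of simultaneous copies). The helper name, the NameNode hostnames, the paths, and the values 200 and 20 are all hypothetical placeholders, not recommendations; pick values that fit the measured link capacity.

```python
import shlex  # shlex.join requires Python 3.8+


def build_distcp_command(src, dst, bandwidth_mb=None, max_maps=None):
    """Return the argument list for a `hadoop distcp` invocation."""
    cmd = ["hadoop", "distcp"]
    if bandwidth_mb is not None:
        # -bandwidth: per-map bandwidth limit, in MB/second
        cmd += ["-bandwidth", str(bandwidth_mb)]
    if max_maps is not None:
        # -m: maximum number of simultaneous copies (map tasks)
        cmd += ["-m", str(max_maps)]
    cmd += [src, dst]
    return cmd


# Hypothetical cluster addresses and paths, for illustration only.
cmd = build_distcp_command(
    "hdfs://active-nn:8020/data/compacted",
    "hdfs://dr-nn:8020/data/compacted",
    bandwidth_mb=200,
    max_maps=20,
)
print(shlex.join(cmd))
```

Because the limit applies per map, raising `-bandwidth` is what affects a single large file, while `-m` only helps when many files are copied in parallel. The list can be passed to `subprocess.run(cmd, check=True)` on a node with a Hadoop client.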