DistCp of compacted files takes a long time


I run a compaction MR job that merges the small files my other MR jobs may create.


After compaction I end up with files larger than 100 GB. Copying one such file between the two farms with DistCp, using a single mapper, can take more than 5 hours, and I have a lot of these compacted files.
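For scale, those numbers imply a low per-mapper rate. A quick back-of-the-envelope sketch, assuming 100 GB means 102,400 MB and the copy took exactly 5 hours (it actually took longer, so the real rate is at most this). The function name is purely illustrative:

```python
# Average throughput a single DistCp mapper achieved on one file.
# Assumptions: binary units (1 GB = 1024 MB) and a duration of exactly 5 h.

def effective_throughput_mb_s(size_gb: float, hours: float) -> float:
    """Return the average MB/s needed to move size_gb in the given hours."""
    size_mb = size_gb * 1024
    seconds = hours * 3600
    return size_mb / seconds


if __name__ == "__main__":
    rate = effective_throughput_mb_s(100, 5)
    print(f"{rate:.1f} MB/s")  # 102400 MB / 18000 s -> prints "5.7 MB/s"
```

That is roughly 5.7 MB/s (about 45 Mbit/s), a figure worth comparing against the link between the farms and any bandwidth limit configured for DistCp.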

As a result, copying one day's worth of new files with DistCp can take several days, so there is always a huge backlog between the active farm and the backup DR farm.
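The backlog follows from a simple rate comparison: if one day's output takes longer than a day to copy, the difference piles up every day. A minimal sketch of that arithmetic; the inputs (daily volume, aggregate copy rate across all concurrent mappers) are hypothetical, not measurements from this cluster:

```python
# Hypothetical backlog model: compares data produced per day with the data
# the replication job can move per day at a given aggregate rate.

SECONDS_PER_DAY = 86400


def backlog_growth_per_day_gb(daily_new_gb: float, copy_rate_mb_s: float) -> float:
    """GB added to the un-replicated backlog each day (0.0 if keeping up)."""
    copied_per_day_gb = copy_rate_mb_s * SECONDS_PER_DAY / 1024
    return max(0.0, daily_new_gb - copied_per_day_gb)


def required_rate_mb_s(daily_new_gb: float) -> float:
    """Sustained aggregate MB/s needed just to keep pace with new data."""
    return daily_new_gb * 1024 / SECONDS_PER_DAY


if __name__ == "__main__":
    # Illustrative only: 2,000 GB/day of new data, copied at 5.7 MB/s overall.
    print(f"{required_rate_mb_s(2000):.1f} MB/s needed")      # "23.7 MB/s needed"
    print(f"{backlog_growth_per_day_gb(2000, 5.7):.0f} GB/day")  # "1519 GB/day"
```

If the backlog grows every day, no amount of waiting clears it; either the aggregate copy rate has to rise or the daily volume has to fall.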


Any suggestions?


Thanks in advance.



Re: DistCp of compacted files takes a long time

Did you check the network bandwidth between your clusters?

Did you check the bandwidth allocated to DistCp? DistCp throttles each map task to the value of its -bandwidth option (in MB per second), and I think the default is quite low. Try increasing that parameter first.
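Because -bandwidth caps each map task, and by default each file is copied in full by one map task, a single 100 GB file has a floor on its copy time that extra mappers cannot lower. A hedged sketch of that floor; min_copy_hours is a hypothetical helper that assumes every mapper runs at exactly the cap and ignores network saturation, the remote cluster's write speed, and task overhead:

```python
# Lower bound on wall-clock time for copying whole files with DistCp-style
# map tasks, each throttled to bandwidth_mb_s (simplified model, see above).

def min_copy_hours(file_sizes_gb, mappers, bandwidth_mb_s):
    """Return the minimum hours needed under the per-mapper cap."""
    total_mb = sum(file_sizes_gb) * 1024
    largest_mb = max(file_sizes_gb) * 1024
    # One mapper cannot finish the biggest file faster than the cap allows...
    single_file_s = largest_mb / bandwidth_mb_s
    # ...and all mappers together cannot exceed mappers * cap in aggregate.
    aggregate_s = total_mb / (mappers * bandwidth_mb_s)
    return max(single_file_s, aggregate_s) / 3600


if __name__ == "__main__":
    # One 100 GB file: more mappers do not help, a higher cap does.
    print(min_copy_hours([100], mappers=1, bandwidth_mb_s=10))
    print(min_copy_hours([100], mappers=50, bandwidth_mb_s=10))   # same as above
    print(min_copy_hours([100], mappers=1, bandwidth_mb_s=100))   # 10x shorter
```

If the observed time is close to this floor, raising -bandwidth should help; if it is far above it, the network or the target cluster is the more likely limit. Newer Hadoop releases (2.9 and 3.x) also add a -blocksperchunk option that lets DistCp split a large file into chunks copied by separate mappers; check whether your version supports it.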


But definitely check that there is enough network bandwidth.