I'm using a compaction MR job that compacts the small files my MR jobs may create.
When I compact the files I end up with files larger than 100 GB. Copying one such file with DistCp between 2 farms, using 1 mapper, may take more than 5 hours, and I have a lot of such compacted files.
That means that running DistCp on the files created in 1 day may take days, so I always have a huge backlog between the active farm and the backup DR farm.
Any suggested solution?
Thanks in advance.
Did you check the network bandwidth between your clusters?
Did you check the "allocated" bandwidth for DistCp? I think the allocated bandwidth is very low by default. Try increasing that parameter first.
But definitely check that there is enough network bandwidth.
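
To put a number on it, here is a rough sketch in Python that turns the figures from the question (a 100 GB file copied in 5 hours by one mapper) into an average copy rate. The function name is only illustrative, and the calculation assumes 1 GB = 1024 MB and a steady transfer rate.

```python
def effective_throughput_mb_per_s(size_gb: float, hours: float) -> float:
    """Average copy rate in MB/s for a file of size_gb copied in the given hours."""
    size_mb = size_gb * 1024   # assumption: 1 GB = 1024 MB
    seconds = hours * 3600
    return size_mb / seconds


# Figures taken from the question: one 100 GB file, about 5 hours, 1 mapper.
rate = effective_throughput_mb_per_s(100, 5)
print(f"{rate:.2f} MB/s")  # prints "5.69 MB/s"
```

If that rate is far below what the link between the two farms can carry, the per-map limit (the `-bandwidth` option of DistCp, given in MB/s per map) is a likely suspect. If it is close to the link capacity, the network itself is probably the bottleneck.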