07-30-2016 09:10 AM
I'm using a compaction MR job that compacts the small files my MR jobs may create.
When I compact the files I get files larger than 100 GB. Copying such a file with DistCp between 2 farms uses 1 mapper and may take more than 5 hours, and I have a lot of such compacted files.
That means that running DistCp on one day's worth of created files may take days, so I always have a huge backlog between the active farm and the backup DR farm.
Any suggested solution?
Thanks in advance.
08-08-2016 08:19 AM
Did you check the network bandwidth between your clusters?
Did you check the "allocated" bandwidth for DistCp? I think the allocated bandwidth is very low by default. Try increasing that parameter first.
But definitely check that there is enough network bandwidth.
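To make the bandwidth check concrete, here is a minimal Python sketch. It estimates the average throughput implied by the numbers in the question (about 100 GB in about 5 hours with 1 mapper) and assembles a DistCp command line using the `-bandwidth` (MB/s per map) and `-m` (maximum number of maps) options. The helper names, the cluster paths, and the chosen values are hypothetical and for illustration only; the right settings depend on your Hadoop version and on the link between the two farms.

```python
from typing import List


def effective_throughput_mb_s(size_gb: float, hours: float) -> float:
    """Average throughput in MB/s when size_gb is copied in the given hours."""
    return size_gb * 1024 / (hours * 3600)


def build_distcp_command(src: str, dst: str,
                         bandwidth_mb: int, max_maps: int) -> List[str]:
    """Build a DistCp argument list (hypothetical helper, does not run anything).

    -bandwidth is the per-map limit in MB/s, -m is the maximum number of maps.
    """
    return ["hadoop", "distcp",
            "-bandwidth", str(bandwidth_mb),
            "-m", str(max_maps),
            src, dst]


if __name__ == "__main__":
    # Figures from the question: roughly 100 GB copied in roughly 5 hours.
    print(round(effective_throughput_mb_s(100, 5), 1), "MB/s")

    # Placeholder paths and values; replace with your own.
    cmd = build_distcp_command("hdfs://active-nn/data/compacted",
                               "hdfs://dr-nn/data/compacted",
                               bandwidth_mb=50, max_maps=20)
    print(" ".join(cmd))
```

If the throughput you actually measure is far below both the per-map limit and what the link can carry, the bottleneck is probably somewhere else. Note also that raising `-m` only helps when there are many files to spread across maps, since each large file here is still copied by a single mapper.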