
DistCp compacted files takes a lot of time

Explorer

Hi,

I'm using a compaction MR job that compacts the small files my MR jobs may create.

 

When I compact the files, I get files larger than 100 GB. Copying such a file with DistCp between the two farms uses a single mapper and can take more than 5 hours, and I have a lot of these compacted files.

That means copying one day's worth of files with DistCp can take days, so I always have a huge backlog between the active farm and the backup DR farm.

 

Any suggested solution?

 

Thanks in advance.

 

 

1 REPLY

Re: DistCp compacted files takes a lot of time

Super Collaborator

Did you check the network bandwidth between your clusters?
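As a rough reference point for that check, the figures in the question (a file of about 100 GB copied by one mapper in about 5 hours) imply an effective per-mapper throughput that can be compared with what the link between the farms should deliver. This is a back-of-the-envelope sketch; the 100 GB and 5 hour values are the lower bounds quoted in the question, not measurements.

```shell
# Rough effective throughput of one mapper, using the figures from the question.
size_gb=100   # file size in GB (the question says ">100GB")
hours=5       # copy time in hours (the question says "more than 5 hours")

# Convert GB to MB and hours to seconds, then divide.
throughput=$(awk -v gb="$size_gb" -v h="$hours" \
  'BEGIN { printf "%.1f", gb * 1024 / (h * 3600) }')

echo "approx ${throughput} MB/s per mapper"
```

If the inter-cluster link can sustain far more than this figure, the limit is more likely a DistCp setting or the single-mapper copy than the network itself.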

Did you check the bandwidth allocated to DistCp? I think the allocated bandwidth is very low by default. Try increasing that parameter first.
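A minimal sketch of raising that limit from the command line, assuming the standard DistCp options `-bandwidth` (per-map limit in MB/s) and `-m` (maximum number of simultaneous maps). The NameNode addresses, paths, and the values 200 and 20 are hypothetical placeholders, not recommendations.

```shell
# Hypothetical source and target URIs; replace with your own NameNodes and paths.
SRC="hdfs://active-nn:8020/data/compacted"
DST="hdfs://dr-nn:8020/data/compacted"

# Placeholder values: per-map bandwidth cap in MB/s and maximum number of maps.
BANDWIDTH_MB=200
MAX_MAPS=20

cmd="hadoop distcp -bandwidth $BANDWIDTH_MB -m $MAX_MAPS $SRC $DST"

# Print the command for review; it only runs when RUN=1 is set in the environment.
echo "$cmd"
if [ "${RUN:-0}" = "1" ]; then
  $cmd
fi
```

Note that `-bandwidth` applies per map, and a single large file is typically handled by one map, so a higher per-map limit helps each 100 GB file while a higher `-m` helps when many such files are copied in the same run.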

 

But definitely check that there is enough network bandwidth.