
How to deal with large number of "light" mappers in YARN/MR2 since JVM reuse is disabled?


We run MapReduce jobs that have more than 100K mappers, each of which takes less than 10 seconds to run. In MR1 we could take advantage of JVM reuse, which lets Hadoop reuse a JVM for new mappers.

 

As everybody knows, JVM reuse is disabled in YARN/MR2, so a new JVM/container is launched for each mapper, and launching a new container takes a few extra seconds. You can imagine how badly this overhead can hurt the performance of jobs that have more than 100K mappers.
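
To put a rough number on that overhead, here is a back-of-the-envelope sketch in Python. Only the 100K mapper count and the 10-second task time come from the figures above. The 3-second launch time and the 500 concurrent containers are hypothetical values chosen for illustration, and the model assumes mappers run in equal-sized waves, which a real scheduler will not do exactly.

```python
def startup_overhead(num_mappers, launch_sec, work_sec, concurrent_slots):
    """Estimate wall-clock time when every mapper pays its own container launch.

    Simplified model: mappers run in full "waves" of `concurrent_slots`
    tasks, and every task in a wave takes launch_sec + work_sec.
    """
    waves = -(-num_mappers // concurrent_slots)  # ceiling division
    return {
        "waves": waves,
        # Each wave pays the container launch again (no JVM reuse).
        "wall_clock_with_launch_sec": waves * (launch_sec + work_sec),
        # Idealized lower bound if launch cost were negligible.
        "wall_clock_without_launch_sec": waves * work_sec,
        # Share of each task's lifetime spent launching the container.
        "launch_fraction": launch_sec / (launch_sec + work_sec),
    }


# Hypothetical inputs: 3 s launch time and 500 concurrent containers.
est = startup_overhead(
    num_mappers=100_000, launch_sec=3, work_sec=10, concurrent_slots=500
)
for name, value in est.items():
    print(f"{name}: {value}")
```

Under these assumed numbers, roughly a quarter of each task's lifetime goes to container startup rather than useful work, and the shorter the mapper, the larger that share becomes.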

 

 

We cannot use uber tasks because our number of mappers is huge.

 

Does Cloudera have a solution for this? I think it is a really bad idea to retire JVM reuse in YARN. At the least, it could be made available with the default set to disabled.

 

Thanks!

 

 


Re: How to deal with large number of "light" mappers in YARN/MR2 since JVM reuse is disabled?


Hi,

Please let me know if you find a workaround or an alternative solution to this problem. I am also looking for a solution on this topic.

 

Thanks,

Narendra jonna