Number of mappers is not changing

One of our users is running a job that is a Hive query (an INSERT query). The number of mappers is always 6 and does not change even when the data size changes. How do I change the number of mappers? Which parameters determine the number of mappers?

1 REPLY

The following parameters control the number of mappers for splittable formats with Tez:

 

set tez.grouping.min-size=16777216; -- 16 MB min split
set tez.grouping.max-size=1073741824; -- 1 GB max split

 

Adjust these values to suit your data file sizes. Lowering them reduces split grouping, which leads to a larger number of mappers.
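To get a feel for how the two grouping limits interact, here is a rough Python sketch of the size-based part of Tez split grouping as I understand it from the Tez wiki page listed in the references. The function name and the `desired_splits` argument are hypothetical: `desired_splits` stands in for the split count Tez initially aims for based on available cluster capacity. The sketch ignores data locality and the number of original splits, so treat the result as an estimate only.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def estimate_grouped_splits(total_bytes, desired_splits,
                            min_size=16 * MIB, max_size=1 * GIB):
    """Estimate the number of grouped splits (mappers).

    Simplified approximation, not the actual Tez implementation:
    min_size / max_size mirror tez.grouping.min-size / max-size,
    and desired_splits is an assumed input for Tez's initial target.
    """
    if total_bytes <= 0 or desired_splits <= 0:
        return 0

    length_per_group = total_bytes // desired_splits

    if length_per_group > max_size:
        # Groups would be too large: cap each group at max_size.
        return total_bytes // max_size + 1
    if length_per_group < min_size:
        # Groups would be too small: grow each group to min_size.
        return total_bytes // min_size + 1
    # The initial target already fits between the two limits.
    return desired_splits


if __name__ == "__main__":
    # 10 GiB of input, initial target of 6 splits, default limits above.
    print(estimate_grouped_splits(10 * GIB, 6))
    # Same input with the maximum group size lowered to 256 MiB.
    print(estimate_grouped_splits(10 * GIB, 6, max_size=256 * MIB))
```

Under these assumptions, 10 GiB of input with the limits above gives 11 grouped splits, and lowering the maximum to 256 MiB gives 41, which illustrates why reducing tez.grouping.max-size tends to raise the mapper count.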

If the number of mappers still does not increase and hive.input.format is set to "org.apache.hadoop.hive.ql.io.CombineHiveInputFormat", you may also need to adjust the following properties:

 

set mapreduce.input.fileinputformat.split.maxsize=50000;
set mapreduce.input.fileinputformat.split.minsize=50000;

 

Please note that data locality with respect to nodes also plays a role in determining the number of mappers. For more information, see the references below.

References:
https://community.cloudera.com/t5/Support-Questions/How-are-number-of-mappers-determined-for-a-query...
https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works
https://cloudera.ericlin.me/2015/05/how-to-control-the-number-of-mappers-required-for-a-hive-query/
http://cloudsqale.com/2018/10/22/tez-internals-1-number-of-map-tasks/
http://cloudsqale.com/2018/12/24/orc-files-split-computation-hive-on-tez/