The following parameters control the number of mappers for splittable formats with Tez:
set tez.grouping.min-size=16777216; -- 16 MB min split
set tez.grouping.max-size=1073741824; -- 1 GB max split
Adjust the above values to suit your data file sizes. Lowering tez.grouping.max-size limits how many file splits are grouped together, which increases the number of mappers.
If the number of mappers still does not increase and hive.input.format is set to "org.apache.hadoop.hive.ql.io.CombineHiveInputFormat", you may also need to adjust the properties below:
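To see how the two grouping sizes bound the mapper count, here is a rough Python sketch of the split-grouping arithmetic described in the Tez reference below. It is a simplification, not the actual Tez implementation: the function name and the available_slots input are hypothetical, the wave factor of 1.7 is assumed to be the tez.grouping.split-waves default, and data locality and the original split count are ignored.

```python
def estimate_tez_mappers(total_bytes, available_slots,
                         min_size=16 * 1024 * 1024,    # tez.grouping.min-size
                         max_size=1024 * 1024 * 1024,  # tez.grouping.max-size
                         waves=1.7):                   # assumed split-waves default
    """Rough estimate of grouped splits (mappers) for one input.

    Hypothetical helper: ignores data locality and the number of
    original splits, both of which affect the real result.
    """
    # Start from the number of tasks the available slots suggest.
    desired = max(1, int(available_slots * waves))
    per_group = total_bytes / desired

    if per_group > max_size:
        # Groups would be too large: add tasks so each stays under max_size.
        desired = total_bytes // max_size + 1
    elif per_group < min_size:
        # Groups would be too small: reduce tasks so each reaches min_size.
        desired = total_bytes // min_size + 1
    return desired


ten_gib = 10 * 1024 ** 3
print(estimate_tez_mappers(ten_gib, available_slots=4))
print(estimate_tez_mappers(ten_gib, available_slots=4,
                           max_size=256 * 1024 * 1024))
```

Under these assumptions, 10 GiB of input on 4 slots gives an estimate of 11 mappers with the 1 GB maximum and 41 with a 256 MB maximum, which is why lowering tez.grouping.max-size raises the mapper count.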
set mapreduce.input.fileinputformat.split.maxsize=50000; -- 50,000 bytes max split
set mapreduce.input.fileinputformat.split.minsize=50000; -- 50,000 bytes min split
Note that data locality across nodes also plays a role in determining the number of mappers. For more information, see the references below.
References:
https://community.cloudera.com/t5/Support-Questions/How-are-number-of-mappers-determined-for-a-query...
https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works
https://cloudera.ericlin.me/2015/05/how-to-control-the-number-of-mappers-required-for-a-hive-query/
http://cloudsqale.com/2018/10/22/tez-internals-1-number-of-map-tasks/
http://cloudsqale.com/2018/12/24/orc-files-split-computation-hive-on-tez/