Number of mappers is not changing
Labels: Apache Hive
Created 12-01-2021 06:07 AM
One of our users is running a job that is a Hive query (an INSERT). The number of mappers is always 6 and does not change even when the data size changes. How do I change the number of mappers? Which parameters determine the number of mappers?
Created on 01-30-2022 10:34 PM - edited 01-30-2022 10:37 PM
The following parameters control the number of mappers for splittable formats with Tez:
set tez.grouping.min-size=16777216; -- 16 MB min split
set tez.grouping.max-size=1073741824; -- 1 GB max split
Adjust these values to suit the size of your data files. Lowering tez.grouping.max-size limits how much data Tez groups into a single task, so fewer file splits are combined and the number of mappers increases.
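To get a feel for how these two limits interact, here is a minimal Python sketch of the grouping logic as I understand it from the Tez references listed at the end. It is a simplified model, not the actual Tez code: `estimate_tez_mappers` and `available_slots` are hypothetical names, and the `waves` default of 1.7 is assumed to match tez.grouping.split-waves. It ignores data locality and per-file split boundaries.

```python
def estimate_tez_mappers(total_bytes, available_slots, waves=1.7,
                         min_size=16 * 1024 * 1024,
                         max_size=1024 * 1024 * 1024):
    """Simplified model of Tez split grouping (illustrative only)."""
    # The starting target depends on cluster capacity, not on data size.
    desired = max(1, int(available_slots * waves))
    per_group = total_bytes / desired
    if per_group > max_size:
        # Groups would exceed max-size: cap the group size at max_size.
        return total_bytes // max_size + 1
    if per_group < min_size:
        # Groups would fall below min-size: raise the group size to min_size.
        return total_bytes // min_size + 1
    # Group size is within both limits: the capacity-based target is kept.
    return desired


if __name__ == "__main__":
    GiB = 1024 ** 3
    for size in (1 * GiB, 3 * GiB, 10 * GiB):
        print(size // GiB, "GiB ->", estimate_tez_mappers(size, available_slots=4))
```

In this model the mapper count stays constant while the per-group size sits between the two limits, which may be why a fixed count is seen across different data sizes. Lowering `max_size` is what pushes the count up.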
If the number of mappers still does not increase and hive.input.format is set to org.apache.hadoop.hive.ql.io.CombineHiveInputFormat, you may also need to adjust the properties below:
set mapreduce.input.fileinputformat.split.maxsize=50000; -- 50,000 bytes max split
set mapreduce.input.fileinputformat.split.minsize=50000; -- 50,000 bytes min split
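Note that 50000 here is in bytes, which is a very small split. As a rough order-of-magnitude check only, the Python sketch below assumes combined splits come out close to split.maxsize and ignores node and rack locality; `rough_split_count` is a hypothetical helper, not part of Hive or Hadoop.

```python
import math


def rough_split_count(total_bytes, split_max_bytes):
    """Rough estimate: input size divided by the maximum split size."""
    if split_max_bytes <= 0:
        raise ValueError("split_max_bytes must be positive")
    return max(1, math.ceil(total_bytes / split_max_bytes))


if __name__ == "__main__":
    one_gib = 1024 ** 3
    # Compare a 50,000-byte limit with a 256 MiB limit for the same input.
    print(rough_split_count(one_gib, 50_000))
    print(rough_split_count(one_gib, 256 * 1024 ** 2))
```

With a limit this small, even a modest input can produce tens of thousands of splits, so it is usually worth picking a value in proportion to your actual file sizes.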
Please note that data locality across nodes also plays a role in determining the number of mappers. For more information, please see the references below.
References:
https://community.cloudera.com/t5/Support-Questions/How-are-number-of-mappers-determined-for-a-query...
https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works
https://cloudera.ericlin.me/2015/05/how-to-control-the-number-of-mappers-required-for-a-hive-query/
http://cloudsqale.com/2018/10/22/tez-internals-1-number-of-map-tasks/
http://cloudsqale.com/2018/12/24/orc-files-split-computation-hive-on-tez/
