Incorrect value of number of mappers and reducers in Tez mode
Labels: Apache Hadoop, Apache Tez
Created on 09-17-2018 10:41 AM - edited 08-18-2019 12:37 AM
When running Informatica mappings in Tez mode on an HDP 2.6 cluster, I observed that the properties “mapreduce.job.maps” and “mapreduce.job.reduces” in the configuration of a Hive job run in Tez mode show wrong values compared to the same job run in MapReduce mode.
For a large data set, the values in MR mode are:
mapreduce.job.maps: 3
mapreduce.job.reduces: 0
While in Tez mode they are:
mapreduce.job.maps: 2
mapreduce.job.reduces: 6
However, the DAG graphical view shows that there are 3 mappers.
There is a discrepancy in the value of “mapreduce.job.reduces” property in the Tez UI as well.
We are unable to find an equivalent property in the Tez configurations that is correctly populated with the number of mappers and reducers.
Created 09-17-2018 10:48 AM
The 'mapreduce.job.*' properties are only applicable to MR jobs. In Tez, the number of mappers is controlled by the parameters below:
- tez.grouping.max-size (default 1073741824, which is 1 GB)
- tez.grouping.min-size (default 52428800, which is 50 MB)
- tez.grouping.split-count (not set by default)
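To illustrate how the two size bounds limit the mapper count, here is a simplified sketch. It assumes Tez starts from some desired number of split groups and adjusts it so the average group size stays between tez.grouping.min-size and tez.grouping.max-size. The function name and the `desired_groups` input are hypothetical, and the real grouping also accounts for data locality and available cluster capacity, so the actual task count can differ.

```python
def clamp_group_count(total_bytes, desired_groups,
                      min_size=52428800, max_size=1073741824):
    """Hypothetical helper: adjust a desired group count so that the
    average bytes per group falls within [min_size, max_size]."""
    desired_groups = max(1, desired_groups)
    per_group = total_bytes / desired_groups
    if per_group > max_size:
        # Groups would be too large: use more groups.
        return total_bytes // max_size + 1
    if per_group < min_size:
        # Groups would be too small: use fewer groups.
        return total_bytes // min_size + 1
    return desired_groups


ten_gib = 10 * 1024 ** 3
print(clamp_group_count(ten_gib, 2))     # too few groups requested
print(clamp_group_count(ten_gib, 20))    # already within bounds
print(clamp_group_count(ten_gib, 1000))  # too many groups requested
```

Under these assumptions, raising tez.grouping.min-size tends to lower the number of mappers, and lowering tez.grouping.max-size tends to raise it.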
Reducers are controlled in Hive by these properties:
- hive.exec.reducers.bytes.per.reducer (default 256000000)
- hive.exec.reducers.max (default 1009)
- hive.tez.auto.reducer.parallelism (default false)
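As a rough illustration of how the first two reducer properties interact, the sketch below estimates a reducer count as the input size divided by the bytes-per-reducer setting, capped by the maximum. The function name is hypothetical and this is only an approximation: the planner works from its own size estimates, and when hive.tez.auto.reducer.parallelism is enabled the count may be adjusted at runtime, which is not modeled here.

```python
def estimate_reducers(input_bytes,
                      bytes_per_reducer=256000000,
                      max_reducers=1009):
    """Hypothetical helper: approximate reducer count from input size,
    hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max."""
    # Ceiling division without floating point.
    needed = -(-input_bytes // bytes_per_reducer)
    return max(1, min(max_reducers, needed))


print(estimate_reducers(1_000_000_000))      # about 1 GB of input
print(estimate_reducers(1_000_000_000_000))  # about 1 TB, hits the cap
```

Under this approximation, lowering hive.exec.reducers.bytes.per.reducer increases the reducer count until hive.exec.reducers.max is reached.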
For more details, refer to the link.
