Created on 11-02-2017 04:19 AM - edited 09-16-2022 05:28 AM
I am running a Hive query that moves data from one table to another.
The first table is split into 12 files in HDFS.
The second table is split into 17 files in HDFS.
Each file of the second table is about 870 MB.
I set these properties in the Hive-to-Hive import statement:
set mapreduce.input.fileinputformat.split.maxsize=858993459;
set mapreduce.input.fileinputformat.split.minsize=858993459;
When I query the second table, the job uses 51 mappers and 211 reducers and occupies all of the YARN resources.
I want to restrict the number of mappers and reducers used by the Hive query.
Please help me to solve it.
Created 11-02-2017 08:52 AM
a. mapred.map.tasks - The default number of map tasks per job is 2. It is ignored when mapred.job.tracker is "local". You can change it with set mapred.map.tasks = <value>. Note that this value is only a hint to the input format; the actual number of mappers is determined by the number of input splits.
b. mapred.reduce.tasks - The default number of reduce tasks per job is 1. It is typically set to 99% of the cluster's reduce capacity, so that if a node fails the reduces can still be executed in a single wave. It is ignored when mapred.job.tracker is "local". You can change it with set mapred.reduce.tasks = <value>.
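As a rough sketch of how these settings could be applied to the query in the question, assuming it runs on the MapReduce execution engine: Hive normally leaves mapred.reduce.tasks at -1 and estimates the reducer count itself from hive.exec.reducers.bytes.per.reducer, capped by hive.exec.reducers.max, so those two properties are another way to bring the 211 reducers down. For the mappers, note that 858993459 bytes is roughly 819 MiB, which is smaller than the 870 MB files, so each file needs more than one split; that may be part of why there are more mappers than files. All numbers below are illustrative and should be tuned for the cluster.

```sql
-- Assumes the MapReduce execution engine; every value here is illustrative.

-- Reducers: cap the count Hive may estimate, and raise the data handled per reducer.
set hive.exec.reducers.max=20;
-- 1 GiB of input per reducer
set hive.exec.reducers.bytes.per.reducer=1073741824;
-- Alternatively, force an exact reducer count (this overrides Hive's estimate):
-- set mapred.reduce.tasks=20;

-- Mappers: larger splits mean fewer mappers.
-- 1 GiB is larger than each 870 MB file, so a file no longer has to be cut in two.
set mapreduce.input.fileinputformat.split.maxsize=1073741824;
set mapreduce.input.fileinputformat.split.minsize=1073741824;
```

Run these in the same session, before the insert or select statement. The resulting mapper count still depends on the file format and on the input format Hive uses, so it is worth checking the job counters after the change.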
Created on 11-03-2017 06:38 AM - edited 11-03-2017 06:38 AM
Alternatively, you could look into YARN queues and resource allocation.
This will not "restrict" the number of mappers or reducers, but it will control how many can run concurrently by giving the job access to only a subset of the available resources.
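A minimal sketch of the queue approach, again assuming the MapReduce engine. The queue name etl_limited is hypothetical; a queue with a capacity limit has to be defined by the cluster administrator in the YARN scheduler configuration first. The two running-limit properties are an optional extra that, as far as I know, are only available from Hadoop 2.7 onward.

```sql
-- Hypothetical queue name; the queue must already exist in the YARN scheduler configuration.
set mapreduce.job.queuename=etl_limited;

-- Optional, assuming Hadoop 2.7 or later: limit how many tasks of this job run at once.
-- The total number of mappers and reducers is unchanged; only the concurrency is capped.
set mapreduce.job.running.map.limit=10;
set mapreduce.job.running.reduce.limit=10;
```

With this in place the query may take longer, but it can no longer take over the whole cluster.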