I am running a Hive query that moves data from one table to another.
The first table is split into 12 files in HDFS.
The second table is split into 17 files in HDFS.
Each file of the second table is 870 MB in size.
I have set this property in the Hive-to-Hive import statement.
When I query the second table, it takes
and 211 reducers,
and occupies all of the YARN resources.
I want to restrict the number of mappers and reducers for the hive query.
Please help me to solve it.
a. mapred.map.tasks - The default number of map tasks per job is 2. Ignored when mapred.job.tracker is "local". You can modify it using set mapred.map.tasks = <value>. Note that this value is only a hint: the actual number of mappers is determined by the number of input splits.
b. mapred.reduce.tasks - The default number of reduce tasks per job is 1 in Hadoop (Hive itself defaults to -1, which lets it estimate the number of reducers automatically). Typically set to 99% of the cluster's reduce capacity, so that if a node fails the reduces can still be executed in a single wave. Ignored when mapred.job.tracker is "local". You can modify it using set mapred.reduce.tasks = <value>
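As a rough sketch of how these settings might look in a Hive session running on MapReduce: every numeric value below is an illustrative assumption, not a recommendation, and the Hive-specific reducer properties and split-size properties are additions that the list above does not mention.

```sql
-- Illustrative values only; tune them for your cluster.

-- Force a fixed number of reducers for the queries that follow
-- (mapreduce.job.reduces is the newer name for the same setting).
set mapred.reduce.tasks=20;

-- Or keep Hive's automatic estimate but cap it:
set hive.exec.reducers.max=20;
-- and/or give each reducer more input so that fewer are needed (1 GB here).
set hive.exec.reducers.bytes.per.reducer=1073741824;

-- Mappers follow the input splits, so larger splits mean fewer mappers
-- (512 MB here; the effect depends on the input format and file layout).
set mapreduce.input.fileinputformat.split.minsize=536870912;
set mapreduce.input.fileinputformat.split.maxsize=536870912;
```

Setting mapred.reduce.tasks back to -1 should let Hive estimate the reducer count again, in which case the cap and the bytes-per-reducer value take effect.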
Alternatively, you could look into YARN queues and resource allocation.
This will not restrict the number of mappers or reducers, but it controls how many can run concurrently by giving the query access to only a subset of the available resources.
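For the queue approach, a minimal sketch, assuming an administrator has already defined a capacity-limited YARN queue (the name etl_small is hypothetical):

```sql
-- 'etl_small' is a hypothetical queue name; use one defined on your cluster.
-- When Hive runs on MapReduce:
set mapreduce.job.queuename=etl_small;
-- When Hive runs on Tez:
set tez.queue.name=etl_small;
```

The query still launches the same number of tasks, but they run in waves that fit within the queue's share of the cluster.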