I was doing a deep dive with a simple use case to see how we can control the number of mappers launched in Hive.
This is what I did:
Step 1: Found the HDFS block size of the cluster:
hdfs getconf -confKey dfs.blocksize
Output: 134217728 => 128 MB
Step 2: Placed a file of 392781672 bytes (~392 MB) in HDFS and created a table on top of it.
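For reference, the setup looked roughly like this (the file name, HDFS path, and single-column schema here are placeholders, not my actual ones):
hdfs dfs -mkdir -p /user/hive/split_test
hdfs dfs -put /tmp/bigfile.txt /user/hive/split_test/
-- then, in Hive:
CREATE EXTERNAL TABLE split_test (line STRING)
LOCATION '/user/hive/split_test/';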
Step 3: Ran a simple count (select count(1) from table), which triggered:
Mappers: 3
Reducers: 1
This is as expected, given the block math below.
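With a 128 MB block size, the file spans 3 blocks, and one mapper per block-sized split is exactly what the default behavior should give:
2 blocks = 268435456 bytes < 392781672 bytes <= 3 blocks = 402653184 bytes
392781672 / 134217728 ≈ 2.93 => 3 splits => 3 mappers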
Step 4: Changed the split-size settings:
set mapred.min.split.size=392503151;
set mapred.max.split.size=392503000;
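The idea behind these values comes from the classic FileInputFormat split sizing (my reading of the Hadoop source, not something I have verified on this cluster):
splitSize = max(minSplitSize, min(maxSplitSize, blockSize))
          = max(392503151, min(392503000, 134217728))
          = 392503151 bytes
That is almost the whole file, and the 278521-byte remainder is well within the 1.1x slop that FileInputFormat allows for the last split, so I expected a single split and therefore a single mapper.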
Step 5: Ran select count(1) from table again, and it still triggered 3 mappers and 1 reducer:
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
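To rule out the settings silently not taking effect, they can be echoed back in the same Hive session (set <key>; with no value prints the current value):
set mapred.min.split.size;
set mapred.max.split.size;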
Question: Based on the split-size math above, I would expect this query to run with only 1 mapper, since the min/max split sizes are now roughly equal to the file size. Why is Hive still launching 3 mappers instead of following this principle?
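(One thing I have not ruled out: mapred.min.split.size and mapred.max.split.size are the deprecated property names. If it matters, I could retry with the newer equivalents, e.g.:
set mapreduce.input.fileinputformat.split.minsize=392503151;
set mapreduce.input.fileinputformat.split.maxsize=392503000;
but I would expect the old aliases to still be honored.)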