
Why doesn't mapred.min.split.size change the number of mappers for my query?


I was doing a deep dive on a simple use case to see how the number of mappers launched in Hive can be controlled.

This is what I did:

Step 1: Checked the HDFS block size of the system.

hdfs getconf -confKey dfs.blocksize
Output: 134217728 (128 MB)

Step 2: Placed a file of 392781672 bytes (about 392 MB in decimal units, just under three 128 MB blocks) in HDFS and created a table on top of it.

Step 3: Ran a simple count (select count(1) from table), which launched:

Mappers: 3, Reducers: 1

This is as expected.
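The three mappers follow from simple block arithmetic. A minimal sketch, assuming one input split per HDFS block (the usual result when no min/max split size is overridden); the class and method names here are made up for illustration:

```java
public class BlockCount {
    // Number of blocks needed to hold fileSize bytes (ceiling division).
    static long blocksFor(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long blockSize = 134_217_728L; // dfs.blocksize from Step 1
        long fileSize = 392_781_672L;  // file size from Step 2
        // Two full blocks plus one partial block.
        System.out.println(blocksFor(fileSize, blockSize)); // prints 3
    }
}
```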

Step 4: Changed these settings:

set mapred.min.split.size = 392503151

set mapred.max.split.size = 392503000

Step 5: Ran select count(1) from table again, and it still launches 3 mappers and 1 reducer.

Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1

Question: I expected this to run only 1 mapper, since the min and max split sizes are now about the same as the file size. Why is that not what happens here?

1 REPLY


1. What is the file format?

2. Kindly provide the job conf parameters that this code reads:

http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-core/2.7.0...

I need to know what values these resolve to:

 long minSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(job));
 long maxSize = getMaxSplitSize(job);
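
To see what those two values would mean for the numbers in the question, here is a rough sketch of the split arithmetic in Hadoop's FileInputFormat (split size = max(minSize, min(maxSize, blockSize)), with a 10% slack on the last split). It assumes a plain, splittable file read through FileInputFormat with the settings actually present in the job conf; the class SplitSketch and the method countSplits are illustrative names, not Hadoop APIs.

```java
public class SplitSketch {
    // Slack factor: the final split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Mirrors the formula FileInputFormat uses to pick a split size.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts the splits produced for a single splittable file.
    static int countSplits(long fileSize, long splitSize) {
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // last, possibly oversized or partial, split
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 134_217_728L; // dfs.blocksize from Step 1
        long fileSize = 392_781_672L;  // file size from Step 2

        // Defaults: min = 1, max = Long.MAX_VALUE, so split size = block size.
        long defaults = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        // Values from Step 4: min wins because it is larger than min(max, block).
        long tuned = computeSplitSize(blockSize, 392_503_151L, 392_503_000L);

        System.out.println(countSplits(fileSize, defaults)); // prints 3
        System.out.println(countSplits(fileSize, tuned));    // prints 1
    }
}
```

Under these assumptions the Step 4 values would give a single split. If the job still reports 3 mappers, one possibility is that minSize and maxSize in the job conf are not the values that were set, or that the table's input format does not compute splits this way, which is why the file format and the actual job conf values matter.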