I was doing a deep dive with a simple use case to see how we can control the number of mappers launched in Hive.
This is what I did:
Step 1: Found the HDFS block size of the cluster:
hdfs getconf -confKey dfs.blocksize
Output: 134217728 => 128 MB
Step 2: Placed a file of 392781672 bytes (~392 MB) in HDFS and created a table on top of it.
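For reference, the setup looked roughly like this (the file name, HDFS path, and single-column schema here are placeholders, not my actual ones):
hdfs dfs -mkdir -p /user/hive/split_test
hdfs dfs -put /tmp/bigfile.txt /user/hive/split_test/
-- then, in Hive:
CREATE EXTERNAL TABLE split_test (line STRING)
LOCATION '/user/hive/split_test/';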
Step 3: Ran a simple count (select count(1) from table), which triggered:
Mappers: 3
Reducers: 1
This is as expected, given the block math below.
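With a 128 MB block size, the file spans 3 blocks, and one mapper per block-sized split is exactly what the default behavior should give:
2 blocks = 268435456 bytes < 392781672 bytes <= 3 blocks = 402653184 bytes
392781672 / 134217728 ≈ 2.93 => 3 splits => 3 mappers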
Step 4: Changed the split-size settings:
set mapred.min.split.size=392503151;
set mapred.max.split.size=392503000;
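The idea behind these values comes from the classic FileInputFormat split sizing (my reading of the Hadoop source, not something I have verified on this cluster):
splitSize = max(minSplitSize, min(maxSplitSize, blockSize))
          = max(392503151, min(392503000, 134217728))
          = 392503151 bytes
That is almost the whole file, and the 278521-byte remainder is well within the 1.1x slop that FileInputFormat allows for the last split, so I expected a single split and therefore a single mapper.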
Step 5: Ran select count(1) from table again, and it still triggered 3 mappers and 1 reducer:
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
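To rule out the settings silently not taking effect, they can be echoed back in the same Hive session (set <key>; with no value prints the current value):
set mapred.min.split.size;
set mapred.max.split.size;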
Question: Based on the split-size math above, I would expect this query to run with only 1 mapper, since the min/max split sizes are now roughly equal to the file size. Why is Hive still launching 3 mappers instead of following this principle?
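(One thing I have not ruled out: mapred.min.split.size and mapred.max.split.size are the deprecated property names. If it matters, I could retry with the newer equivalents, e.g.:
set mapreduce.input.fileinputformat.split.minsize=392503151;
set mapreduce.input.fileinputformat.split.maxsize=392503000;
but I would expect the old aliases to still be honored.)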