02-25-2018
11:34 AM
I was doing a deep dive with a simple use case to see how we can control the number of mappers launched in Hive. This is what I did:

Step 1: Found the block size of the system:
hdfs getconf -confKey dfs.blocksize
Output: 134217728 => 128 megabytes (MB)

Step 2: Placed a file of 392781672 bytes (~392 MB) in HDFS and created a table on top of it.

Step 3: Ran a simple count (select count(1) from table), which triggered:
Mappers: 3
Reducers: 1
which is as expected.

Step 4: Now changed the settings:
set mapred.min.split.size = 392503151;
set mapred.max.split.size = 392503000;

Step 5: Ran select count(1) from table again, and it still triggers 3 mappers and 1 reducer:
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1

Question: I would expect this to run only 1 mapper, since the min/max split sizes are now roughly the same as the file size. Why is it not following this principle here?
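To show the arithmetic behind my expectation, here is a small Python sketch (my own assumption, mirroring plain MapReduce's FileInputFormat.computeSplitSize and its SPLIT_SLOP behavior, not necessarily what Hive's CombineHiveInputFormat actually does):

```python
def compute_split_size(block_size, min_size, max_size):
    # Mirrors Hadoop's FileInputFormat.computeSplitSize:
    # max(minSize, min(maxSize, blockSize))
    return max(min_size, min(max_size, block_size))

def num_splits(file_size, split_size, slop=1.1):
    # FileInputFormat keeps carving off full splits while the remainder
    # exceeds splitSize * SPLIT_SLOP (1.1), then emits one final split.
    splits = 0
    remaining = file_size
    while remaining / split_size > slop:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1
    return splits

BLOCK = 134217728   # dfs.blocksize = 128 MB
FILE = 392781672    # the ~392 MB test file

# Defaults (min split = 1 byte, max split = Long.MAX_VALUE):
print(num_splits(FILE, compute_split_size(BLOCK, 1, 2**63 - 1)))  # 3

# With the Step 4 settings:
print(num_splits(FILE, compute_split_size(BLOCK, 392503151, 392503000)))  # 1
```

By this formula the Step 4 settings should yield a single split (min.split.size wins over the smaller max.split.size and the block size), which is why the observed 3 mappers surprised me.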
Labels:
- Apache Hadoop