02-25-2018
11:34 AM
I was doing a deep dive with a simple use case to see how we can control the number of mappers launched in Hive. This is what I did:

Step 1: Found the block size of the system:
hdfs getconf -confKey dfs.blocksize
Output: 134217728 => 128 megabytes (MB)

Step 2: Placed a file of 392781672 bytes (~392 MB) in HDFS and created a table on top of it.

Step 3: Ran a simple count (select count(1) from table), which triggered:
Mappers: 3
Reducers: 1
which is as expected.

Step 4: Now changed the settings:
set mapred.min.split.size = 392503151;
set mapred.max.split.size = 392503000;

Step 5: Ran select count(1) from table again, and it still triggers 3 mappers and 1 reducer:
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1

Question: I would expect this to run only 1 mapper, since the min/max split sizes are now roughly the same as the file size. Why is it not following this principle here?
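To show the arithmetic behind my expectation, here is a small Python sketch (my own assumption, mirroring plain MapReduce's FileInputFormat.computeSplitSize and its SPLIT_SLOP behavior, not necessarily what Hive's CombineHiveInputFormat actually does):

```python
def compute_split_size(block_size, min_size, max_size):
    # Mirrors Hadoop's FileInputFormat.computeSplitSize:
    # max(minSize, min(maxSize, blockSize))
    return max(min_size, min(max_size, block_size))

def num_splits(file_size, split_size, slop=1.1):
    # FileInputFormat keeps carving off full splits while the remainder
    # exceeds splitSize * SPLIT_SLOP (1.1), then emits one final split.
    splits = 0
    remaining = file_size
    while remaining / split_size > slop:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1
    return splits

BLOCK = 134217728   # dfs.blocksize = 128 MB
FILE = 392781672    # the ~392 MB test file

# Defaults (min split = 1 byte, max split = Long.MAX_VALUE):
print(num_splits(FILE, compute_split_size(BLOCK, 1, 2**63 - 1)))  # 3

# With the Step 4 settings:
print(num_splits(FILE, compute_split_size(BLOCK, 392503151, 392503000)))  # 1
```

By this formula the Step 4 settings should yield a single split (min.split.size wins over the smaller max.split.size and the block size), which is why the observed 3 mappers surprised me.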
Labels:
- Apache Hadoop