In several blog posts I have read that one mapper task is created per block. However, when I tried to check that, I did not see it happening.
I have 100 files of only 385 bytes each, yet each one occupies a block of its own. How do I know that? From the fsck output for one of the files:
[root@sandbox-hdp ~]# hdfs fsck /apps/hive/warehouse/temp.db/emp_orc_small_files/000000_0_copy_11 -files -blocks
Connecting to namenode via http://sandbox-hdp.hortonworks.com:50070/fsck?ugi=root&files=1&blocks=1&path=%2Fapps%2Fhive%2Fwareho...
FSCK started by root (auth:SIMPLE) from /172.17.0.2 for path /apps/hive/warehouse/temp.db/emp_orc_small_files/000000_0_copy_11 at Thu Sep 13 02:58:47 UTC 2018
/apps/hive/warehouse/temp.db/emp_orc_small_files/000000_0_copy_11 385 bytes, 1 block(s): OK
0. BP-32082187-172.17.0.2-1517480669419:blk_1073745283_4489 len=385 repl=1
However, when I run a query to calculate the max value of a particular column, I see that only 1 mapper task and 1 reducer task are created.
Shouldn't 100 mapper tasks be created instead?
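
To make the two expectations concrete, here is a rough sketch of the arithmetic. It rests on an assumption that the output above does not confirm: that the query engine packs many small files into one input split instead of creating one split per block (Hive's hive.input.format defaults to CombineHiveInputFormat, which does this). The function names are hypothetical, and the 256 MB split limit is an assumed value, not one read from this cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the split limit used below is an assumed value.

# Mappers expected if every block became its own split.
mappers_per_block() {
  local file_count=$1 blocks_per_file=$2
  echo $(( file_count * blocks_per_file ))
}

# Mappers expected if small files are packed together into splits
# of at most max_split_bytes (ceiling division on the total size).
mappers_combined() {
  local file_count=$1 file_bytes=$2 max_split_bytes=$3
  local total=$(( file_count * file_bytes ))
  echo $(( (total + max_split_bytes - 1) / max_split_bytes ))
}

mappers_per_block 100 1               # 100 files, 1 block each
mappers_combined 100 385 268435456    # 38,500 bytes in total, assumed 256 MB limit
```

Under these assumptions the first call gives the 100 mappers I expected and the second gives the single mapper I actually see, because all 100 files together are far smaller than one split.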