Does the number of mapper tasks depend on how many blocks I have in HDFS?

In several blog posts I have read that one mapper task is created per block. However, when I tried to verify that, I did not see it happening.

I have 100 files of only 385 bytes each, yet each one occupies a block of its own. How do I know that? From the fsck output for one of the files:

[root@sandbox-hdp ~]# hdfs fsck /apps/hive/warehouse/temp.db/emp_orc_small_files/000000_0_copy_11  -files -blocks
Connecting to namenode via http://sandbox-hdp.hortonworks.com:50070/fsck?ugi=root&files=1&blocks=1&path=%2Fapps%2Fhive%2Fwareho...
FSCK started by root (auth:SIMPLE) from /172.17.0.2 for path /apps/hive/warehouse/temp.db/emp_orc_small_files/000000_0_copy_11 at Thu Sep 13 02:58:47 UTC 2018
/apps/hive/warehouse/temp.db/emp_orc_small_files/000000_0_copy_11 385 bytes, 1 block(s):  OK
0. BP-32082187-172.17.0.2-1517480669419:blk_1073745283_4489 len=385 repl=1

However, when I run a query that calculates the max value of a particular column, I see that only 1 mapper task and 1 reducer task are created.

Shouldn't 100 mapper tasks be created?
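
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the number of mappers follows the number of input splits, not the number of blocks. Assuming a combining input format (such as CombineHiveInputFormat on MapReduce) or Tez split grouping is in effect on this sandbox, many small files can be packed into a single split. The function names, the 128 MiB block size, and the greedy packing below are illustrative assumptions, not Hive's actual algorithm.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed HDFS block size (128 MiB)


def splits_per_file(file_sizes, split_size=BLOCK_SIZE):
    """Count splits when every file is split on its own and files are never merged."""
    # Each file yields at least one split, then one more per additional block-sized chunk.
    return sum(max(1, math.ceil(size / split_size)) for size in file_sizes)


def combined_splits(file_sizes, max_split_size=BLOCK_SIZE):
    """Count splits when whole files are greedily packed up to max_split_size bytes.

    Simplification: a file larger than max_split_size is kept whole in its
    own split instead of being cut into pieces.
    """
    splits = 0
    current = 0  # bytes accumulated in the split being built
    for size in file_sizes:
        # Close the current split if adding this file would overflow it.
        if current > 0 and current + size > max_split_size:
            splits += 1
            current = 0
        current += size
    if current > 0:
        splits += 1  # close the final, partially filled split
    return splits


files = [385] * 100  # the 100 small files from the question
print(splits_per_file(files))   # 100
print(combined_splits(files))   # 1
```

Under this model the "one mapper per block" rule matches the first function, while the single mapper observed in the query matches the second: all 100 files total 38,500 bytes, far below one split's capacity.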