
What is the fundamental difference between a MapReduce InputSplit and HDFS block?



What is the difference between a MapReduce InputSplit and an HDFS block?


Re: What is the fundamental difference between a MapReduce InputSplit and HDFS block?


Input Split: An InputSplit is a logical division of the input records. It holds no data itself, only a reference to the data to be processed, and it is used only while MapReduce is processing a job. The user can control the size of an InputSplit, and each InputSplit is assigned to a single mapper. Splits are defined by the job's InputFormat class.
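
To make the "logical reference" idea concrete, here is a minimal Python sketch, not Hadoop's actual code. The `InputSplit` class and `get_splits` function are hypothetical stand-ins. The sizing rule `max(min_size, min(max_size, block_size))` follows the one used by Hadoop's `FileInputFormat`, where the minimum and maximum are normally set through `mapreduce.input.fileinputformat.split.minsize` and `mapreduce.input.fileinputformat.split.maxsize`. The sketch leaves out refinements in the real implementation, such as non-splittable compressed files.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class InputSplit:
    """Hypothetical stand-in for a file split: a reference, not the data."""
    path: str
    start: int   # byte offset where the split begins
    length: int  # number of bytes the split covers


def compute_split_size(block_size, min_size, max_size):
    # Same rule as Hadoop's FileInputFormat: the block size, clamped
    # between the configured minimum and maximum split sizes.
    return max(min_size, min(max_size, block_size))


def get_splits(path, file_length, block_size=128 * MB,
               min_size=1, max_size=2**63 - 1):
    """Return the list of logical splits covering a file (simplified)."""
    split_size = compute_split_size(block_size, min_size, max_size)
    splits = []
    offset = 0
    while offset < file_length:
        length = min(split_size, file_length - offset)
        splits.append(InputSplit(path, offset, length))
        offset += length
    return splits


# A 300 MB file with default settings yields one split per block.
default_splits = get_splits("/data/logs.txt", 300 * MB)
# Lowering the maximum split size to 64 MB yields more, smaller splits.
small_splits = get_splits("/data/logs.txt", 300 * MB, max_size=64 * MB)
print(len(default_splits), len(small_splits))  # prints "3 5"
```

Each split records only a path, an offset, and a length, so changing the split size changes how many mappers run without touching how the file is stored.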


HDFS Block: A block is the physical representation of the data. It is the smallest unit of data that HDFS reads or writes. The default block size is 128 MB, and it can be configured to suit your requirements. A file is divided into blocks of this size before being stored in the file system; every block of a file is the same size except the last one, which may be smaller.
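
The block arithmetic can be sketched in a few lines of Python. The `block_sizes` helper is hypothetical and only illustrates the rule described above; it does not call any HDFS API.

```python
MB = 1024 * 1024


def block_sizes(file_length, block_size=128 * MB):
    """Return the size in bytes of each block a file would occupy."""
    full_blocks, remainder = divmod(file_length, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # Only the final block may be smaller than the block size.
        sizes.append(remainder)
    return sizes


# A 300 MB file occupies two full 128 MB blocks and one 44 MB block.
sizes_in_mb = [size // MB for size in block_sizes(300 * MB)]
print(sizes_in_mb)  # prints "[128, 128, 44]"
```

A file whose length is an exact multiple of the block size has a last block of the full size, which is the "same size or smaller" case.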
