
What is the fundamental difference between a MapReduce InputSplit and an HDFS block?



InputSplit: It's a logical division of the input records. It doesn't contain any data itself, only a reference to the data (for a file, the path, the start offset and length of a byte range, and the hosts that store that range). It's used only during data processing by MapReduce. The user can control the size of the InputSplits, and each InputSplit is assigned to an individual mapper for processing. Splits are defined by the job's InputFormat class, which creates them in its getSplits() method.
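As a rough illustration of how a file-based InputFormat turns a file into splits, here is a Python sketch modeled on the logic of Hadoop's FileInputFormat.getSplits(): the target split size is max(minSize, min(maxSize, blockSize)), and the last split may be up to 10% larger than the target instead of leaving a tiny remainder. The names Split, compute_split_size, get_splits and the path "/data/input.txt" are hypothetical. The real implementation is Java, also checks whether the file is splittable, and records the hosts holding each range. In Hadoop 2.x and later the minimum and maximum come from mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize.

```python
from __future__ import annotations

from dataclasses import dataclass

# FileInputFormat lets the final split be up to 10% larger than the target
# size rather than creating a tiny trailing split.
SPLIT_SLOP = 1.1


@dataclass(frozen=True)
class Split:
    """A logical reference to a byte range of a file; no data is copied."""
    path: str
    start: int
    length: int


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    # Same formula as FileInputFormat.computeSplitSize().
    return max(min_size, min(max_size, block_size))


def get_splits(path: str, file_length: int, block_size: int,
               min_size: int = 1, max_size: int = 2**63 - 1) -> list[Split]:
    split_size = compute_split_size(block_size, min_size, max_size)
    splits = []
    remaining = file_length
    # Emit full-size splits while the remainder exceeds the slop factor.
    while remaining / split_size > SPLIT_SLOP:
        splits.append(Split(path, file_length - remaining, split_size))
        remaining -= split_size
    # Whatever is left (up to 1.1x split_size) becomes the last split.
    if remaining > 0:
        splits.append(Split(path, file_length - remaining, remaining))
    return splits


MB = 1024 * 1024
print([s.length // MB for s in get_splits("/data/input.txt", 300 * MB, 128 * MB)])
# [128, 128, 44]
print([s.length // MB for s in get_splits("/data/input.txt", 300 * MB, 128 * MB, max_size=64 * MB)])
# [64, 64, 64, 64, 44]
print([s.length // MB for s in get_splits("/data/input.txt", 136 * MB, 128 * MB)])
# [136]
```

With the default limits the splits line up with the 128 MB blocks, so a 300 MB file gets three mappers. Lowering the maximum to 64 MB produces five splits (more mappers), while a 136 MB file becomes a single split spanning two blocks, because 136/128 ≈ 1.06 is within the 1.1 slop.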

HDFS Block: It's the physical representation of the data, the unit in which HDFS stores a file's contents and replicates them across DataNodes. The default HDFS block size is 128 MB (64 MB in Hadoop 1.x), and it can be configured according to our requirements through the dfs.blocksize property. All blocks of a file are the same size except the last one, which may be the same size or smaller; a file smaller than one block does not take up a full block's worth of disk space. With the default setting, a file is divided into 128 MB blocks, which are then stored in the file system.
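To see why a split is a logical division of records while a block is a physical division of bytes, the following Python sketch cuts a toy file into fixed-size blocks and then reads each split line by line, with splits chosen to coincide with blocks. The reader follows, in simplified form, the convention Hadoop's LineRecordReader uses for text input: a split that does not start at offset 0 skips its first line, and every split keeps reading past its end to finish the last line it started. The function names and the 8-byte block size are illustrative assumptions, not Hadoop code.

```python
from __future__ import annotations


def block_layout(file_length: int, block_size: int) -> list[tuple[int, int]]:
    """Return (offset, length) for each block; only the last may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [(off, min(block_size, file_length - off))
            for off in range(0, file_length, block_size)]


def cut_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    # Physical view: cut at fixed byte offsets, ignoring record boundaries.
    return [data[off:off + n] for off, n in block_layout(len(data), block_size)]


def read_lines_for_split(data: bytes, start: int, length: int) -> list[bytes]:
    # Logical view: simplified LineRecordReader-style rules (an assumption,
    # not Hadoop's actual code).
    end = start + length
    pos = start
    if start != 0:
        # Skip the first line; the previous split has already read it.
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    records = []
    # Keep reading while the next line *starts* at or before `end`,
    # even if it finishes inside the next block.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl + 1
        records.append(data[pos:stop].rstrip(b"\n"))
        pos = stop
    return records


data = b"alpha\nbravo\ncharlie\n"
print(cut_into_blocks(data, 8))
# [b'alpha\nbr', b'avo\nchar', b'lie\n']
for off, n in block_layout(len(data), 8):
    print(off, read_lines_for_split(data, off, n))
# 0 [b'alpha', b'bravo']
# 8 [b'charlie']
# 16 []
```

The record "bravo" is physically cut between the first and second blocks, yet it is returned exactly once, by the first split. On a real cluster, finishing such a record may mean reading the start of the next block from another DataNode.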