Difference between input split and block in Hadoop MapReduce? InputSplit vs Block Size in Hadoop

New Contributor

Difference between input split and block in Hadoop MapReduce? InputSplit vs Block Size in Hadoop

1 REPLY

Re: Difference between input split and block in Hadoop MapReduce? InputSplit vs Block Size in Hadoop

New Contributor

Block
A block is the physical representation of data. It is the smallest unit of data that HDFS reads or writes.
The default HDFS block size is 128 MB, which can be configured as required (the dfs.blocksize property). A file is split into 128 MB blocks, which are then stored in the Hadoop filesystem. All blocks of a file are the same size except the last one, which can be the same size or smaller.
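To make the block arithmetic concrete, here is a minimal sketch in plain Python (it does not talk to HDFS). The function name block_sizes and the 300 MB example file are assumptions made for illustration only.

```python
MB = 1024 * 1024

def block_sizes(file_size, block_size=128 * MB):
    """Return the size in bytes of each block a file would occupy.

    Hypothetical helper for illustration; it only mirrors the rule that
    every block is full except possibly the last one.
    """
    if file_size <= 0:
        return []
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block holds only the leftover bytes.
        sizes.append(remainder)
    return sizes

# A 300 MB file with the default 128 MB block size.
print([size // MB for size in block_sizes(300 * MB)])  # [128, 128, 44]
```

So a 300 MB file occupies three blocks, and the last one is only 44 MB rather than being padded to 128 MB.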
InputSplit
An InputSplit is the logical representation of the data stored in blocks. It is used during data processing in a MapReduce program or other processing techniques, and each split is processed by one map task.
An InputSplit does not contain the actual data, only a reference to it. By default, the split size is approximately equal to the block size. The split size is user-configurable, so it can be tuned in the MapReduce program based on the size of the data.
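As a rough sketch of how the split size relates to the block size: to my understanding, Hadoop's FileInputFormat picks max(minSize, min(maxSize, blockSize)), where the minimum and maximum come from the properties mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize. The Python below only reproduces that arithmetic; the function name and the default values are assumptions for illustration, not Hadoop code.

```python
MB = 1024 * 1024

def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Mirror the max(min, min(max, block)) rule for choosing a split size.

    Illustrative only: the defaults assume no minimum or maximum split
    size has been configured by the user.
    """
    return max(min_size, min(max_size, block_size))

block = 128 * MB

# No user settings: the split size equals the block size.
print(compute_split_size(block) // MB)                      # 128

# Lower maximum: smaller splits, so more map tasks per file.
print(compute_split_size(block, max_size=64 * MB) // MB)    # 64

# Higher minimum: larger splits, so fewer map tasks per file.
print(compute_split_size(block, min_size=256 * MB) // MB)   # 256
```

In a Java MapReduce driver the same knobs are exposed through FileInputFormat.setMinInputSplitSize and FileInputFormat.setMaxInputSplitSize. Changing them changes only how the data is divided among map tasks, not how the blocks are stored in HDFS.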