Support Questions

Find answers, ask questions, and share your expertise

Why HDFS Blocks are Large in Size?


2 REPLIES

The main reason for keeping HDFS blocks large is to reduce the cost of disk seeks, which are expensive operations. Since Hadoop is designed to stream over your entire dataset, it is best to minimize seeks relative to the time spent actually transferring data. As a rough example, if the seek time is 10 ms and the disk transfer rate is 100 MB/s, then keeping the seek time to 1% of the transfer time means each block should take about 1 second to transfer, which works out to a block size of roughly 100 MB. Hence, to reduce the relative cost of disk seeks, the HDFS default block size is 64 MB (older releases) or 128 MB.
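To make the arithmetic explicit, here is a minimal back-of-the-envelope sketch using the illustrative figures from the reply above (10 ms seek, 100 MB/s transfer, 1% target overhead); the class and variable names are just for illustration:

```java
// Back-of-the-envelope calculation from the figures above (10 ms seek,
// 100 MB/s transfer, 1% target overhead). Plain Java, no Hadoop APIs needed.
public class BlockSizeEstimate {
    public static void main(String[] args) {
        double seekTimeMs = 10.0;          // average disk seek time
        double transferRateMBs = 100.0;    // sustained disk transfer rate
        double targetOverhead = 0.01;      // seek time as a fraction of transfer time

        // Transfer time must be seekTime / targetOverhead = 1000 ms = 1 s,
        // so the block must hold 1 s worth of data at the transfer rate.
        double transferTimeS = (seekTimeMs / 1000.0) / targetOverhead;
        double blockSizeMB = transferRateMBs * transferTimeS;

        System.out.printf("Block size for %.0f%% seek overhead: %.0f MB%n",
                targetOverhead * 100, blockSizeMB);   // prints 100 MB
    }
}
```

In practice the default is controlled by the dfs.blocksize property in hdfs-site.xml, which defaults to 128 MB on Hadoop 2.x and later.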


The ideas behind the large block size are:

  • To reduce the cost of seeks: with sufficiently large blocks, the time spent transferring the data from the disk is much longer than the time spent seeking to the start of the block. As a result, a large file made of multiple blocks is read at close to the disk transfer rate.
  • If the blocks were small, there would be far too many blocks in Hadoop HDFS and therefore far too much metadata to store. The NameNode keeps metadata for every block, so managing such a vast number of blocks creates memory overhead and leads to extra network traffic (see the sketch after this list). Source: Link.
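As a rough illustration of the metadata point in the second bullet, the sketch below counts how many blocks a single 1 TB file would need at different block sizes; the ~150 bytes of NameNode memory per block object is a commonly cited ballpark, not an exact figure:

```java
// Rough illustration of the metadata overhead: how many block objects the
// NameNode must track for a 1 TB file at different block sizes.
public class BlockCountEstimate {
    public static void main(String[] args) {
        long fileSizeBytes = 1L << 40;                          // 1 TB file
        long[] blockSizes = {4L << 10, 1L << 20, 128L << 20};   // 4 KB, 1 MB, 128 MB
        long bytesPerBlockObject = 150;                         // rough NameNode memory per block (assumption)

        for (long blockSize : blockSizes) {
            long numBlocks = (fileSizeBytes + blockSize - 1) / blockSize; // ceiling division
            long metadataBytes = numBlocks * bytesPerBlockObject;
            System.out.printf("block size %8d KB -> %,12d blocks, ~%,d KB of NameNode memory%n",
                    blockSize >> 10, numBlocks, metadataBytes >> 10);
        }
    }
}
```

With 4 KB blocks the single file explodes into hundreds of millions of block objects, while 128 MB blocks keep it to a few thousand, which is why HDFS favors large blocks.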