
Ideally, what block size should be used to get maximum performance from a cluster?

New Contributor

Ideally, what block size should I use to get maximum performance from my cluster?

1 ACCEPTED SOLUTION

Rising Star

@Malay Sharma

It really depends on your workload. If you have very large files, consider setting the block size to 256 MB (this reduces the number of blocks in the file system, since each block stores more data); otherwise, use the default of 128 MB. If you have a running cluster, SmartSense can show you the file size distribution of your current cluster, which will tell you whether you need to tune the block size for performance.
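To make the block-count trade-off concrete, here is a minimal Python sketch that counts how many blocks a set of files is split into at a given block size. The workload (100 files of 10 GB each) and the helper names are hypothetical, chosen only for illustration.

```python
MB = 1024 * 1024
GB = 1024 * MB


def blocks_for_file(file_size: int, block_size: int) -> int:
    """Number of blocks a file is split into (hypothetical helper)."""
    # Ceiling division: a partial last block still counts as one block.
    return (file_size + block_size - 1) // block_size


def total_blocks(file_sizes, block_size: int) -> int:
    """Total block count for a collection of file sizes, in bytes."""
    return sum(blocks_for_file(size, block_size) for size in file_sizes)


if __name__ == "__main__":
    files = [10 * GB] * 100  # hypothetical workload: 100 files of 10 GB each
    for block_mb in (128, 256):
        count = total_blocks(files, block_mb * MB)
        print(f"{block_mb} MB blocks: {count} blocks")
```

Doubling the block size halves the block count for large files, but a file smaller than one block still needs one block either way, so a larger block size does little for a workload made of many small files.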

In my experience, block size will not have a significant effect on cluster performance unless you have a large number of files in the system.
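The number of files matters because the NameNode keeps every file and block of the namespace in memory. The sketch below is a rough estimate only: it assumes the commonly cited rule of thumb of about 150 bytes of NameNode heap per file or block object, ignores directories, and compares two hypothetical ways of storing the same 1 TB of data with 128 MB blocks.

```python
MB = 1024 * 1024
GB = 1024 * MB
TB = 1024 * GB

# Assumption: rough rule of thumb of ~150 bytes of NameNode heap per
# namespace object (file or block); actual usage varies by Hadoop version.
BYTES_PER_OBJECT = 150


def blocks_for_file(file_size: int, block_size: int) -> int:
    # Ceiling division: a partial last block still counts as one block.
    return (file_size + block_size - 1) // block_size


def namenode_objects(file_size: int, num_files: int, block_size: int) -> int:
    # One object per file plus one per block (directories ignored).
    return num_files * (1 + blocks_for_file(file_size, block_size))


def estimated_heap_bytes(file_size: int, num_files: int, block_size: int) -> int:
    return namenode_objects(file_size, num_files, block_size) * BYTES_PER_OBJECT


if __name__ == "__main__":
    block = 128 * MB
    # The same 1 TB of data stored two hypothetical ways.
    small = estimated_heap_bytes(1 * MB, TB // MB, block)  # many 1 MB files
    large = estimated_heap_bytes(1 * GB, TB // GB, block)  # fewer 1 GB files
    print(f"1 MB files: ~{small / MB:.1f} MB of NameNode heap")
    print(f"1 GB files: ~{large / MB:.1f} MB of NameNode heap")
```

Under these assumptions, storing the data as many small files costs the NameNode far more memory than the choice between 128 MB and 256 MB blocks would for large files.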

Also, unlike a traditional disk file system, setting the block size to 128 MB does not mean that each block uses up 128 MB. HDFS stores only the bytes actually written, so a file smaller than the block size occupies only its actual size, and the block size causes no wasted space.
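A minimal sketch of this point: a file is split into full-size blocks plus at most one partial last block, and only the bytes actually written are stored. The helper name is hypothetical, and the sizes are per replica (replication multiplies the total but adds no padding).

```python
MB = 1024 * 1024


def block_lengths(file_size: int, block_size: int) -> list:
    """Bytes stored in each block of a file; the last block may be partial."""
    full, remainder = divmod(file_size, block_size)
    lengths = [block_size] * full
    if remainder:
        lengths.append(remainder)  # partial block holds only the leftover bytes
    return lengths


if __name__ == "__main__":
    block = 128 * MB
    for size_mb in (10, 300):
        lengths = block_lengths(size_mb * MB, block)
        stored_mb = sum(lengths) // MB
        padded_mb = len(lengths) * (block // MB)
        print(f"{size_mb} MB file: {len(lengths)} block(s), "
              f"{stored_mb} MB stored (not {padded_mb} MB)")
```

For example, a 300 MB file becomes two 128 MB blocks and one 44 MB block, so it stores 300 MB rather than 384 MB.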
