Community Articles
Find and share helpful community-sourced technical articles.
Labels (1)

Use recommended mount options for all HDFS data disks

There are specific filesystem mount options that have proven to be more efficient for Hadoop clusters. Using these mount options will provide performance benefits. Since mount options are applied when mounting the filesystem, on system boot or remounting for example, changes to /etc/fstab alone are not enough for these settings to take effect. The recommended approach is to make the mount option changes and either manually remount the individual file systems or reboot the host for the settings to take effect.

Use the following mount options for the respective file systems:

ext4 —> "inode_readahead_blks=128","data=writeback","noatime","nodev"
xfs —> "noatime"

Configure HDFS block size for optimal performance

Having optimal HDFS block size boosts NameNode performance as well as job execution performance.

Make sure that the blocksize ('dfs.blocksize'  in 'hdfs-site.xml') is within the recommended range of 134217728 to 1073741824 (exclusive).

Enable HDFS short circuit reads

In HDFS, reads normally go through the DataNode. Thus, when the client asks the DataNode to read a file, the DataNode reads that file off of the disk and sends the data to the client over a TCP socket. So-called short-circuit reads bypass the DataNode, allowing the client to read the file directly. Obviously, this is only possible in cases where the client is co-located with the data. Short-circuit reads provide a substantial performance boost to many applications.

Enable short circuit read for better performance. To configure short-circuit local reads, you will also need to enable (dfs.domain.socket.path).

In hdfs-site.xml set the below:

Avoid file sizes that are smaller than a block size

Average block size should be greater than the recommended block size of 67108864 MB. An average size below the recommended size adds more burden to the NameNode, cause heap/GC issues in addition to cause storage and processing to be inefficient.

Set the block size greater than 67108864 MB

Also, use one or more of the following techniques to consolidate smaller files :
- run Hive/HBase compactions
- merge small files
- use HAR to compact small files

Tune DataNode JVM for optimal performance

DataNode is sensitive to the JVM performance and behavior. Make sure that the DataNode JVM is configured for optimal performance

Sample JVM Configs:, 


-Xms should be same as -Xmx
New generation size should be ⅛ of the total JVM size.

Avoid reads or write from stale DataNodes

DataNodes that have not sent a heartbeat to NameNode for a defined interval, may be under load or may have died. Avoid sending any read/write requests to such 'stale' DataNodes.

In hdfs-site.xml set the below:

Use JNI-based group lookup over other implementations

Hadoop uses a pluggable interface with multiple possible implementations for looking up the group memberships of a user. The JNI-based implementation has better performance characteristics than other implementations.

In core-site.xml set:

*** This article focuses on settings that would improve HDFS performance. However, may impact other areas such as stability and uptime. Please understand the settings before applying them ***

*** You might also be interested in the following articles: ***

OS Configurations for Better Hadoop Performance

New Member

Could you please clarify what you mean by (exclusive) in this paragraph "Make sure that the blocksize ('dfs.blocksize'in'hdfs-site.xml')is within the recommended range of 134217728 to 1073741824(exclusive)" ?

You mean the maximum value used should be 1073741823 and we shouldn't use 1073741824 with is exactly 1GiB ?

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.
Version history
Last update:
‎05-16-2017 03:25 PM
Updated by:
Top Kudoed Authors