Created on 07-07-2016 04:13 AM
This article is for Apache Hadoop administrators who are familiar with HDFS and its components. If you are using Ambari you should know how to manage services and configurations with Ambari.
The performance of the HDFS NameNode depends on being able to flush its edit logs to disk relatively quickly. Any delay in writing edit logs will hold up valuable RPC handler threads and can have cascading effects on the performance of your entire Hadoop cluster.
You should use a dedicated hard disk, i.e. a dedicated spindle, for the edit logs to keep edits flowing to disk smoothly. The location of the edit log is configured via the dfs.namenode.edits.dir setting in hdfs-site.xml. If dfs.namenode.edits.dir is not configured, it defaults to the same value as dfs.namenode.name.dir, e.g.
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/mnt/disk1,/mnt/disk2</value>
</property>
If there are multiple comma-separated directories for dfs.namenode.name.dir, the NameNode replicates its metadata to each of them. In the example above, it is recommended that /mnt/disk1 and /mnt/disk2 each refer to a dedicated spindle.
As a corollary, if you are using NameNode HA with QJM, then the JournalNodes should also use dedicated spindles for writing edit log transactions. The JournalNode edits directory can be configured via dfs.journalnode.edits.dir in hdfs-site.xml.
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/mnt/disk3</value>
</property>
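There is no built-in Hadoop check that your configured directories really sit on separate devices, but a quick local script can catch the common mistake of pointing them at the same mount. The sketch below is a helper of my own (not part of HDFS): it compares the block device each directory lives on via os.stat(). Note that st_dev distinguishes mounts and filesystems, so it cannot see through a RAID volume to the physical spindles.

```python
import os

def on_distinct_devices(dirs):
    """Return True if every directory lives on a different block device."""
    devices = [os.stat(d).st_dev for d in dirs]
    return len(set(devices)) == len(devices)

# Example: split the comma-separated value of dfs.namenode.name.dir.
name_dirs = "/mnt/disk1,/mnt/disk2".split(",")
# on_distinct_devices(name_dirs) is True only if each path is its own mount.
```

Run it against both the name directories and the JournalNode edits directory on each host.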
Which leads us to the next pitfall.
The setting dfs.namenode.accesstime.precision controls how often the NameNode updates the last access time of each file. It is specified in milliseconds. If this value is too low, the NameNode is forced to write an edit log transaction updating the file's last access time for nearly every read request, and its performance will suffer.
The default value of this setting is a reasonable 3600000 milliseconds (1 hour). We recommend going one step further and setting it to zero so last access time updates are turned off. Add the following to your hdfs-site.xml.
<property>
  <name>dfs.namenode.accesstime.precision</name>
  <value>0</value>
</property>
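To see why the default is already reasonable, consider when a read actually triggers an access-time edit. The sketch below models the check (the function name is illustrative, not an HDFS API): an edit is written only when the stored access time is more than one precision interval old, and a precision of zero disables updates entirely.

```python
def should_update_atime(stored_atime_ms, now_ms, precision_ms):
    # A read triggers an access-time edit only if the stored value is at
    # least one precision interval old; precision 0 turns updates off.
    if precision_ms <= 0:
        return False
    return stored_atime_ms + precision_ms <= now_ms

HOUR_MS = 3_600_000  # the default precision: 1 hour

# With the default, repeated reads of a hot file within the same hour
# generate at most one edit log transaction for access time.
```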
The NameNode frequently performs group lookups for users submitting requests. This is part of the normal access control mechanism. Group lookups may involve contacting an LDAP server. If the LDAP server is slow to process requests, it can hold up NameNode RPC handler threads and affect cluster performance. When the NameNode suffers from this problem it logs a warning that looks like this:
Potential performance problem: getGroups(user=bob) took 23861 milliseconds.
The default warning threshold is 5 seconds. If you see this log message you should investigate the performance of your LDAP server right away. There are a few workarounds if it is not possible to fix the LDAP server performance immediately.
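To gauge how often slow lookups are happening, you can scan the NameNode log for the warning quoted above. A minimal sketch (the function and pattern are my own, matching the log line shown, not a Hadoop utility):

```python
import re

# Matches the slow-lookup warning and captures the user and latency.
SLOW_GROUPS = re.compile(
    r"getGroups\(user=(?P<user>[^)]+)\) took (?P<ms>\d+) milliseconds")

def slow_group_lookups(lines, threshold_ms=5000):
    """Yield (user, millis) for group lookups at or above threshold_ms."""
    for line in lines:
        m = SLOW_GROUPS.search(line)
        if m and int(m.group("ms")) >= threshold_ms:
            yield m.group("user"), int(m.group("ms"))

log = ["Potential performance problem: getGroups(user=bob) took 23861 milliseconds."]
# list(slow_group_lookups(log)) -> [("bob", 23861)]
```

If the same users show up repeatedly, they are good candidates for the static mapping workaround below.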
One workaround is to bypass LDAP for specific users by defining a static user-to-groups mapping with hadoop.user.group.static.mapping.overrides in core-site.xml:
<property>
  <name>hadoop.user.group.static.mapping.overrides</name>
  <value>alice=staff,hadoop;bob=students,hadoop;</value>
</property>
The HCC article Hadoop and LDAP: Usage, Load Patterns and Tuning has a lot more detail on debugging and optimizing Hadoop LDAP performance.
Sometimes the NameNode can generate excessive log spew, which severely impacts its performance. Rendering log strings is an expensive operation that consumes CPU cycles. It also generates many small, short-lived objects (garbage), causing excessive Garbage Collection.
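The community fixes largely amount to avoiding eager string construction for messages that will be discarded. Hadoop's code is Java, but the principle is easy to demonstrate with Python's standard logging module: a parameterized message defers rendering until the record is actually emitted, while an f-string pays the rendering cost up front even when the log level suppresses the message.

```python
import logging

class Expensive:
    """Stand-in for a costly string rendering; counts how often it runs."""
    calls = 0
    def __str__(self):
        Expensive.calls += 1
        return "rendered"

logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)  # DEBUG messages are suppressed

obj = Expensive()
# Parameterized form: the argument is rendered only if the record is emitted.
logger.debug("state: %s", obj)   # suppressed; __str__ never runs
# Eager form: the f-string renders the argument before the level check.
logger.debug(f"state: {obj}")    # suppressed, but __str__ already ran

# Expensive.calls is 1 here: only the eager call paid the rendering cost.
```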
The Apache Hadoop community has made a number of fixes to reduce the verbosity of NameNode logs and these fixes are available in recent HDP maintenance releases. e.g. HDFS-9434, HADOOP-12903, HDFS-9941, HDFS-9906 and many more.
Depending on your specific version of Apache Hadoop or HDP you may run into one or more of these problems. Also new problems can be exposed by changing workloads. It is a good idea to monitor your NameNode logs periodically for excessive log spew and report it either by contacting the Hortonworks support team or filing a bug in the Apache Hadoop Jira.
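A lightweight way to monitor for spew is to count repeated messages in the NameNode log. The sketch below is a rough heuristic of my own (not a Hadoop tool): it collapses runs of digits so that timestamps, block IDs, and counters do not mask repetition, then reports the most frequent message shapes.

```python
from collections import Counter
import re

def top_log_messages(lines, n=5):
    """Return the n most frequent messages, with digit runs collapsed
    to '#' so variable parts don't hide repetition."""
    normalized = (re.sub(r"\d+", "#", line).strip() for line in lines)
    return Counter(normalized).most_common(n)

# Typical use (path is an example; yours may differ):
# with open("/var/log/hadoop/hdfs/hadoop-hdfs-namenode.log") as f:
#     for msg, count in top_log_messages(f):
#         print(count, msg)
```

A message shape with an outsized count is a good candidate to report upstream.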
This wraps up the series on getting the best performance possible out of your NameNode. We hope these tips will keep your cluster running at its best and your users happy.