This article concludes the series (see part 1, part 2 and part 3). Here we look at how to avoid some common NameNode performance issues.
This article is for Apache Hadoop administrators who are familiar with HDFS and its components. If you are using Ambari you should know how to manage services and configurations with Ambari.
Avoid Disk Throughput Issues
The performance of the HDFS NameNode depends on being able to flush its edit logs to disk relatively quickly. Any delay in writing edit logs will hold up valuable RPC handler threads and can have cascading effects on the performance of your entire Hadoop cluster.
You should use a dedicated hard disk i.e. dedicated spindle for the edit logs to keep edits flowing to disk smoothly. The location of the edit log is configured via the dfs.namenode.edits.dir setting in hdfs-site.xml. If dfs.namenode.edits.dir is not configured, it defaults to the same value as dfs.namenode.name.dir. e.g.
If there are multiple comma separated directories for dfs.namenode.name.dir, the NameNode will replicate its metadata to each of these directories. In the example above it is recommended that /mnt/disk1 and /mnt/disk2 refer to dedicated spindles.
As a corollary, if you are using NameNode HA with QJM then the Journal Nodes should also use dedicated spindles for writing edit log transactions. The JournalNode edits directory can be configured via dfs.journalnode.edits.dir in hdfs-site.xml.
The setting dfs.namenode.accesstime.precision controls how often the NameNode will update the last accessed time for each file. It is specified in milliseconds. If this value is too low, then the NameNode is forced to write an edit log transaction to update the file's last access time for each read request and its performance will suffer.
The default value of this setting is a reasonable 3600000 milliseconds (1 hour). We recommend going one step further and setting it to zero so last access time updates are turned off. Add the following to your hdfs-site.xml.
The NameNode frequently performs group lookups for users submitting requests. This is part of the normal access control mechanism. Group lookups may involve contacting an LDAP server. If the LDAP server is slow to process requests, it can hold up NameNode RPC handler threads and affect cluster performance. When the NameNode suffers from this problem it logs a warning that looks like this:
Potential performance problem: getGroups(user=bob) took 23861 milliseconds.
The default warning threshold is 5 seconds. If you see this log message you should investigate the performance of your LDAP server right away. There are a few workarounds if it is not possible to fix the LDAP server performance immediately.
Increase the group cache timeout: The NameNode caches the results of group lookups to avoid making frequent lookups. The cache duration is set via hadoop.security.groups.cache.secs in your core-site.xml file. The default value of 300 (5 minutes) is too low. We have seen this value set as high as 7200 (2 hours) in busy clusters to avoid frequent group lookups.
Increase the negative cache timeout: For users with no group mapping, the NameNode maintains a short negative cache. The default negative cache duration is 30 seconds and may be too low. You can increase this with the hadoop.security.groups.negative-cache.secs
setting in core-site.xml. This setting is useful if you see frequent messages like No groups available for user alice in the NameNode logs.
Add a static mapping override for specific users: A static mapping is an in-memory mapping added via configuration that eliminates the need to lookup groups via any other mechanism. It can be a useful last resort to work around LDAP server performance issues that cannot be easily addressed by the cluster administrator. An example static mapping for two users alice and bob looks like this:
Sometimes the NameNode can generate excessive logging spew which severely impacts its performance. Rendering logging strings is an expensive operation that consumes CPU cycles. It also generates a lot of small short-lived objects (garbage) causing excessive Garbage Collection.
The Apache Hadoop community has made a number of fixes to reduce the verbosity of NameNode logs and these fixes are available in recent HDP maintenance releases. e.g. HDFS-9434, HADOOP-12903, HDFS-9941, HDFS-9906 and many more.
Depending on your specific version of Apache Hadoop or HDP you may run into one or more of these problems. Also new problems can be exposed by changing workloads. It is a good idea to monitor your NameNode logs periodically for excessive log spew and report it either by contacting the Hortonworks support team or filing a bug in the Apache Hadoop Jira.
This wraps up the series on getting the best performance possible out of your NameNode. We hope these tips will keep your cluster running at its best and your users happy.