Created on 07-07-2016 04:13 AM - edited on 04-29-2024 04:41 AM by VidyaSargur
This article is part of a series (see part 1, part 2, part 3 and part 5). Here we look at how to avoid some common NameNode performance issues.
This article is for Apache Hadoop administrators who are familiar with HDFS and its components. If you are using Ambari or Cloudera Manager, you should know how to manage services and configurations.
The performance of the HDFS NameNode depends on being able to flush its edit logs to disk relatively quickly. Any delay in writing edit logs will hold up valuable RPC handler threads and can have cascading effects on the performance of your entire Hadoop cluster.
You should use a dedicated hard disk i.e. dedicated spindle for the edit logs to keep edits flowing to the disk smoothly. The location of the edit log is configured using the dfs.namenode.edits.dir setting in hdfs-site.xml. If dfs.namenode.edits.dir is not configured, it defaults to the same value as dfs.namenode.name.dir.
Example:
<property> <name>dfs.namenode.name.dir</name> <value>/mnt/disk1,/mnt/disk2</value> </property>
If there are multiple comma-separated directories for dfs.namenode.name.dir, the NameNode will replicate its metadata to each of these directories. In the example above, it is recommended that /mnt/disk1 and /mnt/disk2 refer to dedicated spindles.
As a corollary, if you are using NameNode HA with QJM then the Journal Nodes should also use dedicated spindles for writing edit log transactions. The JournalNode edits directory can be configured via dfs.journalnode.edits.dir in hdfs-site.xml.
<property> <name>dfs.journalnode.edits.dir</name> <value>/mnt/disk3</value> </property>
This leads us to the next pitfall.
The setting dfs.namenode.accesstime.precision controls how often the NameNode will update the last accessed time for each file. It is specified in milliseconds. If this value is too low, then the NameNode is forced to write an edit log transaction to update the file's last access time for each read request and its performance will suffer.
The default value of this setting is a reasonable 3600000 milliseconds (1 hour). We recommend going one step further and setting it to zero so that last access time updates are turned off. Add the following to your hdfs-site.xml.
<property> <name>dfs.namenode.accesstime.precision</name> <value>0</value> </property>
The NameNode frequently performs group lookups for users submitting requests. This is part of the normal access control mechanism. Group lookups may involve contacting an LDAP server. If the LDAP server is slow to process requests, it can hold up NameNode RPC handler threads and affect cluster performance. When the NameNode suffers from this problem it logs a warning that looks like this:
Potential performance problem: getGroups(user=bob) took 23861 milliseconds.
The default warning threshold is 5 seconds. If you see this log message you should investigate the performance of your LDAP server right away. There are a few workarounds if it is not possible to fix the LDAP server performance immediately.
<property> <name>hadoop.user.group.static.mapping.overrides</name> <value>alice=staff,hadoop;bob=students,hadoop;</value> </property>
The HCC article Hadoop and LDAP: Usage, Load Patterns and Tuning has a lot more detail on debugging and optimizing Hadoop LDAP performance.
Sometimes the NameNode can generate excessive logging spew which severely impacts its performance. Rendering logging strings is an expensive operation that consumes CPU cycles. It also generates a lot of small short-lived objects (garbage) causing excessive Garbage Collection.
The Apache Hadoop community has made a number of fixes to reduce the verbosity of NameNode logs and these fixes are available in recent HDP maintenance releases. Example: HDFS-9434, HADOOP-12903, HDFS-9941, HDFS-9906 and many more.
Depending on your specific version of Apache Hadoop or HDP you may run into one or more of these problems. Also, new problems can be exposed by changing workloads. It is a good idea to monitor your NameNode logs periodically for excessive log spew and report it either by contacting the Hortonworks support team or filing a bug in the Apache Hadoop Jira.
That is it for Part 4. In Part 5 of this series, we will explore how to optimize logging and block reports.
Created on 01-19-2018 09:37 AM
The 04-Series article are very well written. Thanks Arpit for documenting them.
Created on 06-19-2018 09:55 PM
So many heavy calls inside critical RPC loops? That is very weird design decision.
Created on 05-23-2019 02:23 AM
this is really helpful
Created on 09-08-2021 02:43 PM
This whole series is really insightful and helpful!