Introduction

This article is the final part in the series Scaling the Namenode (see Part 1, Part 2, Part 3, and Part 4).

In part 4, we discussed monitoring NameNode logs for excessive skews.

In this part, we will look at a few optimizations around logging, access checks, and block reports.

Audience

This article is for Hadoop administrators who are familiar with HDFS and its components.

Audit log specific operations only when debug is enabled

By default, the following property is blank, so no NameNode operation is excluded from making an entry in the audit log.

Operations like getfileinfo fetch the metadata associated with a file, and in a large, read-heavy cluster they can generate an enormous volume of audit log entries. It is therefore recommended to audit-log getfileinfo only when the audit log debug level is enabled.
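To gauge how much of your audit log volume actually comes from getfileinfo before making the change, count those entries in the audit log. The sketch below runs against a small inline sample so it is self-contained; in practice, point the grep at your real audit log (the path varies by installation, so verify yours).

```shell
# Self-contained sample of HDFS audit log lines (format abridged);
# in practice, grep your real hdfs-audit.log instead.
cat > /tmp/hdfs-audit-sample.log <<'EOF'
2024-01-01 10:00:00,000 INFO FSNamesystem.audit: allowed=true ugi=alice ip=/10.0.0.1 cmd=getfileinfo src=/data/a dst=null perm=null
2024-01-01 10:00:00,100 INFO FSNamesystem.audit: allowed=true ugi=alice ip=/10.0.0.1 cmd=getfileinfo src=/data/b dst=null perm=null
2024-01-01 10:00:00,200 INFO FSNamesystem.audit: allowed=true ugi=bob ip=/10.0.0.2 cmd=create src=/data/c dst=null perm=bob:hdfs:rw-r--r--
EOF
# Count getfileinfo entries:
grep -c 'cmd=getfileinfo' /tmp/hdfs-audit-sample.log   # prints 2
```

A high ratio of getfileinfo to total entries is a good sign that restricting it to the debug level will meaningfully shrink the audit log.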


Change in hdfs-site.xml


<property>
  <name>dfs.namenode.audit.log.debug.cmdlist</name>
  <value>getfileinfo</value>
  <description>A comma separated list of NameNode commands that are written to the HDFS
   namenode audit log only if the audit log level is debug.
  </description>
</property>


In Cloudera Manager, you can add the property under "NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml".


Further, the BlockStateChange and StateChange log messages are really only useful when those operations fail, i.e., when the entries are logged at the ERROR level. At the default INFO level, these two loggers generate a large volume of entries in the NameNode logs. You can reduce this logging by adding the following lines to the log4j.properties file in your Hadoop configuration. In Cloudera Manager, these lines can be added under "NameNode Logging Advanced Configuration Snippet (Safety Valve)".


log4j.logger.BlockStateChange=ERROR
log4j.logger.org.apache.hadoop.hdfs.StateChange=ERROR


Avoid recursive calls to the external authorizer for getContentSummary

getContentSummary is an expensive operation in general. It becomes even more expensive in a secured environment where authorization is managed by an external component like Apache Ranger, because the permission check is performed via a recursive call for every descendant of the path. HDFS-14112 introduced an improvement that makes a single authorization call with subAccess, so the authorizer does not have to evaluate each descendant of the path separately.
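To get a feel for the difference, the sketch below builds a small local directory tree and counts the inodes that a per-descendant permission check would have to visit; with subAccess, the authorizer is consulted once for the whole subtree. This is illustrative only (plain local filesystem, made-up paths); HDFS itself is not involved.

```shell
# Illustrative only: the cost of per-descendant checks grows with
# subtree size, while a single subAccess call does not.
rm -rf /tmp/cs-demo
mkdir -p /tmp/cs-demo/a/b /tmp/cs-demo/a/c
touch /tmp/cs-demo/a/b/f1 /tmp/cs-demo/a/c/f2
# Without subAccess: roughly one authorizer call per inode in the subtree.
find /tmp/cs-demo | wc -l    # prints 6
# With subAccess: one authorizer call, regardless of subtree size.
```

On a real path with millions of descendants, that per-inode multiplier is exactly what makes getContentSummary so expensive with an external authorizer.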


Change in hdfs-site.xml


<property>
  <name>dfs.permissions.ContentSummary.subAccess</name>
  <value>true</value>
  <description>
    If "true", the ContentSummary permission checking will use subAccess.
    If "false", the ContentSummary permission checking will NOT use subAccess.
    subAccess means using recursion to check the access of all descendants.
  </description>
</property>


Again, in Cloudera Manager, place the property under "NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml".


It is recommended to set this property to true so that subAccess is used.

Note: This improvement is only available in CDP releases; the older CDH/HDP releases do not have it, so adding this configuration on CDH/HDP releases is not recommended.

Optimizing Block Reports

In busy and large clusters (say, 200 DataNodes or more), it is very important not to overwhelm the NameNode with overly frequent full block reports from the DataNodes. If the NameNodes are already degraded, block reports add further stress, and the NameNode may become so slow to process them that you eventually see messages like 'Block report queue is full' in the NameNode logs.

It is interesting to note that, while the default block report queue size is 1024, you can see the 'Block report queue is full' message even during NameNode startup, in what we call a block report flood. You can also see it when the NameNode's RPC processing time is high enough to indicate a severely degraded NameNode, leaving a backlog of reports that eventually overflows the queue.

While the block report queue size is configurable and you could simply increase it, a better approach is to optimize the way the DataNodes send block reports.
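If you do need temporary headroom while tuning, the queue depth itself can be raised in hdfs-site.xml. The property name below is from upstream Hadoop (default 1024); the value shown is only an illustrative increase, not a recommendation:

```xml
<property>
  <name>dfs.namenode.blockreport.queue.size</name>
  <value>2048</value>
  <description>
    Maximum number of block reports queued on the NameNode for processing.
    2048 is an illustrative value; the default is 1024.
  </description>
</property>
```

Treat this as a stopgap: it buys time but does not address why reports are arriving faster than they can be processed.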


We recommend a three-pronged approach, changing the following in hdfs-site.xml:

  1. Split block report by volume (Default value 1000000)
    <property>
    <name>dfs.blockreport.split.threshold</name>
    <value>0</value>
    <description>
        If the number of blocks on the DataNode is below this threshold then it will send block reports for all Storage Directories in a single message. If the number of blocks exceeds this threshold then the DataNode will send block reports for each Storage Directory in separate messages. Set to zero to always split.
    </description>
    </property>
  2. Reduce full block report frequency from a default 6 hours to 12 hours
    <property>
    <name>dfs.blockreport.intervalMsec</name>
    <value>43200000</value>
    <description>
    Determines block reporting interval in milliseconds.
    </description>
    </property>
  3. Batch incremental reports (Default value 0 disables batching)
    <property>
    <name>dfs.blockreport.incremental.intervalMsec</name>
    <value>100</value>
    <description>
    If set to a positive integer, the value in ms to wait between sending incremental block reports from the Datanode to the Namenode.
    </description>
    </property>
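As a quick sanity check on the millisecond values used above (plain shell arithmetic; no cluster required):

```shell
# dfs.blockreport.intervalMsec: 12 hours expressed in milliseconds.
echo $((12 * 60 * 60 * 1000))   # prints 43200000
# dfs.blockreport.incremental.intervalMsec: a 100 ms batching window means
# at most this many incremental reports per second from each DataNode.
echo $((1000 / 100))            # prints 10
```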


All three properties belong under "NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml" in Cloudera Manager.

Conclusion

This wraps up the series on getting the best performance possible out of your NameNode. We hope these tips will keep your cluster running at its best and your users happy.
