Introduction

This article continues where part 2 left off. It describes how to enable two new features that make the HDFS NameNode more responsive to high RPC request loads.

Audience

This article is for Apache Hadoop administrators who are familiar with HDFS and its components. If you are using Ambari, you should know how to manage services and configurations with it. The article assumes that you have read part 1 and part 2.

RPC Congestion Control

Warning: You must first enable the service RPC port (described in part 1) and restart services so the service RPC setting is effective. Failure to do so will break DataNode-NameNode communication.

RPC Congestion Control is a relatively new feature added by the Apache Hadoop Community to help Hadoop Services respond more predictably under high load (see Apache Hadoop Jira HADOOP-10597). In part 2, we discussed how RPC queue overflow can cause request timeouts, and eventually job failures.

This problem can be mitigated if the NameNode sends an explicit signal back to the client when its RPC queue is full. Instead of waiting for a request that may never complete, the client throttles itself by resubmitting the request with an exponentially increasing delay. If you are familiar with the Transmission Control Protocol, this is similar to how senders react when they detect network congestion.
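
The client-side behavior described above can be sketched in a few lines of Python. This is only an illustration of exponential backoff, not the Hadoop client's actual retry code: ServerBusyError, backoff_delays, and call_with_backoff are hypothetical names, and the base delay, cap, and attempt count are arbitrary values chosen for the example.

```python
import time


class ServerBusyError(Exception):
    """Hypothetical stand-in for the server's 'RPC queue is full' signal."""


def backoff_delays(base=1.0, cap=10.0, attempts=5):
    """Return the delay before each retry: it doubles every time, up to a cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def call_with_backoff(send, base=1.0, cap=10.0, attempts=5, sleep=time.sleep):
    """Call send(); on a busy signal, wait and retry with a growing delay."""
    for delay in backoff_delays(base, cap, attempts):
        try:
            return send()
        except ServerBusyError:
            # The server asked us to slow down, so wait before resubmitting.
            sleep(delay)
    # Final attempt: if the server is still busy, let the error propagate.
    return send()
```

With these example values the client waits 1, 2, 4, 8, and then 10 seconds between attempts. The point is that an overloaded server sees progressively less traffic from each client it has pushed back.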

The feature can be enabled with the following setting in your core-site.xml file. Replace 8020 with your NameNode RPC port number if it is different. Do not enable this setting for the Service RPC port or the DataNode lifeline port.

  <property>
    <name>ipc.8020.backoff.enable</name>
    <value>true</value>
  </property>

Enable this setting only if you are running HDP 2.3.6+ or HDP 2.4.2+, or if a Hortonworks engineer has recommended it after examining your cluster.

RPC FairCallQueue

Warning: You must first enable the service RPC port (described in part 1) and restart services so the service RPC setting is effective. Failure to do so will break DataNode-NameNode communication.

The RPC FairCallQueue replaces the single RPC queue with multiple prioritized queues (see Apache Hadoop Jira HADOOP-10282). The RPC server maintains a history of recent requests grouped by user and places each incoming request into a queue chosen from that user's history. RPC handler threads dequeue requests from higher-priority queues with a higher probability.
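
The idea can be illustrated with a toy model in Python. It is a simplified sketch under stated assumptions, not Hadoop's implementation: the class name FairQueueSketch, the share thresholds, and the queue weights are all made up for the example, and the history here never decays, whereas a real server would only consider recent requests.

```python
import random
from collections import Counter, deque


class FairQueueSketch:
    """Toy model: per-user history picks the queue, weights pick the next call."""

    def __init__(self, thresholds=(0.25, 0.5), weights=(4, 2, 1), rng=None):
        # Illustrative values only. A user whose share of all recorded calls
        # reaches thresholds[i] is pushed down to a lower-priority queue.
        assert len(weights) == len(thresholds) + 1
        self.thresholds = thresholds
        self.weights = weights
        self.queues = [deque() for _ in weights]  # index 0 = highest priority
        self.history = Counter()                  # calls seen per user
        self.rng = rng or random.Random()

    def priority_for(self, user):
        total = sum(self.history.values())
        share = self.history[user] / total if total else 0.0
        return sum(1 for t in self.thresholds if share >= t)

    def put(self, user, request):
        """Record the call, then enqueue it at the user's current priority."""
        self.history[user] += 1
        level = self.priority_for(user)
        self.queues[level].append((user, request))
        return level

    def take(self):
        """Pick a non-empty queue at random, favoring higher priorities."""
        candidates = [i for i, q in enumerate(self.queues) if q]
        if not candidates:
            return None
        weights = [self.weights[i] for i in candidates]
        level = self.rng.choices(candidates, weights=weights)[0]
        return self.queues[level].popleft()
```

In this model, a user who has sent 9 of the last 10 requests lands in the lowest-priority queue, while an occasional user lands in the highest one. The occasional user's request is therefore likely to be served before most of the heavy user's backlog.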

FairCallQueue complements RPC congestion control very well and works best when you enable both features together. FairCallQueue can be enabled with the following setting in your core-site.xml. Replace 8020 with your NameNode RPC port if it is different. Do not enable this setting for the Service RPC port or the DataNode lifeline port.

  <property>
    <name>ipc.8020.callqueue.impl</name>
    <value>org.apache.hadoop.ipc.FairCallQueue</value>
  </property>
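
After editing core-site.xml, it can be handy to confirm that both properties are set for the intended port. The following Python sketch parses a configuration document and reports which of the two features are enabled. The function names read_properties and rpc_tuning_status and the SAMPLE text are hypothetical and exist only for this example. Only the two property names and the FairCallQueue class name come from this article.

```python
import xml.etree.ElementTree as ET

FAIR_CALL_QUEUE = "org.apache.hadoop.ipc.FairCallQueue"


def read_properties(xml_text):
    """Return a {name: value} dict from Hadoop-style <property> elements."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def rpc_tuning_status(xml_text, port=8020):
    """Report whether backoff and FairCallQueue are set for the given port."""
    props = read_properties(xml_text)
    backoff = props.get(f"ipc.{port}.backoff.enable", "")
    queue_impl = props.get(f"ipc.{port}.callqueue.impl", "")
    return {
        "backoff": backoff.lower() == "true",
        "fair_call_queue": queue_impl == FAIR_CALL_QUEUE,
    }


# Example input: a minimal configuration with both settings for port 8020.
SAMPLE = """
<configuration>
  <property>
    <name>ipc.8020.backoff.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>ipc.8020.callqueue.impl</name>
    <value>org.apache.hadoop.ipc.FairCallQueue</value>
  </property>
</configuration>
"""

print(rpc_tuning_status(SAMPLE))
```

Passing the Service RPC port number as the port argument is a quick way to confirm that neither setting was applied to that port by mistake.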

Conclusion

That concludes part 3. Part 4 of this article looks at how to avoid a few common NameNode performance pitfalls.

Comments

Hi @Arpit Agarwal, In HADOOP-10597 the earliest version mentioned with the fix is 2.7.4. Is the HDP version recommended in the post correct? I'm running HDP 2.6.1 with Hadoop 2.7.3 and I would like to confirm if this parameter can be enabled or not. Thanks!