Created on 07-06-2016 08:40 PM
We have seen many incidents of an overloaded HDFS NameNode caused by 1) misconfiguration or 2) “bad” MapReduce jobs or Hive queries that issue a large number of RPC requests in a short period of time. Quite a few features have been introduced in HDP 2.3/2.4 to protect the HDFS NameNode. This article summarizes the deployment steps for these features, along with an incomplete list of known issues and possible solutions.
- Enable Async Audit Logging
- Dedicated Service RPC Port
- Dedicated Lifeline RPC Port for HA
- Enable FairCallQueue on Client RPC Port
- Enable RPC Client Backoff on Client RPC port
- Enable RPC Caller Context to track the “bad” jobs
- Enable Response time based backoff with DecayedRpcScheduler
- Check JMX for namenode client RPC call queue length and average queue time
- Check JMX for namenode DecayRpcScheduler when FCQ is enabled
- NNtop (HDFS-6982)
1. Enable Async Audit Logging
Enable async audit logging by setting "dfs.namenode.audit.log.async" to true in hdfs-site.xml. This minimizes the impact of audit log I/O on NameNode performance.
<property>
  <name>dfs.namenode.audit.log.async</name>
  <value>true</value>
</property>
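As a quick sanity check, the effective value can be read straight out of hdfs-site.xml. The sketch below is a minimal illustration only; the helper name and the inline sample are assumptions, not part of Hadoop:

```python
# Minimal sketch: pull one property's value out of an hdfs-site.xml style file.
# get_hdfs_property is an illustrative helper, not a Hadoop API.
import xml.etree.ElementTree as ET

def get_hdfs_property(xml_text, prop_name):
    """Return the <value> of the named <property>, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == prop_name:
            return prop.findtext("value")
    return None

# Sample configuration mirroring the snippet above.
sample = """<configuration>
  <property>
    <name>dfs.namenode.audit.log.async</name>
    <value>true</value>
  </property>
</configuration>"""

print(get_hdfs_property(sample, "dfs.namenode.audit.log.async"))  # true
```

In a real deployment you would read the file from /etc/hadoop/conf/hdfs-site.xml (path may differ per distribution).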
2. Dedicated Service RPC Port
Configuring a separate service RPC port can improve the responsiveness of the NameNode by allowing DataNode and client requests to be processed via separate RPC queues. DataNodes and all other services should connect to the new service RPC address, while clients connect to the well-known address specified by dfs.namenode.rpc-address.
Adding a service RPC port to an HA cluster with automatic failover via ZKFCs (with or without Kerberos) requires the following additional steps:
1. Add the following settings to hdfs-site.xml.
<property>
  <name>dfs.namenode.servicerpc-address.mycluster.nn1</name>
  <value>nn1.example.com:8040</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.mycluster.nn2</name>
  <value>nn2.example.com:8040</value>
</property>
2. If the cluster is not Kerberos enabled, skip this step. If the cluster is Kerberos enabled, create two new hdfs_jaas.conf files for nn1 and nn2 and copy them to /etc/hadoop/conf/hdfs_jaas.conf on the respective hosts:
nn1:
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  useTicketCache=false
  keyTab="/etc/security/keytabs/nn.service.keytab"
  principal="nn/c6401.ambari.apache.org@EXAMPLE.COM";
};
nn2:
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  useTicketCache=false
  keyTab="/etc/security/keytabs/nn.service.keytab"
  principal="nn/c6402.ambari.apache.org@EXAMPLE.COM";
};
Then add the following to hadoop-env.sh on both NameNodes:
export HADOOP_NAMENODE_OPTS="-Dzookeeper.sasl.client=true -Dzookeeper.sasl.client.username=zookeeper -Djava.security.auth.login.config=/etc/hadoop/conf/hdfs_jaas.conf -Dzookeeper.sasl.clientconfig=Client ${HADOOP_NAMENODE_OPTS}"
3. Restart the NameNodes.
4. Restart the DataNodes so they connect to the new NameNode service RPC port instead of the NameNode client RPC port.
5. Stop the ZKFC processes on both NameNodes.
6. Run the following command to reset the ZKFC state in ZooKeeper:
hdfs zkfc -formatZK
Known issues:
1. Without step 6, you will see the following exception after the ZKFC restarts:
java.lang.RuntimeException: Mismatched address stored in ZK for NameNode
2. Without step 2 in a Kerberos-enabled HA cluster, you will see the following exception when running step 6:
16/03/23 03:30:53 INFO ha.ActiveStandbyElector: Recursively deleting /hadoop-ha/hdp64ha from ZK...
16/03/23 03:30:53 ERROR ha.ZKFailoverController: Unable to clear zk parent znode
java.io.IOException: Couldn't clear parent znode /hadoop-ha/hdp64ha
    at org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:380)
    at org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:267)
    at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:212)
    at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:61)
    at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:172)
    at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:360)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:442)
    at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:168)
    at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:183)
Caused by: org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /hadoop-ha/hdp64ha
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
    at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:54)
    at org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:375)
    at org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:372)
    at org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1041)
    at org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:372)
    ... 11 more
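Once the DataNodes reconnect, it is worth confirming that the new service RPC port is actually taking traffic. One way is to query the NameNode JMX servlet (http://<nn-host>:50070/jmx) for the RpcActivityForPort8040 bean. The parsing sketch below is a minimal illustration; the sample payload and the helper name are assumptions that mirror the general shape of the servlet's JSON output, with made-up values:

```python
# Minimal sketch: read CallQueueLength for a given RPC port out of NameNode
# JMX JSON. rpc_call_queue_length is an illustrative helper, not a Hadoop API.
import json

def rpc_call_queue_length(jmx_json, port):
    """Find the RpcActivityForPort<port> bean and return its CallQueueLength."""
    for bean in json.loads(jmx_json).get("beans", []):
        if bean.get("name") == f"Hadoop:service=NameNode,name=RpcActivityForPort{port}":
            return bean.get("CallQueueLength")
    return None

# Sample payload in the general shape the /jmx servlet returns (values made up).
sample = json.dumps({"beans": [
    {"name": "Hadoop:service=NameNode,name=RpcActivityForPort8040",
     "CallQueueLength": 0,
     "RpcQueueTimeAvgTime": 0.4},
]})

print(rpc_call_queue_length(sample, 8040))  # 0
```

A bean for port 8040 with nonzero request counters indicates DataNodes are using the service RPC port rather than the client port.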
3. Dedicated Lifeline RPC Port for HA
HDFS-9311 allows using a separate RPC address to isolate NameNode health checks and liveness reporting from the client RPC port, which could be exhausted by “bad” jobs. Here is an example of configuring this feature in an HA cluster.
<property>
  <name>dfs.namenode.lifeline.rpc-address.mycluster.nn1</name>
  <value>nn1.example.com:8050</value>
</property>
<property>
  <name>dfs.namenode.lifeline.rpc-address.mycluster.nn2</name>
  <value>nn2.example.com:8050</value>
</property>
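After restarting the NameNodes, a quick way to verify that the lifeline (or service) RPC port is listening is a plain TCP connect. A minimal sketch; the host and port values are from the example above, not defaults:

```python
# Minimal sketch: TCP-level check that a NameNode RPC port is accepting
# connections. This only proves the listener is up, not RPC-level health.
import socket

def port_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example usage (hostnames/ports taken from the lifeline config above):
# port_is_listening("nn1.example.com", 8050)
```

Run it against both NameNodes after each restart step to confirm the new ports came up before moving on.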