We have seen many incidents of an overloaded HDFS NameNode caused by 1) misconfiguration or 2) “bad” MapReduce jobs or Hive queries that issue a large number of RPC requests in a short period of time. Quite a few features have been introduced in HDP 2.3/2.4 to protect the HDFS NameNode. This article summarizes the deployment steps for these features, along with an incomplete list of known issues and possible solutions.
Enable async audit logging by setting "dfs.namenode.audit.log.async" to true in hdfs-site.xml. This can minimize the impact of audit log I/O on NameNode performance.
<property>
  <name>dfs.namenode.audit.log.async</name>
  <value>true</value>
</property>
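You can confirm that the setting has been picked up with the standard hdfs getconf utility (a minimal check, assuming you run it on a host that has the updated hdfs-site.xml):
# Prints "true" once the updated hdfs-site.xml is in place
hdfs getconf -confKey dfs.namenode.audit.log.async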
Configuring a separate service RPC port can improve the responsiveness of the NameNode by processing DataNode and client requests through separate RPC queues. DataNodes and all other internal services should connect to the new service RPC address, while clients continue to connect to the well-known addresses specified by dfs.namenode.rpc-address.
Adding a service RPC port to an HA cluster with automatic failover via ZKFCs (with or without Kerberos) requires the following additional steps:
1. Add the service RPC addresses for both NameNodes to hdfs-site.xml (see the optional bind-host note below):
<property>
  <name>dfs.namenode.servicerpc-address.mycluster.nn1</name>
  <value>nn1.example.com:8040</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.mycluster.nn2</name>
  <value>nn2.example.com:8040</value>
</property>
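Optionally, on multi-homed NameNodes you can also bind the service RPC server to all interfaces (a hedged sketch; dfs.namenode.servicerpc-bind-host is a standard HDFS property, and the 0.0.0.0 value is an assumption for a multi-homed setup, not a general recommendation):
<property>
  <!-- Optional: bind the service RPC server to all interfaces on multi-homed hosts -->
  <name>dfs.namenode.servicerpc-bind-host</name>
  <value>0.0.0.0</value>
</property>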
2. If the cluster is not Kerberos enabled, skip this step. If the cluster is Kerberos enabled, create two new hdfs_jaas.conf files for nn1 and nn2 and copy each to /etc/hadoop/conf/hdfs_jaas.conf on the corresponding host:
nn1:
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  useTicketCache=false
  keyTab="/etc/security/keytabs/nn.service.keytab"
  principal="nn/c6401.ambari.apache.org@EXAMPLE.COM";
};
nn2:
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  useTicketCache=false
  keyTab="/etc/security/keytabs/nn.service.keytab"
  principal="nn/c6402.ambari.apache.org@EXAMPLE.COM";
};
Then add the following to hadoop-env.sh on both NameNodes:
export HADOOP_NAMENODE_OPTS="-Dzookeeper.sasl.client=true -Dzookeeper.sasl.client.username=zookeeper -Djava.security.auth.login.config=/etc/hadoop/conf/hdfs_jaas.conf -Dzookeeper.sasl.clientconfig=Client ${HADOOP_NAMENODE_OPTS}"
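Before restarting, it is worth sanity-checking that the keytab referenced in hdfs_jaas.conf actually contains the expected principal (a minimal check using the standard klist utility; the keytab path follows the examples above):
# Lists the principals stored in the NameNode keytab; expect nn/<host>@EXAMPLE.COM
klist -kt /etc/security/keytabs/nn.service.keytab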
3. Restart NameNodes
4. Restart DataNodes to connect to the new NameNode service RPC port instead of the NameNode client RPC port.
5. Stop the ZKFC processes on both NameNodes
6. Run the following command to reset the ZKFC state in ZooKeeper:
hdfs zkfc -formatZK
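On a Kerberos-enabled cluster, the command must run with valid NameNode credentials, and the ZKFC processes must be restarted afterwards (a hedged sketch; the keytab path and principal follow the examples above and should be adjusted for your hosts):
# Authenticate as the NameNode principal before formatting the ZKFC state
kinit -kt /etc/security/keytabs/nn.service.keytab nn/c6401.ambari.apache.org@EXAMPLE.COM
hdfs zkfc -formatZK
# Finally, restart the ZKFC processes on both NameNodes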
Known issues:
1. Without step 6, you will see the following exception after the ZKFC restart:
java.lang.RuntimeException: Mismatched address stored in ZK for NameNode
2. Without step 2 in a Kerberos-enabled HA cluster, you will see the following exception when running step 6:
16/03/23 03:30:53 INFO ha.ActiveStandbyElector: Recursively deleting /hadoop-ha/hdp64ha from ZK...
16/03/23 03:30:53 ERROR ha.ZKFailoverController: Unable to clear zk parent znode
java.io.IOException: Couldn't clear parent znode /hadoop-ha/hdp64ha
    at org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:380)
    at org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:267)
    at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:212)
    at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:61)
    at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:172)
    at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:360)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:442)
    at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:168)
    at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:183)
Caused by: org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /hadoop-ha/hdp64ha
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
    at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:54)
    at org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:375)
    at org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:372)
    at org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1041)
    at org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:372)
    ... 11 more
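If you hit this, you can inspect the HA znode directly from the ZooKeeper CLI to confirm the leftover children (a hedged illustration; the zkCli.sh path is typical for HDP, the ZooKeeper host:port is a placeholder, and the znode name matches the nameservice in the trace above):
# Connect to a ZooKeeper server and list the children of the HA election znode
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server zk1.example.com:2181
ls /hadoop-ha/hdp64ha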
HDFS-9311 allows using a separate lifeline RPC address to isolate DataNode health checks and liveness reporting from the client RPC port, which could be exhausted by “bad” jobs. Here is an example of configuring this feature in an HA cluster:
<property>
  <name>dfs.namenode.lifeline.rpc-address.mycluster.nn1</name>
  <value>nn1.example.com:8050</value>
</property>
<property>
  <name>dfs.namenode.lifeline.rpc-address.mycluster.nn2</name>
  <value>nn2.example.com:8050</value>
</property>
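HDFS-9311 also lets you size the lifeline server's handler pool independently of the main RPC handlers (a hedged example; dfs.namenode.lifeline.handler.ratio comes from the same feature, and the value below is illustrative rather than a recommendation):
<property>
  <!-- Size the lifeline handler pool as a fraction of dfs.namenode.handler.count -->
  <name>dfs.namenode.lifeline.handler.ratio</name>
  <value>0.10</value>
</property>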