With HDFS HA, the NameNode is no longer a single point of failure in a Hadoop cluster. However, the performance of a single NameNode can often limit the throughput of jobs across the cluster. The Apache Hadoop community has made a number of NameNode scalability improvements. This series of articles (also see part 2, part 3, part 4 and part 5) explains how you can make configuration changes to help your NameNode scale better and improve its diagnosability.
This article is for Hadoop administrators who are familiar with HDFS and its components. If you are using Ambari or Cloudera Manager, you should know how to manage services and configurations.
The HDFS audit log is a separate log file that contains one entry for each user request. The audit log is a key resource for offline analysis of application traffic, for example to find the users and jobs that generate excessive load on the NameNode. Careful analysis of the HDFS audit logs can also uncover application bugs and misconfigurations.
Each audit log entry looks like the following:
2015-11-16 21:00:00,109 INFO FSNamesystem.audit: allowed=true ugi=bob (auth:SIMPLE) ip=/000.000.0.0 cmd=listStatus src=/app-logs/pcd_batch/application_1431545431771/ dst=null perm=null
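Because each entry records the requesting user (ugi) and the operation (cmd), simple command-line tools go a long way for offline analysis. The following is a minimal sketch that counts requests per user; the log path shown is an assumption and depends on your hadoop.log.dir setting:
# Top 20 users by request count; adjust the path to match your hadoop.log.dir.
awk '{for (i = 1; i <= NF; i++) if ($i ~ /^ugi=/) print substr($i, 5)}' \
  /var/log/hadoop/hdfs/hdfs-audit.log | sort | uniq -c | sort -rn | head -20
Swapping ugi= for cmd= in the same pipeline gives a per-operation breakdown instead.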
Audit logging can be enabled via the Hadoop log4j.properties file (typically found in /etc/hadoop/conf, or under the "process" directory when using CM) with the following settings:
# hdfs audit logging
hdfs.audit.logger=INFO,console
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
log4j.appender.DRFAAUDIT=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
log4j.appender.DRFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.DRFAAUDIT.DatePattern=.yyyy-MM-dd
Also, check the NameNode startup options in hadoop-env.sh to ensure there is no conflicting override for the hdfs.audit.logger setting. The following is an example of a correct HADOOP_NAMENODE_OPTS setting.
export HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=8g -XX:MaxNewSize=8g -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms64g -Xmx64g -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_NAMENODE_OPTS}"
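To confirm that the running NameNode actually picked up the audit logger setting, you can inspect the JVM arguments of the process. This is just one quick sanity check:
# Print the -Dhdfs.audit.logger argument from the NameNode's command line.
ps -ef | grep '[N]ameNode' | tr ' ' '\n' | grep 'hdfs.audit.logger'
The output should show -Dhdfs.audit.logger=INFO,DRFAAUDIT (or your chosen appender).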
Note: If you enable HDFS audit logs, you should also enable Async Audit Logging as described in the next section.
Using an async log4j appender for the audit logs significantly improves the performance of the NameNode. This setting is critical for the NameNode to scale beyond 10,000 requests/second. Add the following to your hdfs-site.xml.
<property>
  <name>dfs.namenode.audit.log.async</name>
  <value>true</value>
</property>
If you are managing your cluster with Ambari, this setting is already enabled by default. If you are using CDH, add the property to the "NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml". On a CDP cluster, this property is exposed in the UI and enabled by default.
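After a restart you can confirm the effective value with the getconf tool. Note that getconf reads the configuration deployed on the node where you run it, so run it on the NameNode host:
hdfs getconf -confKey dfs.namenode.audit.log.async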
The service RPC port gives the DataNodes a dedicated port for reporting their status via block reports and heartbeats. The port is also used by the ZooKeeper Failover Controllers for the periodic health checks performed by the automatic failover logic. The port is never used by client applications, so it reduces RPC queue contention between client requests and DataNode messages.
For a non-HA cluster, the service RPC port can be enabled with the following setting in hdfs-site.xml (replacing mynamenode.example.com with the hostname or IP address of your NameNode). The port number can be something other than 8040.
<property>
  <name>dfs.namenode.servicerpc-address</name>
  <value>mynamenode.example.com:8040</value>
</property>
For an HA cluster, the service RPC port can be enabled with settings like the following, replacing mycluster, nn1 and nn2 appropriately.
<property>
  <name>dfs.namenode.servicerpc-address.mycluster.nn1</name>
  <value>mynamenode1.example.com:8040</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.mycluster.nn2</name>
  <value>mynamenode2.example.com:8040</value>
</property>
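After the restart described below, you can spot-check that the setting took effect. The getconf lookup reads the local configuration, and netstat confirms a listener on the new port; run both on a NameNode host:
hdfs getconf -confKey dfs.namenode.servicerpc-address.mycluster.nn1
# Verify that a process is listening on the new service RPC port.
netstat -tln | grep 8040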
If you have enabled Automatic Failover using ZooKeeper Failover Controllers, an additional step is required to reset the ZKFC state in ZooKeeper. Stop both the ZKFC processes and run the following command as the hdfs superuser.
hdfs zkfc -formatZK
Note: Changing the service RPC port settings requires a restart of the NameNodes, DataNodes and ZooKeeper Failover Controllers to take full effect. If you have NameNode HA set up, you can restart the NameNodes one at a time, followed by a rolling restart of the remaining components, to avoid cluster downtime.
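Putting the steps together, the sequence looks roughly like the following. This is a sketch using the Hadoop 2.x daemon scripts; if you manage the cluster with Ambari or Cloudera Manager, perform the equivalent stops and restarts from the management UI instead.
# 1. Stop the ZKFC process on each NameNode host.
hadoop-daemon.sh stop zkfc

# 2. Reset the ZKFC state in ZooKeeper (run once, as the hdfs superuser).
hdfs zkfc -formatZK

# 3. Restart the NameNodes one at a time, then start the ZKFCs again.
hadoop-daemon.sh stop namenode
hadoop-daemon.sh start namenode
hadoop-daemon.sh start zkfc

# 4. Rolling restart of the DataNodes, one host at a time.
hadoop-daemon.sh stop datanode
hadoop-daemon.sh start datanode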
Garbage Collection is important enough to merit its own article. Please refer to NameNode Garbage Collection Configuration: Best Practices and Rationale.
That's it for Part 1. In part 2, part 3, part 4 and part 5 we will look at a number of additional configuration settings to help your NameNode scale better.
Created on 09-07-2017 09:07 AM
About the whole "no downtime" thing. The article says:
"If you have enabled Automatic Failover using ZooKeeper Failover Controllers, an additional step is required to reset the ZKFC state in ZooKeeper. Stop both the ZKFC processes and run the following command as the hdfs superuser."
I think some clarification would be nice here.
Should we stop the ZKFCs, format the ZKFC znode, and then restart the NameNodes?
Or restart the NameNodes first, and only then stop the ZKFCs and format?