Logs are important, but they often consume a lot of disk space. To control logging behavior more efficiently with an approach that log4j itself provides, we can take advantage of the "apache-log4j-extras" package. More information about the extras package is available in the Apache Extras for Apache log4j documentation.

In this article we will see how to use the log compression feature of log4j to automatically compress (gzip) the NameNode logs on a daily basis.

To achieve this, follow these steps:

Step-1). The extras features are not shipped with the default log4j implementation, so users need to download "Apache Extras™ for Apache log4j" (for example, apache-log4j-extras).

Example: Download the jar "apache-log4j-extras-1.2.17.jar" and place it in the Hadoop library location.
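
As a minimal sketch of this step, the helper below copies the jar into a library directory and makes it world-readable. The function name install_extras_jar is hypothetical, and both paths are supplied by the caller because the exact Hadoop library location depends on your installation.

```shell
# Copy the extras jar into a Hadoop library directory and make it readable.
# Both arguments come from the caller; no installation path is assumed here.
install_extras_jar() {
  local jar="$1" lib_dir="$2"
  [ -f "$jar" ] || { echo "jar not found: $jar" >&2; return 1; }
  [ -d "$lib_dir" ] || { echo "library directory not found: $lib_dir" >&2; return 1; }
  cp "$jar" "$lib_dir"/ && chmod 644 "$lib_dir/$(basename "$jar")"
}
```

It would be called as install_extras_jar apache-log4j-extras-1.2.17.jar followed by the library directory of your HDFS installation, on every host that runs a NameNode.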


Step-2). Create a log4j appender such as "ZIPRFA" using the class "org.apache.log4j.rolling.RollingFileAppender", on which we will define the "rollingPolicy". For more information about the various rolling policies, refer to the API documentation of the extras package.

- Log in to Ambari and, in the HDFS advanced configuration, add the following appender near the end of the "Advanced hdfs-log4j" section.

#### New Appender to Zip the Log Files Based on Daily Rotation
log4j.appender.ZIPRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
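
The listing above shows only the layout line, so here is a hedged sketch of what a complete appender definition might look like. It assumes the extras package's org.apache.log4j.rolling.TimeBasedRollingPolicy with its ActiveFileName and FileNamePattern properties, and the file name pattern is chosen to match the archive name shown in Step-5. These lines are an assumption rather than the author's exact configuration, and the function name print_ziprfa_appender is hypothetical.

```shell
# Print a candidate ZIPRFA appender definition for pasting into "Advanced hdfs-log4j".
# The quoted 'EOF' keeps ${hadoop.log.dir} and %d{...} literal so log4j resolves them.
# Assumption: TimeBasedRollingPolicy is the rolling policy; a FileNamePattern
# ending in .gz asks it to gzip each rolled file.
print_ziprfa_appender() {
  cat <<'EOF'
#### New Appender to Zip the Log Files Based on Daily Rotation
log4j.appender.ZIPRFA=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.ZIPRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.ZIPRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.ZIPRFA.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.ZIPRFA.rollingPolicy.ActiveFileName=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.ZIPRFA.rollingPolicy.FileNamePattern=${hadoop.log.dir}/${hadoop.log.file}-.%d{yyyyMMdd}.log.gz
EOF
}
```

Running print_ziprfa_appender prints the fragment so it can be reviewed and pasted into Ambari. The %d{yyyyMMdd} part of the pattern is what makes the rotation daily.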

Step-3). To make sure that the NameNode uses the appender defined above, add "-Dhadoop.root.logger=INFO,ZIPRFA" to "HADOOP_NAMENODE_OPTS", for example:

export HADOOP_NAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node\" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 ${HADOOP_NAMENODE_OPTS} -Dhadoop.root.logger=INFO,ZIPRFA"

Step-4). Now restart the NameNode and double-check that the "-Dhadoop.root.logger=INFO,ZIPRFA" property has been added somewhere near the end of the command line. We can confirm this from the output of "ps -ef | grep NameNode":

hdfs     27497     1  3 07:07 ?        00:01:27 /usr/jdk64/jdk1.8.0_60/bin/java -Dproc_namenode -Xmx1024m -Dhdp.version= -Dhdp.version= -Dhdp.version= -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/ -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/ -Dhadoop.policy.file=hadoop-policy.xml -Dhdp.version= -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop-hdfs-namenode-jss1.openstacklocal.log -Dhadoop.home.dir=/usr/hdp/ -Dhadoop.root.logger=INFO,RFA -Djava.library.path=:/usr/hdp/ -Dhadoop.policy.file=hadoop-policy.xml -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608060706 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608060706 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608060706 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 
-XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -Dhadoop.root.logger=INFO,ZIPRFA -Dhadoop.root.logger=INFO,ZIPRFA -Dhadoop.root.logger=INFO,ZIPRFA,RFAS org.apache.hadoop.hdfs.server.namenode.NameNode
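
Because the option appears several times in a command line this long, it can help to extract only the last occurrence, which is the reason the article asks for it near the end: when a -D property is repeated, the last definition normally takes effect. The sketch below is one way to do that. The function name last_root_logger is hypothetical.

```shell
# Print the value of the last -Dhadoop.root.logger=... option found on stdin.
# Prints nothing if the option is absent.
last_root_logger() {
  grep -o -- '-Dhadoop\.root\.logger=[^ ]*' | tail -n 1 | cut -d= -f2
}
```

For example, ps -ef | grep '[N]ameNode' | last_root_logger prints the logger setting that appears last. The bracket in the grep pattern keeps the grep process itself out of the result. If several matching processes are running, filter the ps output down to one of them first.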

Step-5). As soon as the date changes, we should see that the old NameNode log file has been compressed, as follows:

[root@jayhost hdfs]# cd /var/log/hadoop/hdfs
[root@jayhost hdfs]# ls -lart *.gz
-rw-r--r--. 1 hdfs hadoop    32453 Aug  5 06:32 hadoop-hdfs-namenode-jayhost.openstacklocal.log-.20160804.log.gz
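
The compressed files can still be read without unpacking them on disk. As a small sketch, the helper below streams the end of a rotated log through gzip. The function name tail_gz_log and the default of 20 lines are illustrative choices.

```shell
# Print the last N lines (default 20) of a gzip-compressed log file.
# The file is decompressed to stdout only; nothing is written to disk.
tail_gz_log() {
  local file="$1" lines="${2:-20}"
  gzip -dc -- "$file" | tail -n "$lines"
}
```

For the file listed above, this would be tail_gz_log hadoop-hdfs-namenode-jayhost.openstacklocal.log-.20160804.log.gz 50.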

There is a small error in the conversion pattern - the n at the end is missing. It should be

log4j.appender.ZIPRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n

I found that my change to HADOOP_NAMENODE_OPTS was not taking effect when editing with Ambari. I resolved this with the following:

When using Ambari to edit the hadoop-env template, I added -Dhadoop.root.logger=INFO,ZIPRFA to the end of the export line:

export HADOOP_NAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node\" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 ${HADOOP_NAMENODE_OPTS} -Dhadoop.root.logger=INFO,ZIPRFA"

Do not add it to the section of the template that says {% if java_version < 8 %} unless you are using Java version 1.7 or below.

You need to add it to the {% else %} section.

After adding it and restarting HDFS, my NameNode logs were rotating and zipping correctly.

Hi @Jay SenSharma Do you have any similar articles for Hive log compression?

@Jay SenSharma Can you please let me know how to do this for the Knox gateway.out log? How can we compress it, given that the above solution compresses only one type of log? Please provide your answer. Thanks

Hi Jay,

Thanks for the post.

I would like to know the maximum retention period: how many days of older logs do we need to keep on the system?