Community Articles
Logs are important, but they often consume a lot of disk space. If we want to use a log4j-provided approach to control logging behavior more efficiently, we can take advantage of the "apache-log4j-extras" package. More information about the extras packages can be found at:

https://logging.apache.org/log4j/extras/

In this article we will see how to use the log compression (gzip) feature of log4j to compress the NameNode logs automatically on a daily basis.

This requires the following steps:

Step-1). The extras features are not shipped with the default log4j implementation, so users will need to download the "Apache Extras™ for Apache log4j" package (e.g. apache-log4j-extras) from: https://logging.apache.org/log4j/extras/download.html

For example, download the jar "apache-log4j-extras-1.2.17.jar" and place it in the Hadoop library location:

/usr/hdp/2.4.2.0-258/hadoop/lib/apache-log4j-extras-1.2.17.jar
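The staging of the jar can be sketched as below. The HDP path is the one used in this article; the stand-in directory and placeholder file make the commands safe to dry-run anywhere, so substitute the real paths on your NameNode host:

```shell
# On the cluster you would run (as root, after downloading the jar):
#   cp apache-log4j-extras-1.2.17.jar /usr/hdp/2.4.2.0-258/hadoop/lib/
# Dry-run version with stand-in paths:
LIB_DIR="./hadoop-lib-demo"                      # stand-in for the Hadoop lib dir
mkdir -p "$LIB_DIR"
touch apache-log4j-extras-1.2.17.jar             # stand-in for the downloaded jar
cp apache-log4j-extras-1.2.17.jar "$LIB_DIR/"
ls -l "$LIB_DIR/apache-log4j-extras-1.2.17.jar"  # confirm the jar is in place
```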

Step-2). Create a log4j appender such as "ZIPRFA" using the class "org.apache.log4j.rolling.RollingFileAppender", in which we define the "rollingPolicy". For more information about the various rolling policies, users can refer to: https://logging.apache.org/log4j/extras/apidocs/org/apache/log4j/rolling/

- Log in to Ambari, then in the HDFS advanced configuration add the following appender near the end of "Advanced hdfs-log4j":

#### New Appender to Zip the Log Files Based on Daily Rotation
log4j.appender.ZIPRFA=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.ZIPRFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.ZIPRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.ZIPRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.ZIPRFA.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.ZIPRFA.rollingPolicy.ActiveFileName=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.ZIPRFA.rollingPolicy.FileNamePattern=${hadoop.log.dir}/${hadoop.log.file}-.%d{yyyyMMdd}.log.gz
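To illustrate what the "FileNamePattern" above produces, the rolled file name is the active file name with "-.<yyyyMMdd>.log.gz" appended for the day being rolled. A self-contained sketch (the file name is taken from this article's Step-5 output; no cluster access needed):

```shell
# With FileNamePattern = ${hadoop.log.dir}/${hadoop.log.file}-.%d{yyyyMMdd}.log.gz,
# a log rolled for 2016-08-04 gets this name:
log_file="hadoop-hdfs-namenode-jayhost.openstacklocal.log"
roll_date="20160804"
echo "${log_file}-.${roll_date}.log.gz"
# -> hadoop-hdfs-namenode-jayhost.openstacklocal.log-.20160804.log.gz
```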

Step-3). We also need to make sure that the NameNode uses the appender defined above, so we will edit "HADOOP_NAMENODE_OPTS" to include "-Dhadoop.root.logger=INFO,ZIPRFA", something like the following:

export HADOOP_NAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node\" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 ${HADOOP_NAMENODE_OPTS} -Dhadoop.root.logger=INFO,ZIPRFA"

Step-4). Now restart the NameNode and double-check that the "-Dhadoop.root.logger=INFO,ZIPRFA" property has been added properly somewhere at the end (when a -D system property is repeated, the JVM uses the last occurrence). We can confirm this using the "ps -ef | grep NameNode" output:

hdfs     27497     1  3 07:07 ?        00:01:27 /usr/jdk64/jdk1.8.0_60/bin/java -Dproc_namenode -Xmx1024m -Dhdp.version=2.4.2.0-258 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.4.2.0-258/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.4.2.0-258/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.2.0-258/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.4.2.0-258 -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop-hdfs-namenode-jss1.openstacklocal.log -Dhadoop.home.dir=/usr/hdp/2.4.2.0-258/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=:/usr/hdp/2.4.2.0-258/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.2.0-258/hadoop/lib/native:/usr/hdp/2.4.2.0-258/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.2.0-258/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608060706 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608060706 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 
-XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608060706 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -Dhadoop.root.logger=INFO,ZIPRFA -Dhadoop.root.logger=INFO,ZIPRFA -Dhadoop.root.logger=INFO,ZIPRFA -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.NameNode
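The long command line above is hard to scan by eye; a small pipeline makes the logger flags stand out. On a live host you would feed the "ps -ef | grep NameNode" output into it; a shortened sample command line is used here so the snippet is self-contained:

```shell
# Pull only the hadoop.root.logger flags out of a (sample) NameNode command line.
# The last occurrence printed is the one the JVM actually uses.
cmdline="java -Dhadoop.root.logger=INFO,console -Dhadoop.root.logger=INFO,ZIPRFA org.apache.hadoop.hdfs.server.namenode.NameNode"
echo "$cmdline" | tr ' ' '\n' | grep '^-Dhadoop.root.logger'
# -> -Dhadoop.root.logger=INFO,console
# -> -Dhadoop.root.logger=INFO,ZIPRFA
```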

Step-5). As soon as the date changes, we should see that the old NameNode log file has been compressed, for example:

[root@jayhost hdfs]# cd /var/log/hadoop/hdfs
[root@jayhost hdfs]# ls -lart *.gz
-rw-r--r--. 1 hdfs hadoop    32453 Aug  5 06:32 hadoop-hdfs-namenode-jayhost.openstacklocal.log-.20160804.log.gz
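The rolled files are plain gzip archives, so they remain readable with standard tools such as zcat, with no manual extraction. A self-contained demo (the file name and log line are illustrative; on the cluster, point zcat at the real .gz files under /var/log/hadoop/hdfs):

```shell
# Create a tiny sample log, compress it the way the roller does, and read it back:
printf '2016-08-04 06:30:00,123 INFO namenode.NameNode: demo entry\n' > demo-namenode.log
gzip -f demo-namenode.log        # produces demo-namenode.log.gz
zcat demo-namenode.log.gz        # prints the original line without extracting
```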
Comments
New Contributor

There is a small error in the conversion pattern - the %n at the end is missing. It should be:

log4j.appender.ZIPRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
New Contributor

I found that my change to HADOOP_NAMENODE_OPTS was not taking effect when editing with Ambari. I resolved this as follows:

When using Ambari to edit the hadoop-env template, I added: -Dhadoop.root.logger=INFO,ZIPRFA

export HADOOP_NAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node\" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 ${HADOOP_NAMENODE_OPTS} -Dhadoop.root.logger=INFO,ZIPRFA"

Do not add it to the section of the template guarded by {% if java_version < 8 %} unless you are using Java 1.7 or below; add it to the {% else %} section instead.

After adding and restarting HDFS my NameNode logs were rotating and zipping correctly.

New Contributor

Hi @Jay SenSharma, do you have any similar articles for Hive log compression?

Not applicable

@Jay SenSharma Can you please let me know about the Knox gateway.out log? How can we compress it, since the above solution compresses only one type of log? Thanks.

New Contributor

Hi Jay,

Thanks for the post.

I would like to know the maximum retention period - how many days of older logs do we need to keep in the system?

Version history
Revision #: 1 of 1
Last update: 08-05-2016 11:24 AM