Created on 08-05-2016 11:24 AM
As we know, logs are important, but they often consume a lot of disk space. If we want to use a log4j-provided approach to control the logging behavior more efficiently, we can take advantage of the "apache-log4j-extras" package. More information about the extras package can be found at:
https://logging.apache.org/log4j/extras/
In this article we will see how to use the log compression (gzip) feature of log4j to compress the NameNode logs automatically on a daily basis.
The following steps are needed to achieve this:
Step-1). The extras features are not shipped with the default log4j implementation, so users will need to download the "Apache Extras™ for Apache log4j" package (apache-log4j-extras): https://logging.apache.org/log4j/extras/download.html
For example, download the jar "apache-log4j-extras-1.2.17.jar" and place it inside the Hadoop library location:
/usr/hdp/2.4.2.0-258/hadoop/lib/apache-log4j-extras-1.2.17.jar
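For reference, a minimal shell sketch of that placement (the paths assume the HDP 2.4.2 layout shown above; adjust the version directory to match your cluster):
# Download apache-log4j-extras-1.2.17.jar from the download page linked above,
# then copy it into the Hadoop lib directory on the NameNode host(s):
cp apache-log4j-extras-1.2.17.jar /usr/hdp/2.4.2.0-258/hadoop/lib/
chmod 644 /usr/hdp/2.4.2.0-258/hadoop/lib/apache-log4j-extras-1.2.17.jar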
Step-2). Create a log4j appender such as "ZIPRFA" using the class "org.apache.log4j.rolling.RollingFileAppender", where we will define the "rollingPolicy". For more information about the various rolling policies, refer to: https://logging.apache.org/log4j/extras/apidocs/org/apache/log4j/rolling/
- Log in to Ambari, open the HDFS advanced configuration, and add the following appender near the end of "Advanced hdfs-log4j".
#### New Appender to Zip the Log Files Based on Daily Rotation
log4j.appender.ZIPRFA=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.ZIPRFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.ZIPRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.ZIPRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.ZIPRFA.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.ZIPRFA.rollingPolicy.ActiveFileName=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.ZIPRFA.rollingPolicy.FileNamePattern=${hadoop.log.dir}/${hadoop.log.file}-.%d{yyyyMMdd}.log.gz
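As a quick sanity check (assuming the HDP default config directory /etc/hadoop/conf), once Ambari pushes the new configuration you can verify that the appender made it into the rendered log4j.properties on the NameNode host:
# Print the ZIPRFA appender lines from the file Ambari generates
grep 'ZIPRFA' /etc/hadoop/conf/log4j.properties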
Step-3). We also need to make sure that the NameNode uses the above appender, so we will update "HADOOP_NAMENODE_OPTS" to include "-Dhadoop.root.logger=INFO,ZIPRFA", something like the following:
export HADOOP_NAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node\" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 ${HADOOP_NAMENODE_OPTS} -Dhadoop.root.logger=INFO,ZIPRFA"
Step-4). Now restart the NameNode and double check that "-Dhadoop.root.logger=INFO,ZIPRFA" was added properly somewhere near the end of the command line. We can confirm this in the "ps -ef | grep NameNode" output:
hdfs 27497 1 3 07:07 ? 00:01:27 /usr/jdk64/jdk1.8.0_60/bin/java -Dproc_namenode -Xmx1024m -Dhdp.version=2.4.2.0-258 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.4.2.0-258/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.4.2.0-258/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.2.0-258/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.4.2.0-258 -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop-hdfs-namenode-jss1.openstacklocal.log -Dhadoop.home.dir=/usr/hdp/2.4.2.0-258/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=:/usr/hdp/2.4.2.0-258/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.2.0-258/hadoop/lib/native:/usr/hdp/2.4.2.0-258/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.2.0-258/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608060706 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608060706 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608060706 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -Dhadoop.root.logger=INFO,ZIPRFA -Dhadoop.root.logger=INFO,ZIPRFA -Dhadoop.root.logger=INFO,ZIPRFA -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.NameNode
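To pick out just the logger settings from that long command line, here is a small sketch (when the same -D flag appears more than once, the JVM keeps the last value, so the final entry should be INFO,ZIPRFA):
# List every hadoop.root.logger setting passed to the running NameNode
ps -ef | grep '[N]ameNode' | grep -o 'hadoop.root.logger=[^ ]*'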
Step-5). As soon as the date changes, we should see that the old NameNode log file has been compressed, as follows:
[root@jayhost hdfs]# cd /var/log/hadoop/hdfs
[root@jayhost hdfs]# ls -lart *.gz
-rw-r--r--. 1 hdfs hadoop 32453 Aug 5 06:32 hadoop-hdfs-namenode-jayhost.openstacklocal.log-.20160804.log.gz
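Note that the TimeBasedRollingPolicy compresses the rolled files but, as far as I can tell, does not prune old archives on its own, so the .gz files will keep accumulating. If that is a concern, a simple (purely illustrative) cron entry can purge them; the path and the 30-day retention below are just example values:
# Crontab entry: every day at 01:00, delete NameNode log archives older than 30 days
0 1 * * * find /var/log/hadoop/hdfs -name 'hadoop-hdfs-namenode-*.log.gz' -mtime +30 -delete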
Created on 11-25-2016 11:23 AM
There is a small error in the conversion pattern - the %n at the end is missing. It should be:
log4j.appender.ZIPRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
Created on 12-29-2016 03:00 PM
I found that my change to HADOOP_NAMENODE_OPTS was not taking effect when editing with Ambari. I resolved this as follows:
When using Ambari to edit the hadoop-env template, I added -Dhadoop.root.logger=INFO,ZIPRFA:
export HADOOP_NAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node\" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 ${HADOOP_NAMENODE_OPTS} -Dhadoop.root.logger=INFO,ZIPRFA"
Do not add it to the section of the template guarded by {% if java_version < 8 %} unless you are actually using Java 1.7 or below; it needs to go in the {% else %} section, as illustrated below.
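A simplified illustration of what that part of the hadoop-env template looks like (the real template in your Ambari version contains many more options; only the placement of the logger flag matters here):
{% if java_version < 8 %}
export HADOOP_NAMENODE_OPTS="... options for Java 7 ..."
{% else %}
export HADOOP_NAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} ... ${HADOOP_NAMENODE_OPTS} -Dhadoop.root.logger=INFO,ZIPRFA"
{% endif %}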
After adding and restarting HDFS my NameNode logs were rotating and zipping correctly.
Created on 06-13-2017 11:08 AM
Hi @Jay SenSharma Do you have any similar articles for Hive log compression?
Created on 09-15-2017 02:31 PM
@Jay SenSharma Can you please let me know about the Knox gateway.out log? How can we compress it, since the above solution only compresses one type of log? Please provide your answer. Thanks
Created on 01-22-2019 06:33 PM
Hi Jay,
Thanks for the post.
I would like to know the maximum retention period - how many days of older logs do we need to keep on the system?