Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

how to rotate the HDFS gc logs on data node machines

avatar

hi all

we are trying to rotate the gc logs on workers machines according to the following

we add the line

-XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2K

inside HADOOP_DATANODE_OPTS variable

in HDFS --> Advanced hadoop-env

as the following:

<br>export HADOOP_DATANODE_OPTS="-server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2K -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms{{dtnode_heapsize}} -Xmx{{dtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_DATANODE_OPTS}"

but we get on worker machine ( data node under /var/log/hadoop/hdfs ) , the follwing

-rw-r--r--  1 hdfs hadoop   1035 Aug 26 20:33 hadoop-hdfs-datanode-worker01.sys76.com.log.10
-rw-r--r--  1 hdfs hadoop   1035 Aug 26 20:35 hadoop-hdfs-datanode-worker01.sys76.com.log.9
-rw-r--r--  1 hdfs hadoop   1035 Aug 26 20:36 hadoop-hdfs-datanode-worker01.sys76.com.log.8
-rw-r--r--  1 hdfs hadoop   1504 Aug 26 20:37 hadoop-hdfs-datanode-worker01.sys76.com.log.7
-rw-r--r--  1 hdfs hadoop   1035 Aug 26 20:38 hadoop-hdfs-datanode-worker01.sys76.com.log.6
-rw-r--r--  1 hdfs hadoop   1035 Aug 26 20:40 hadoop-hdfs-datanode-worker01.sys76.com.log.5
-rw-r--r--  1 hdfs hadoop   1035 Aug 26 20:41 hadoop-hdfs-datanode-worker01.sys76.com.log.4
-rw-r--r--  1 hdfs hadoop   1035 Aug 26 20:43 hadoop-hdfs-datanode-worker01.sys76.com.log.3
-rw-r--r--  1 hdfs hadoop   1035 Aug 26 20:44 hadoop-hdfs-datanode-worker01.sys76.com.log.2
-rw-r--r--  1 hdfs hadoop   1035 Aug 26 20:46 hadoop-hdfs-datanode-worker01.sys76.com.log.1
-rw-r--r--  1 hdfs hadoop    727 Aug 26 20:46 hadoop-hdfs-datanode-worker01.sys76.com.out.4
-rw-r--r--  1 hdfs hadoop   3905 Aug 26 20:46 gc.log-201808262031.0.current
-rw-r--r--  1 hdfs hadoop    727 Aug 26 20:46 hadoop-hdfs-datanode-worker01.sys76.com.out.3
-rw-r--r--. 1 hdfs hadoop 101346 Aug 26 21:53 SecurityAuth.audit
-rw-r--r--  1 hdfs hadoop   7488 Aug 27 06:25 gc.log-201808262046
-rw-r--r--  1 hdfs hadoop    727 Aug 27 06:28 hadoop-hdfs-datanode-worker01.sys76.com.out.2
-rw-r--r--  1 hdfs hadoop   3651 Aug 27 06:28 gc.log-201808270625.0.current
-rw-r--r--  1 hdfs hadoop    727 Aug 27 06:29 hadoop-hdfs-datanode-worker01.sys76.com.out.1
-rw-r--r--  1 hdfs hadoop    727 Aug 27 06:29 hadoop-hdfs-datanode-worker01.sys76.com.out
-rw-r--r--  1 hdfs hadoop   2940 Aug 27 06:29 gc.log-201808270629.0.current
-rw-r--r--  1 hdfs hadoop 378069 Aug 27 06:34 hadoop-hdfs-datanode-worker01.sys76.com.log

why gc.log.... are not rotated and comes with "0.current"

what is wrong with my configuration?

** we also try the following but this isn't worked

In hadoop-env template wherever you find -XX:+PrintGCDetails parameter you can add -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M

For example:

From:

HADOOP_JOBTRACKER_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile={{hdfs_log_dir_prefix}}/$USER/hs_err_pid%p.log -XX:NewSize={{jtnode_opt_newsize}} -XX:MaxNewSize={{jtnode_opt_maxnewsize}} -Xloggc:{{hdfs_log_dir_prefix}}/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xmx{{jtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dmapred.audit.logger=INFO,MRAUDIT -Dhadoop.mapreduce.jobsummary.logger=INFO,JSA ${HADOOP_JOBTRACKER_OPTS}"

To:

HADOOP_JOBTRACKER_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile={{hdfs_log_dir_prefix}}/$USER/hs_err_pid%p.log -XX:NewSize={{jtnode_opt_newsize}} -XX:MaxNewSize={{jtnode_opt_maxnewsize}} -Xloggc:{{hdfs_log_dir_prefix}}/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails <strong>-XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M </strong>-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xmx{{jtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dmapred.audit.logger=INFO,MRAUDIT -Dhadoop.mapreduce.jobsummary.logger=INFO,JSA ${HADOOP_JOBTRACKER_OPTS}"
Michael-Bronson
1 ACCEPTED SOLUTION

avatar
Master Mentor

@Michael Bronson

If you just want to quickly fill the GC log to see the log rotation then try this instead of using "cat" comamand to add content inside the actual current GC log file.

# su - hdfs
# /usr/jdk64/jdk1.8.0_112/bin/jcmd `cat /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid` GC.run
.
# SYNTAX: # $JAVA_HOME/bin/jcmd $DATANODE_PID GC.run

Use the "jcmd" utility to explicitly trigger the GC many times. So after running this command around 20-30 times you will see that the GC log of data node gets rotated and in the current GC log you will see some messages like:

2018-08-27 08:11:07 GC log file has reached the maximum size. Saved as /var/log/hadoop/hdfs/gc.log-201808270803.0

.

I see the log rotations works as expected as following:

-rw-r--r--. 1 hdfs hadoop     10617 Aug 27 08:12 gc.log-201808270803.1
-rw-r--r--. 1 hdfs hadoop     10615 Aug 27 08:13 gc.log-201808270803.2
-rw-r--r--. 1 hdfs hadoop     10616 Aug 27 08:13 gc.log-201808270803.3
-rw-r--r--. 1 hdfs hadoop      4001 Aug 27 08:13 gc.log-201808270803.4.current

.

View solution in original post

20 REPLIES 20

avatar
Master Mentor

@Michael Bronson

Yes, it can be done.