How to rotate the HDFS GC logs on DataNode machines


hi all,

we are trying to rotate the GC logs on the worker machines. We added the flags

-XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2K

to the HADOOP_DATANODE_OPTS variable in HDFS --> Advanced hadoop-env, as follows:

export HADOOP_DATANODE_OPTS="-server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2K -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms{{dtnode_heapsize}} -Xmx{{dtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_DATANODE_OPTS}"

But on the worker machine (DataNode, under /var/log/hadoop/hdfs) we get the following:

-rw-r--r--  1 hdfs hadoop   1035 Aug 26 20:33 hadoop-hdfs-datanode-worker01.sys76.com.log.10
-rw-r--r--  1 hdfs hadoop   1035 Aug 26 20:35 hadoop-hdfs-datanode-worker01.sys76.com.log.9
-rw-r--r--  1 hdfs hadoop   1035 Aug 26 20:36 hadoop-hdfs-datanode-worker01.sys76.com.log.8
-rw-r--r--  1 hdfs hadoop   1504 Aug 26 20:37 hadoop-hdfs-datanode-worker01.sys76.com.log.7
-rw-r--r--  1 hdfs hadoop   1035 Aug 26 20:38 hadoop-hdfs-datanode-worker01.sys76.com.log.6
-rw-r--r--  1 hdfs hadoop   1035 Aug 26 20:40 hadoop-hdfs-datanode-worker01.sys76.com.log.5
-rw-r--r--  1 hdfs hadoop   1035 Aug 26 20:41 hadoop-hdfs-datanode-worker01.sys76.com.log.4
-rw-r--r--  1 hdfs hadoop   1035 Aug 26 20:43 hadoop-hdfs-datanode-worker01.sys76.com.log.3
-rw-r--r--  1 hdfs hadoop   1035 Aug 26 20:44 hadoop-hdfs-datanode-worker01.sys76.com.log.2
-rw-r--r--  1 hdfs hadoop   1035 Aug 26 20:46 hadoop-hdfs-datanode-worker01.sys76.com.log.1
-rw-r--r--  1 hdfs hadoop    727 Aug 26 20:46 hadoop-hdfs-datanode-worker01.sys76.com.out.4
-rw-r--r--  1 hdfs hadoop   3905 Aug 26 20:46 gc.log-201808262031.0.current
-rw-r--r--  1 hdfs hadoop    727 Aug 26 20:46 hadoop-hdfs-datanode-worker01.sys76.com.out.3
-rw-r--r--. 1 hdfs hadoop 101346 Aug 26 21:53 SecurityAuth.audit
-rw-r--r--  1 hdfs hadoop   7488 Aug 27 06:25 gc.log-201808262046
-rw-r--r--  1 hdfs hadoop    727 Aug 27 06:28 hadoop-hdfs-datanode-worker01.sys76.com.out.2
-rw-r--r--  1 hdfs hadoop   3651 Aug 27 06:28 gc.log-201808270625.0.current
-rw-r--r--  1 hdfs hadoop    727 Aug 27 06:29 hadoop-hdfs-datanode-worker01.sys76.com.out.1
-rw-r--r--  1 hdfs hadoop    727 Aug 27 06:29 hadoop-hdfs-datanode-worker01.sys76.com.out
-rw-r--r--  1 hdfs hadoop   2940 Aug 27 06:29 gc.log-201808270629.0.current
-rw-r--r--  1 hdfs hadoop 378069 Aug 27 06:34 hadoop-hdfs-datanode-worker01.sys76.com.log

Why are the gc.log files not rotated, and why do they all end with ".0.current"?

What is wrong with my configuration?

** We also tried the following, but it didn't work:

In the hadoop-env template, wherever you find the -XX:+PrintGCDetails parameter, you can add -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M

For example:

From:

HADOOP_JOBTRACKER_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile={{hdfs_log_dir_prefix}}/$USER/hs_err_pid%p.log -XX:NewSize={{jtnode_opt_newsize}} -XX:MaxNewSize={{jtnode_opt_maxnewsize}} -Xloggc:{{hdfs_log_dir_prefix}}/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xmx{{jtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dmapred.audit.logger=INFO,MRAUDIT -Dhadoop.mapreduce.jobsummary.logger=INFO,JSA ${HADOOP_JOBTRACKER_OPTS}"

To:

HADOOP_JOBTRACKER_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile={{hdfs_log_dir_prefix}}/$USER/hs_err_pid%p.log -XX:NewSize={{jtnode_opt_newsize}} -XX:MaxNewSize={{jtnode_opt_maxnewsize}} -Xloggc:{{hdfs_log_dir_prefix}}/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xmx{{jtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dmapred.audit.logger=INFO,MRAUDIT -Dhadoop.mapreduce.jobsummary.logger=INFO,JSA ${HADOOP_JOBTRACKER_OPTS}"
Michael-Bronson
20 REPLIES

Master Mentor

@Michael Bronson

I see that you are setting the value to 2K as -XX:GCLogFileSize=2K

However, per the JVM documentation, the minimum file size for log rotation is 8K. Can you please try a value greater than 8K?

The size of the log file at which point the log will be rotated, must be >= 8K.

https://www.oracle.com/technetwork/articles/java/vmoptions-jsp-140102.html
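
For reference, you can check the GCLogFileSize value the JVM actually applies for a given setting (a minimal sketch, assuming a JDK 8 `java` on the PATH; /tmp/gc-test.log is just an illustrative path):

# Print the effective GCLogFileSize when 2K is requested; recent JDK 8
# builds raise sub-8K values to the 8K minimum, so the reported value
# may already be 8192 rather than 2048.
java -Xloggc:/tmp/gc-test.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 \
     -XX:GCLogFileSize=2K -XX:+PrintFlagsFinal -version | grep GCLogFileSize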

Master Mentor

@Michael Bronson

Also, after setting this parameter to a value >8K, please make sure that the correct value is reflected in the JVM options when you run the "ps -ef | grep $PROCESS" command.
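
For example, to pull just the rotation-related flags out of the long DataNode command line (a sketch; the [D] in the pattern keeps grep from matching its own process):

# Split the DataNode command line into one argument per line and keep
# only the GC-log-rotation flags.
ps -ef | grep '[D]proc_datanode' | tr ' ' '\n' \
  | grep -E 'Xloggc|UseGCLogFileRotation|NumberOfGCLogFiles|GCLogFileSize'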


@Jay

We still get the same results (we set 10K): the files still appear as gc.log-201808270653.0.current, gc.log-201808270656.0.current

-rw-r--r--  1 hdfs hadoop   3859 Aug 27 06:53 gc.log-201808270629.0.current
-rw-r--r--  1 hdfs hadoop    727 Aug 27 06:53 hadoop-hdfs-datanode-worker01.sys76.com.out.1
-rw-r--r--  1 hdfs hadoop   3653 Aug 27 06:56 gc.log-201808270653.0.current
-rw-r--r--  1 hdfs hadoop    727 Aug 27 06:56 hadoop-hdfs-datanode-worker01.sys76.com.out
-rw-r--r--  1 hdfs hadoop 518292 Aug 27 06:56 hadoop-hdfs-datanode-worker01.sys76.com.log
-rw-r--r--  1 hdfs hadoop   3658 Aug 27 06:56 gc.log-201808270656.0.current
Michael-Bronson


and we verified that the 10K value is reflected in the process:

 ps -ef | grep 10K
hdfs     109919      1 18 06:56 ?        00:00:25 /usr/jdk64/jdk1.8.0_112/bin/java -Dproc_datanode -Xmx1024m -Dhdp.version=2.6.4.0-91 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.6.4.0-91/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.6.4.0-91/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.4.0-91/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.6.4.0-91 -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop-hdfs-datanode-worker01.ipmassanalyt183.com.log -Dhadoop.home.dir=/usr/hdp/2.6.4.0-91/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=:/usr/hdp/2.6.4.0-91/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.4.0-91/hadoop/lib/native:/usr/hdp/2.6.4.0-91/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.4.0-91/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201808270656 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10K -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201808270656 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10K -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201808270656 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10K -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
Michael-Bronson

Master Mentor

@Michael Bronson

I see that your GC log file is currently around 3658 bytes (about 3.6K), so it should get rotated once it grows past 10K.
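
If you want to watch the rotation happen, you can poll the log directory while the active file approaches the threshold (a small sketch; the path is taken from your listing above). When rotation occurs, the active gc.log-...0.current file should be renamed to end in .0, and writing should continue in a new .1.current file:

# Re-list the GC log files every 5 seconds to watch sizes and renames.
watch -n 5 'ls -l /var/log/hadoop/hdfs/gc.log-*'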


@Jay, another question: can we set "-XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10K" at the end of the line instead of in the middle?

example

HADOOP_DATANODE_OPTS="-server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails  -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms{{dtnode_heapsize}} -Xmx{{dtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_DATANODE_OPTS} -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2K"
Michael-Bronson

Master Mentor

@Michael Bronson

As per the JVM specification, if duplicate JVM flags are set for a process, the parameter that is set last takes precedence. Hence it is always better to set our own custom values at the end of the JVM options (instead of the middle); otherwise our values might get overridden.

So it is better to set the JVM parameters after the default parameters.

For example, in the hadoop-env template (inside the {% else %} block, because our JDK might be JDK 8):

      export HADOOP_DATANODE_OPTS="-server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms{{dtnode_heapsize}} -Xmx{{dtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_DATANODE_OPTS} -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10K -verbose:gc"
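
A quick way to confirm that the last duplicate flag wins is to pass it twice and print the final value (a minimal sketch, assuming a JDK 8 `java` on the PATH; /tmp/gc-dup.log is just an illustrative path):

# The reported GCLogFileSize should be 10240 (10K), i.e. the last
# occurrence of the flag, not the first.
java -Xloggc:/tmp/gc-dup.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 \
     -XX:GCLogFileSize=2K -XX:GCLogFileSize=10K \
     -XX:+PrintFlagsFinal -version | grep GCLogFileSize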


I will check it once it grows beyond 10K.

Michael-Bronson


@Jay, I see that the log is still 4K:

 du -sh * | grep  gc.log-201808270656.0.current
4.0K gc.log-201808270656.0.current

Is it possible to make this log grow faster, in order to see whether it will rotate?

Michael-Bronson
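
One way to make the GC log grow quickly, rather than waiting for natural GC activity, is to force explicit collections with jcmd (a minimal sketch; the JDK path and the DataNode PID 109919 are taken from the ps output earlier in the thread, and jcmd must run as the same user as the DataNode):

# Trigger 50 explicit GCs, one per second; with -verbose:gc each
# collection appends a few lines to the active gc.log-*.current file.
for i in $(seq 1 50); do
  sudo -u hdfs /usr/jdk64/jdk1.8.0_112/bin/jcmd 109919 GC.run
  sleep 1
done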