
How to rotate GC log files for Ambari services

The following answer explains how to rotate the gc.log files for the Metrics Collector service:

https://community.hortonworks.com/questions/194534/automatic-deletion-of-rotated-gclog-files.html

Now we need to do the same for the following services:

HDFS

Kafka

What approach should we take for these services?

Michael-Bronson

@Michael Bronson

You can follow the same approach: these are JVM parameters, so they should work for any Java-based service. You can add them to the env template in Ambari for each of these services.

-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M
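Because these are ordinary JVM options, they can be appended to whichever options variable a service's env template already passes to java. A minimal bash sketch, assuming a hypothetical helper `add_gc_rotation` and a hypothetical `SERVICE_OPTS` variable (neither comes from Ambari); it appends the three flags only if rotation is not already configured:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a JVM options string with GC log rotation flags appended.
#   $1 = existing options, $2 = number of files (default 5), $3 = max file size (default 2M)
add_gc_rotation() {
  local opts="$1" count="${2:-5}" size="${3:-2M}"
  # Leave the options unchanged if rotation is already configured.
  if [[ "$opts" == *"-XX:+UseGCLogFileRotation"* ]]; then
    printf '%s\n' "$opts"
    return
  fi
  # Each flag is its own space-separated word; ${opts:+...} avoids a leading space.
  printf '%s\n' "${opts:+$opts }-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=$count -XX:GCLogFileSize=$size"
}

# Example with a hypothetical SERVICE_OPTS variable from some service's env template.
SERVICE_OPTS="-Xloggc:/tmp/gc.log -verbose:gc -XX:+PrintGCDetails"
SERVICE_OPTS="$(add_gc_rotation "$SERVICE_OPTS")"
echo "$SERVICE_OPTS"
```

Making the helper idempotent means re-running it (for example after re-applying a template) does not stack duplicate flags.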

@amarnath reddy pappu

The problem is that I'm not sure which variable holds the JVM parameters for HDFS and Kafka.

Which variable should I search for? If it is not defined, should I add it?

Michael-Bronson

This is what we have in kafka-env (in the Ambari GUI):

#!/bin/bash
# Set KAFKA specific environment variables here.
# The java implementation to use.
export JAVA_HOME={{java64_home}}
export PATH=$PATH:$JAVA_HOME/bin
export PID_DIR={{kafka_pid_dir}}
export LOG_DIR={{kafka_log_dir}}
export KAFKA_KERBEROS_PARAMS={{kafka_kerberos_params}}
export JMX_PORT=9997
# Add kafka sink to classpath and related dependencies
if [ -e "/usr/lib/ambari-metrics-kafka-sink/ambari-metrics-kafka-sink.jar" ]; then
  export CLASSPATH=$CLASSPATH:/usr/lib/ambari-metrics-kafka-sink/ambari-metrics-kafka-sink.jar
  export CLASSPATH=$CLASSPATH:/usr/lib/ambari-metrics-kafka-sink/lib/*
fi
if [ -f /etc/kafka/conf/kafka-ranger-env.sh ]; then
  . /etc/kafka/conf/kafka-ranger-env.sh
fi

Michael-Bronson

This is what we have in hadoop-env in the Ambari GUI for the HDFS service:

hadoop-env-hdfs.txt (attachment)

Michael-Bronson

@amarnath reddy pappu Regarding the details above, can you suggest which variable I should use to set these properties? Or do I need to append them to the variable that holds the JVM parameters? If so, what is that variable's name?

Michael-Bronson

This is what we are thinking, but we are not sure:

For Kafka:

KAFKA_GC_LOG_OPTS="-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M"

For HDFS:

JAVA_OPTS="-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M"

Michael-Bronson

@Michael Bronson

For HDFS, the GC logging options already contain these parameters:

-XX:+PrintGCDetails -XX:+PrintGCTimeStamps

Right after them, you can append the rotation parameters, like below:

-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M

Note: each parameter must be separated from the next by a space.
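The space matters because launch scripts usually expand option variables unquoted, and the shell then splits them into separate arguments on whitespace. A small bash illustration; `count_args` is a hypothetical helper used only for demonstration. Without spaces, the three flags reach java as one malformed argument, which the JVM will most likely reject as an unrecognized option:

```bash
#!/usr/bin/env bash
# Hypothetical helper: report how many separate arguments a command receives.
count_args() {
  echo "$#"
}

WITHOUT_SPACES="-XX:+UseGCLogFileRotation-XX:NumberOfGCLogFiles=5-XX:GCLogFileSize=2M"
WITH_SPACES="-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M"

# Unquoted expansion, as in typical start scripts, splits the value on spaces.
# shellcheck disable=SC2086
count_args $WITHOUT_SPACES   # prints 1: a single malformed option
# shellcheck disable=SC2086
count_args $WITH_SPACES      # prints 3: three separate JVM options
```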

For Kafka, can you tell me how you enabled GC logging? It may not be enabled by default.
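One way to check whether GC logging is actually enabled on a running broker is to look at the JVM's real command line. A minimal sketch, assuming Linux (where /proc/<pid>/cmdline holds the NUL-separated arguments); `gc_logging_status` is a hypothetical helper, and finding the broker's PID (for example with `pgrep -f kafka.Kafka`) is left to you:

```bash
#!/usr/bin/env bash
# Hypothetical helper: report GC-logging-related JVM flags of a running process.
# Assumes Linux, where /proc/<pid>/cmdline holds NUL-separated arguments.
gc_logging_status() {
  local pid="$1" args
  # Convert NUL separators to newlines so each argument is on its own line.
  args="$(tr '\0' '\n' < "/proc/$pid/cmdline")" || return 1
  # -Xloggc:<file> (Java 8) or -Xlog:gc... (Java 9+) means a GC log is written.
  if grep -q -e '^-Xloggc:' -e '^-Xlog:gc' <<< "$args"; then
    echo "GC logging: enabled"
  else
    echo "GC logging: not enabled"
  fi
  if grep -q -x -e '-XX:+UseGCLogFileRotation' <<< "$args"; then
    echo "GC log rotation: enabled"
  else
    echo "GC log rotation: not enabled"
  fi
}

# Example: inspect this shell itself (it has no JVM flags, so both report "not enabled").
gc_logging_status "$$"
```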

I don't understand the HDFS part. Why not put all the properties in a variable, as with KAFKA_GC_LOG_OPTS="-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M"?

Michael-Bronson

@Michael Bronson

Just adding KAFKA_GC_LOG_OPTS="-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M" is not sufficient; that variable also has to be used when the service is started.
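In other words, a variable only affects the JVM if the command that launches java expands it. A simplified bash sketch of that idea; `build_java_cmd` and `example.MainClass` are hypothetical stand-ins, not the real Kafka start script. Depending on the Kafka version, the real kafka-run-class.sh may also assign KAFKA_GC_LOG_OPTS itself when GC logging is enabled, which would override a value set in kafka-env, so it is worth checking the script shipped with your version:

```bash
#!/usr/bin/env bash
# Hypothetical, simplified launcher: JVM options only take effect when the
# start command actually expands the variables that hold them.
build_java_cmd() {
  # Prints the command instead of running it, for illustration.
  # shellcheck disable=SC2086
  echo java $KAFKA_HEAP_OPTS $KAFKA_GC_LOG_OPTS $KAFKA_OPTS example.MainClass
}

KAFKA_HEAP_OPTS="-Xmx1G"
KAFKA_GC_LOG_OPTS="-Xloggc:/tmp/kafkaServer-gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M"
KAFKA_OPTS=""
build_java_cmd
```

If the launcher's command line did not include $KAFKA_GC_LOG_OPTS, setting the variable would have no effect on the JVM at all.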

Regarding HDFS:

In the hadoop-env template, wherever you find the -XX:+PrintGCDetails parameter, you can add -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M after it (plus -XX:+PrintGCTimeStamps, if it is not already present).

For example:

From:

HADOOP_JOBTRACKER_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile={{hdfs_log_dir_prefix}}/$USER/hs_err_pid%p.log -XX:NewSize={{jtnode_opt_newsize}} -XX:MaxNewSize={{jtnode_opt_maxnewsize}} -Xloggc:{{hdfs_log_dir_prefix}}/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xmx{{jtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dmapred.audit.logger=INFO,MRAUDIT -Dhadoop.mapreduce.jobsummary.logger=INFO,JSA ${HADOOP_JOBTRACKER_OPTS}"

To:

HADOOP_JOBTRACKER_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile={{hdfs_log_dir_prefix}}/$USER/hs_err_pid%p.log -XX:NewSize={{jtnode_opt_newsize}} -XX:MaxNewSize={{jtnode_opt_maxnewsize}} -Xloggc:{{hdfs_log_dir_prefix}}/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M -XX:+PrintGCDateStamps -Xmx{{jtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dmapred.audit.logger=INFO,MRAUDIT -Dhadoop.mapreduce.jobsummary.logger=INFO,JSA ${HADOOP_JOBTRACKER_OPTS}"
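The From/To edit can be sketched as a text substitution: insert the rotation flags right after -XX:+PrintGCDetails, once, and only if rotation is not already configured. A minimal bash sketch; `insert_rotation_after_gc_details` is a hypothetical helper that works on an options string, not on Ambari's stored template. For HDFS the same change would go into the HDFS daemon option variables (in typical HDP templates, e.g. HADOOP_NAMENODE_OPTS) rather than the JobTracker one used in the example:

```bash
#!/usr/bin/env bash
# Hypothetical helper: insert GC log rotation flags right after -XX:+PrintGCDetails,
# mirroring the From/To edit. The string is returned unchanged if rotation is
# already configured or if -XX:+PrintGCDetails does not appear in it.
ROTATION_FLAGS="-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M"

insert_rotation_after_gc_details() {
  local opts="$1"
  if [[ "$opts" == *"-XX:+UseGCLogFileRotation"* ]]; then
    printf '%s\n' "$opts"
    return
  fi
  # ${var/pattern/replacement} replaces only the first match; the pattern
  # contains no glob metacharacters, so it is matched literally.
  printf '%s\n' "${opts/-XX:+PrintGCDetails/-XX:+PrintGCDetails $ROTATION_FLAGS}"
}

# Example with a shortened, illustrative options string.
FROM="-Xloggc:/tmp/gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps"
insert_rotation_after_gc_details "$FROM"
```

Keep in mind that the rotation flags cap the number of GC log files per JVM run. Because -Xloggc in this template names the file with a `date` timestamp, each restart starts a new gc.log-<timestamp> series, so series from earlier runs are not removed by the JVM.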

For HDFS it's clearer now, but for Kafka, where do I need to set "-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M"?

Michael-Bronson

@Michael Bronson

For Kafka, I suspect GC logging is not even enabled, so you can probably ignore that part. If you do want to set it, you can do it like below.

In the kafka-env section, change the line from:

export KAFKA_KERBEROS_PARAMS="-Djavax.security.auth.useSubjectCredsOnly=false"

to:

export KAFKA_KERBEROS_PARAMS="-Djavax.security.auth.useSubjectCredsOnly=false -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M"
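One caveat worth checking with this Kafka change: on Java 8 HotSpot, the rotation flags only take effect when the JVM is also writing a GC log file via -Xloggc:<file>; otherwise the JVM disables rotation. The KAFKA_KERBEROS_PARAMS line does not add -Xloggc itself, so it has to come from somewhere else on the broker's command line. A minimal bash sketch of a sanity check; `check_gc_rotation_opts` is a hypothetical helper that inspects a single options string only:

```bash
#!/usr/bin/env bash
# Hypothetical sanity check for a Java 8 style JVM options string: warn when
# GC log rotation is requested but no -Xloggc:<file> is present.
check_gc_rotation_opts() {
  local opts="$1" has_loggc=no has_rotation=no word
  local -a words
  # Split on whitespace without glob expansion.
  read -r -a words <<< "$opts"
  for word in "${words[@]}"; do
    case "$word" in
      -Xloggc:*) has_loggc=yes ;;
      -XX:+UseGCLogFileRotation) has_rotation=yes ;;
    esac
  done
  if [[ $has_rotation == yes && $has_loggc == no ]]; then
    echo "WARNING: rotation flags without -Xloggc:<file>; the JVM will not rotate GC logs"
    return 1
  fi
  echo "OK"
}

# Example: this string has rotation flags but no -Xloggc, so the check warns.
check_gc_rotation_opts "-Djavax.security.auth.useSubjectCredsOnly=false -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M" \
  || echo "Make sure -Xloggc:<file> is also passed to the broker JVM"
```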

If you are satisfied with the answers, please mark my comment as the correct answer; this will help others as well.
