Created 08-13-2018 07:08 PM
In the following answer, we explained how to rotate the gc.log files for the Metrics Collector service:
https://community.hortonworks.com/questions/194534/automatic-deletion-of-rotated-gclog-files.html
Now we need to do the same for the following services:
HDFS
KAFKA
What should be the approach for these services?
Created 08-13-2018 07:19 PM
You can follow the same approach - these are JVM parameters, so the method works for any service. You can add the parameters in the corresponding -env template in Ambari for any of those services.
Created 08-13-2018 07:24 PM
The problem is that I am not sure which variable holds the JVM parameters for HDFS and Kafka.
- Which variable should I search for? And if it is not defined, should I add it?
Created 08-13-2018 07:39 PM
This is what we have in kafka-env (in the Ambari GUI):
#!/bin/bash

# Set KAFKA specific environment variables here.

# The java implementation to use.
export JAVA_HOME={{java64_home}}
export PATH=$PATH:$JAVA_HOME/bin
export PID_DIR={{kafka_pid_dir}}
export LOG_DIR={{kafka_log_dir}}
export KAFKA_KERBEROS_PARAMS={{kafka_kerberos_params}}
export JMX_PORT=9997

# Add kafka sink to classpath and related dependencies
if [ -e "/usr/lib/ambari-metrics-kafka-sink/ambari-metrics-kafka-sink.jar" ]; then
  export CLASSPATH=$CLASSPATH:/usr/lib/ambari-metrics-kafka-sink/ambari-metrics-kafka-sink.jar
  export CLASSPATH=$CLASSPATH:/usr/lib/ambari-metrics-kafka-sink/lib/*
fi

if [ -f /etc/kafka/conf/kafka-ranger-env.sh ]; then
  . /etc/kafka/conf/kafka-ranger-env.sh
fi
Created 08-13-2018 07:42 PM
This is what we have in hadoop-env from the Ambari GUI for the HDFS service:
hadoop-env-hdfs.txt
Created 08-13-2018 08:21 PM
@amarnath reddy pappu regarding my details above, can you suggest which variable I can use to set these properties? Or maybe I need to append a variable that represents the JVM parameters - if so, what is the name of that variable?
Created 08-13-2018 08:33 PM
What we are thinking (but are not sure about) is:
For Kafka:
KAFKA_GC_LOG_OPTS="-XX:+UseGCLogFileRotation-XX:NumberOfGCLogFiles=5-XX:GCLogFileSize=2M"
For HDFS:
JAVA_OPTS="-XX:+UseGCLogFileRotation-XX:NumberOfGCLogFiles=5-XX:GCLogFileSize=2M"
Created 08-13-2018 08:44 PM
For HDFS, GC logging is already configured via properties like the following:
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps
Next to those you can append the rotation flags, like below:
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M
Note: you need a space between each parameter.
For Kafka - can you tell me how you enabled GC logging? By default it may not have been enabled.
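For reference, in a stock Kafka distribution of that era (an assumption - verify against the scripts shipped with your install), kafka-server-start.sh passes -loggc, and kafka-run-class.sh then builds KAFKA_GC_LOG_OPTS itself, roughly like this:

# Approximate excerpt from a stock bin/kafka-run-class.sh (check your own copy)
GC_FILE_SUFFIX='-gc.log'
GC_LOG_FILE_NAME=''
if [ "x$GC_LOG_ENABLED" = "xtrue" ]; then
  # DAEMON_NAME and LOG_DIR are set earlier in the script
  GC_LOG_FILE_NAME=$DAEMON_NAME$GC_FILE_SUFFIX
  KAFKA_GC_LOG_OPTS="-Xloggc:$LOG_DIR/$GC_LOG_FILE_NAME -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps"
fi

Note that this assignment can overwrite any KAFKA_GC_LOG_OPTS exported from kafka-env, which is why simply setting the variable there may have no effect.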
Created 08-13-2018 09:01 PM
I don't understand regarding HDFS - why not put all the properties into a single variable, as in KAFKA_GC_LOG_OPTS="-XX:+UseGCLogFileRotation-XX:NumberOfGCLogFiles=5-XX:GCLogFileSize=2M"?
Created 08-13-2018 10:03 PM
Just adding KAFKA_GC_LOG_OPTS="-XX:+UseGCLogFileRotation-XX:NumberOfGCLogFiles=5-XX:GCLogFileSize=2M" will not be sufficient - that variable also has to be picked up during service start.
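A minimal sketch of one way to make the rotation flags take effect, assuming your kafka-run-class.sh follows the stock layout quoted earlier (the HDP path below is an assumption; adjust for your install):

# Hypothetical edit to /usr/hdp/current/kafka-broker/bin/kafka-run-class.sh:
# append the rotation flags where the script itself builds KAFKA_GC_LOG_OPTS,
# since a value exported from kafka-env can be overwritten here at start-up.
if [ "x$GC_LOG_ENABLED" = "xtrue" ]; then
  GC_LOG_FILE_NAME=$DAEMON_NAME$GC_FILE_SUFFIX
  KAFKA_GC_LOG_OPTS="-Xloggc:$LOG_DIR/$GC_LOG_FILE_NAME -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M"
fi

Keep in mind a manual edit like this can be reverted by upgrades or by Ambari redeploying the package.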
Regarding HDFS:
In the hadoop-env template, wherever you find the -XX:+PrintGCDetails parameter, you can append -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M next to the existing -XX:+PrintGCTimeStamps
For example:
From:
HADOOP_JOBTRACKER_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile={{hdfs_log_dir_prefix}}/$USER/hs_err_pid%p.log -XX:NewSize={{jtnode_opt_newsize}} -XX:MaxNewSize={{jtnode_opt_maxnewsize}} -Xloggc:{{hdfs_log_dir_prefix}}/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xmx{{jtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dmapred.audit.logger=INFO,MRAUDIT -Dhadoop.mapreduce.jobsummary.logger=INFO,JSA ${HADOOP_JOBTRACKER_OPTS}"
To:
HADOOP_JOBTRACKER_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile={{hdfs_log_dir_prefix}}/$USER/hs_err_pid%p.log -XX:NewSize={{jtnode_opt_newsize}} -XX:MaxNewSize={{jtnode_opt_maxnewsize}} -Xloggc:{{hdfs_log_dir_prefix}}/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M -XX:+PrintGCDateStamps -Xmx{{jtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dmapred.audit.logger=INFO,MRAUDIT -Dhadoop.mapreduce.jobsummary.logger=INFO,JSA ${HADOOP_JOBTRACKER_OPTS}"
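After saving the template and restarting the affected components from Ambari, a quick sanity check (the log path below assumes {{hdfs_log_dir_prefix}} resolves to /var/log/hadoop, which may differ on your cluster):

# Confirm the running daemon picked up the rotation flags
ps -ef | grep -i namenode | grep -o 'UseGCLogFileRotation'
# Watch the rotated files appear; the JVM adds numeric suffixes (.0, .1, ...)
# to the -Xloggc file name, up to NumberOfGCLogFiles
ls -lh /var/log/hadoop/hdfs/gc.log-*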