Log Rolling for Spark Jobs on YARN?

Rising Star

CDH 5.2, installed with Cloudera Manager + Parcels.

I have a Spark Streaming job running in yarn-client mode, and I would like to have its logs rotated. Currently, I am getting stdout and stderr log files in the YARN container log directories, e.g.:

# pwd
/var/log/hadoop-yarn/container/application.../container.../*
# ls -l 
-rw-r----- 1 myuser yarn 209493 Sep 25 14:06 stderr
-rw-r----- 1 myuser yarn 0 Sep 25 11:36 stdout


Can I control the rolling of these files? I checked /etc/hadoop/conf.cloudera.yarn/log4j.properties, /etc/spark/conf.cloudera.spark/log4j.properties and /var/run/cloudera-scm-agent/process/*-yarn-NODEMANAGER/log4j.properties. None of them contains any configuration for these files (stderr and stdout). What am I missing?

Also, my application uses a log4j.properties like this:

log4j.rootLogger=INFO, RollingAppender
log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppender.File=/tmp/MyApp.log
#log4j.appender.RollingAppender.File=${spark.yarn.app.container.log.dir}/MyApp.log
log4j.appender.RollingAppender.DatePattern='.'yyyyMMdd

Is it a good idea to enable the commented-out line and write to ${spark.yarn.app.container.log.dir}/MyApp.log (the container log directory)? I tried, but got a permission problem. Will "yarn logs" include the custom log files?

Thanks.

2 REPLIES

Re: Log Rolling for Spark Jobs on YARN?

Super Collaborator

How to do this is documented in the Spark "Running on YARN" guide. You need to pass in a custom log4j.properties file.
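
As a rough illustration of such a file, the sketch below rolls by size using log4j 1.2's RollingFileAppender and writes into the container log directory. It assumes the Spark version in use actually defines spark.yarn.app.container.log.dir for the container JVMs, which may not be the case on older releases. The size limit, backup count and pattern are arbitrary example values.

```properties
# Sketch only, not a verified CDH 5.2 configuration.
# Assumes spark.yarn.app.container.log.dir is defined in the container JVM.
log4j.rootLogger=INFO, RollingAppender
log4j.appender.RollingAppender=org.apache.log4j.RollingFileAppender
log4j.appender.RollingAppender.File=${spark.yarn.app.container.log.dir}/MyApp.log
# Example limits: keep at most 5 rolled files of up to 50MB each.
log4j.appender.RollingAppender.MaxFileSize=50MB
log4j.appender.RollingAppender.MaxBackupIndex=5
log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppender.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

In yarn-client mode the driver runs outside any YARN container, so this property would presumably be unset there, and a driver-side log file would need its own local path.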

With rolling logs you will most likely lose the YARN log tracking and aggregation, and I am not sure that this will work properly. The container will most likely keep pointing to the base file and never pick up the rolled versions, or you will only ever be able to track the current one.

Wilfred

Re: Log Rolling for Spark Jobs on YARN?

Rising Star

Thanks for the reply. Actually, I tried those settings but did not get the desired result. After retrying by trial and error, it seems that spark.driver.extraJavaOptions and ${spark.yarn.app.container.log.dir} do not work with CDH 5.2 Spark / YARN.

I have now ended up using both --files (for driver) and spark.driver.extraJavaOptions (for executor) to specify the log4j properties files. The files are rolling, but they are not going to the container directory.
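
For reference, upstream Spark has separate properties for the two JVM types: spark.driver.extraJavaOptions and spark.executor.extraJavaOptions. Below is a minimal sketch in spark-defaults.conf form. The file names and paths are hypothetical, and none of this has been verified against CDH 5.2.

```properties
# Sketch only; file names and paths below are hypothetical.
# Driver: runs on the submitting host in yarn-client mode, so it reads a local file.
spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j-driver.properties
# Executors: assumes log4j-executor.properties was shipped with --files and is
# therefore present in each container's working directory.
spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-executor.properties
```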

Can anyone confirm whether ${spark.yarn.app.container.log.dir} works with CDH 5.2?