Posts: 46
Registered: ‎11-03-2014

Log Rolling for Spark Jobs on YARN?

CDH 5.2, installed with Cloudera Manager + Parcels


I have a Spark Streaming job running in yarn-client mode, and I would like to have its logs rotated. Currently, I am getting stdout and stderr log files in the YARN container log directories, e.g.:


# pwd
# ls -l 
-rw-r----- 1 myuser yarn 209493 Sep 25 14:06 stderr
-rw-r----- 1 myuser yarn 0 Sep 25 11:36 stdout

Can I control the rolling of these files? I checked /etc/hadoop/conf.cloudera.yarn/, /etc/spark/conf.cloudera.spark/ and /var/run/cloudera-scm-agent/process/*-yarn-NODEMANAGER/. None of them includes configuration for these files (stderr and stdout). What am I missing?


Also, my application is using a log4j configuration like

log4j.rootLogger=INFO, RollingAppender

Is it a good idea to enable the commented line to write to ${}/MyApp.log (container log directory)? I tried but got a permission problem. Will "yarn logs" include the custom log files?
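
For reference, a minimal sketch of what a size-based rolling configuration of this kind could look like in log4j 1.x. The placeholder in the post above is elided; this sketch assumes the spark.yarn.app.container.log.dir system property described in the Spark "Running on YARN" documentation, and whether it resolves on CDH 5.2 is not confirmed here. The file name MyApp.log comes from the post; the size limit, backup count and pattern are illustrative values only.

```shell
#!/bin/sh
# Sketch: write a log4j 1.x properties file with a size-based rolling appender.
# Assumption: ${spark.yarn.app.container.log.dir} is set by Spark inside the
# YARN container. 50MB and 5 backups are illustrative, not from the post.
# The quoted 'EOF' stops the shell from expanding ${...}; log4j resolves it
# from JVM system properties at runtime.
cat > log4j.properties <<'EOF'
log4j.rootLogger=INFO, RollingAppender
log4j.appender.RollingAppender=org.apache.log4j.RollingFileAppender
log4j.appender.RollingAppender.File=${spark.yarn.app.container.log.dir}/MyApp.log
log4j.appender.RollingAppender.MaxFileSize=50MB
log4j.appender.RollingAppender.MaxBackupIndex=5
log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppender.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
EOF
```

If the property is not defined in the JVM, log4j substitutes an empty string and the path becomes /MyApp.log, which would be one possible source of a permission error.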



Cloudera Employee
Posts: 322
Registered: ‎01-16-2014

Re: Log Rolling for Spark Jobs on YARN?

How to do this is documented here: running on yarn. You need to pass in a custom file.

With rolling logs you will most likely lose the YARN log tracking and aggregation. I am not sure that this will work properly. The container will most likely keep pointing to the base file and never move to the rolled version, or you will only ever be able to track the current one.



Posts: 46
Registered: ‎11-03-2014

Re: Log Rolling for Spark Jobs on YARN?

Thanks for the reply. I tried the settings but did not get the desired result. After retrying by trial and error, it seems that spark.driver.extraJavaOptions and ${} do not work with CDH 5.2 Spark / YARN.


I have ended up using both --files (for driver) and spark.driver.extraJavaOptions (for executor) to specify log4j properties files. The files are rolling, but they are not going to the container directory.

Can anyone confirm whether ${} works with CDH 5.2?
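
For comparison, here is a sketch of how the Spark documentation suggests wiring a custom log4j file in yarn-client mode. It is not verified against CDH 5.2. The class name com.example.MyApp and the jar myapp.jar are hypothetical, and a log4j.properties file is assumed to exist in the current directory. Note that in client mode the driver JVM is already running when the application's SparkConf is read, so driver JVM options generally have to be given on the command line (for example with --driver-java-options) rather than set from inside the application.

```shell
#!/bin/sh
# Sketch only: com.example.MyApp and myapp.jar are hypothetical names.
# Assumption: log4j.properties sits in the current working directory.
LOG4J_FILE="$PWD/log4j.properties"

# --files ships the file into each YARN container's working directory, so
# executors can refer to it by bare file name. The driver runs locally in
# yarn-client mode, so it gets a file: URL pointing at the local copy.
set -- spark-submit \
  --master yarn-client \
  --files "$LOG4J_FILE" \
  --driver-java-options "-Dlog4j.configuration=file:$LOG4J_FILE" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --class com.example.MyApp \
  myapp.jar

# Print the assembled command for review instead of launching it.
SUBMIT_CMD="$*"
echo "$SUBMIT_CMD"
```

Printing the command first makes it easy to check which properties file each JVM is being pointed at; running "$@" in place of the echo would launch the job.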