Scenario: The Spark log4j properties (Ambari > Spark > Configs) are not configured to log to a file. When a job runs in yarn-client mode, the driver logs are spilled to the console. For long-running jobs it can be difficult to capture the driver logs for various reasons: the user may lose the connection to the terminal, may have closed the terminal, and so on.

The driver log is a useful artifact when investigating a job failure.

In such scenarios, it is better to have the Spark driver log to a file instead of to the console.

Here are the steps:

  1. Place a driver_log4j.properties file in a location of your choice (say, /tmp) on the machine from which you will submit the job in yarn-client mode.

Contents of driver_log4j.properties:

# Set everything to be logged to the file
log4j.rootCategory=INFO,FILE
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File=/tmp/SparkDriver.log
log4j.appender.FILE.ImmediateFlush=true
log4j.appender.FILE.Threshold=debug
log4j.appender.FILE.Append=true
log4j.appender.FILE.MaxFileSize=500MB
log4j.appender.FILE.MaxBackupIndex=10
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third-party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

Change the value of log4j.appender.FILE.File as needed.

2. Add the following to the spark-submit command so that it picks up the above log4j properties and makes the driver log to a file.

--driver-java-options "-Dlog4j.configuration=file:/tmp/driver_log4j.properties"

Example

spark-submit --driver-java-options "-Dlog4j.configuration=file:/tmp/driver_log4j.properties" \
  --class org.apache.spark.examples.JavaSparkPi --master yarn-client --num-executors 3 \
  --driver-memory 512m --executor-memory 512m --executor-cores 1 spark-examples*.jar 10
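If you launch jobs from a script, assembling the argument list programmatically keeps the log4j override consistent across jobs. A sketch in Python; `build_submit_cmd` is a hypothetical helper, and the jar name is a placeholder mirroring the example above:

```python
# Sketch: build the spark-submit argument list with the driver log4j override.
# Assumes spark-submit is on PATH if you later pass this to subprocess.run().

def build_submit_cmd(properties_file, app_jar, main_class, app_args):
    """Build a spark-submit command that points the driver JVM at a log4j file."""
    return [
        "spark-submit",
        # The file: URI must be readable on the host submitting the job,
        # since the driver runs there in yarn-client mode.
        "--driver-java-options",
        "-Dlog4j.configuration=file:{}".format(properties_file),
        "--class", main_class,
        "--master", "yarn-client",
        "--num-executors", "3",
        "--driver-memory", "512m",
        "--executor-memory", "512m",
        "--executor-cores", "1",
        app_jar,
    ] + list(app_args)

cmd = build_submit_cmd("/tmp/driver_log4j.properties",
                       "spark-examples.jar",  # placeholder jar name
                       "org.apache.spark.examples.JavaSparkPi",
                       ["10"])
```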

3. Once you submit the job with this new command, the Spark driver will log to the location specified by log4j.appender.FILE.File in driver_log4j.properties; in this case, /tmp/SparkDriver.log.
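Each line in that file follows the ConversionPattern configured above (timestamp, level, logger name, message), which makes the log easy to grep or post-process. A small Python sketch that splits a line into those fields; the sample line is illustrative, not real output:

```python
import re

# Lines produced by the pattern "%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n"
# look like: "<yy/MM/dd HH:mm:ss> <LEVEL> <Logger>: <message>".
LOG_LINE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>\S+): "
    r"(?P<msg>.*)$"
)

def parse_line(line):
    """Split one driver log line into its fields, or return None on no match."""
    m = LOG_LINE.match(line)
    return m.groupdict() if m else None

# Illustrative sample line (not actual driver output):
sample = "16/03/09 10:15:22 INFO SparkContext: Running Spark version 1.6.0"
fields = parse_line(sample)
```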

Note:

The executor logs can always be fetched from the Spark History Server UI, whether you run the job in yarn-client or yarn-cluster mode:

a. Go to the Spark History Server UI.

b. Click on the App ID.

c. Navigate to the Executors tab.

d. The Executors page lists links to the stdout and stderr logs.
