Scenario: The Spark log4j properties (Ambari > Spark > Configs) are not configured to log to a file. When a job runs in yarn-client mode, the driver logs are spilled to the console. For long-running jobs it can be difficult to capture the driver logs for various reasons: the user may lose the connection to the terminal, may have closed the terminal, and so on.

The driver log is a useful artifact when investigating a job failure.

In such scenarios, it is better to have the Spark driver log to a file instead of to the console.

Here are the steps:

  1. Place a driver_log4j.properties file in a location of your choice (say, /tmp) on the machine from which you will submit the job in yarn-client mode.

Contents of driver_log4j.properties:

# Set everything to be logged to the file
log4j.rootCategory=INFO,FILE
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File=/tmp/SparkDriver.log
log4j.appender.FILE.ImmediateFlush=true
log4j.appender.FILE.Threshold=debug
log4j.appender.FILE.Append=true
log4j.appender.FILE.MaxFileSize=500MB
log4j.appender.FILE.MaxBackupIndex=10
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third-party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

Change the value of log4j.appender.FILE.File as needed.

2. Add the following to the spark-submit command so that it picks up the above log4j properties and makes the driver log to a file.

--driver-java-options "-Dlog4j.configuration=file:/tmp/driver_log4j.properties"

Example

spark-submit --driver-java-options "-Dlog4j.configuration=file:/tmp/driver_log4j.properties" \
  --class org.apache.spark.examples.JavaSparkPi --master yarn-client --num-executors 3 \
  --driver-memory 512m --executor-memory 512m --executor-cores 1 spark-examples*.jar 10
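If you launch jobs from a script, assembling the argument list programmatically keeps the log4j override consistent across jobs. A sketch in Python; `build_submit_cmd` is a hypothetical helper, and the jar name is a placeholder mirroring the example above:

```python
# Sketch: build the spark-submit argument list with the driver log4j override.
# Assumes spark-submit is on PATH if you later pass this to subprocess.run().

def build_submit_cmd(properties_file, app_jar, main_class, app_args):
    """Build a spark-submit command that points the driver JVM at a log4j file."""
    return [
        "spark-submit",
        # The file: URI must be readable on the host submitting the job,
        # since the driver runs there in yarn-client mode.
        "--driver-java-options",
        "-Dlog4j.configuration=file:{}".format(properties_file),
        "--class", main_class,
        "--master", "yarn-client",
        "--num-executors", "3",
        "--driver-memory", "512m",
        "--executor-memory", "512m",
        "--executor-cores", "1",
        app_jar,
    ] + list(app_args)

cmd = build_submit_cmd("/tmp/driver_log4j.properties",
                       "spark-examples.jar",  # placeholder jar name
                       "org.apache.spark.examples.JavaSparkPi",
                       ["10"])
```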

3. Once you submit the job with this new command, the Spark driver will log to the location specified by log4j.appender.FILE.File in driver_log4j.properties; in this case, /tmp/SparkDriver.log.
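Each line in that file follows the ConversionPattern configured above (timestamp, level, logger name, message), which makes the log easy to grep or post-process. A small Python sketch that splits a line into those fields; the sample line is illustrative, not real output:

```python
import re

# Lines produced by the pattern "%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n"
# look like: "<yy/MM/dd HH:mm:ss> <LEVEL> <Logger>: <message>".
LOG_LINE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>\S+): "
    r"(?P<msg>.*)$"
)

def parse_line(line):
    """Split one driver log line into its fields, or return None on no match."""
    m = LOG_LINE.match(line)
    return m.groupdict() if m else None

# Illustrative sample line (not actual driver output):
sample = "16/03/09 10:15:22 INFO SparkContext: Running Spark version 1.6.0"
fields = parse_line(sample)
```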

Note:

The executor logs can always be fetched from the Spark History Server UI, whether you run the job in yarn-client or yarn-cluster mode:

a. Go to the Spark History Server UI.

b. Click on the App ID.

c. Navigate to the Executors tab.

d. The Executors page lists links to the stdout and stderr logs.
