Scenario: The Spark log4j properties (Ambari > Spark > Configs) are not configured to log to a file. When running a job in yarn-client mode, the driver logs are printed to the console. For long-running jobs it can be difficult to capture these logs, for example because the user loses the connection to the terminal or closes it.
The driver log is a useful artifact when investigating a job failure. In such scenarios it is better to have the Spark driver log to a file instead of the console. Here are the steps:
1. Place a driver_log4j.properties file in a certain location (say /tmp) on the machine from which you will be submitting the job in yarn-client mode.
Contents of driver_log4j.properties:
#Set everything to be logged to the file
log4j.rootCategory=INFO,FILE
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File=/tmp/SparkDriver.log
log4j.appender.FILE.ImmediateFlush=true
log4j.appender.FILE.Threshold=debug
log4j.appender.FILE.Append=true
log4j.appender.FILE.MaxFileSize=500MB
log4j.appender.FILE.MaxBackupIndex=10
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
#Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
Change the value of log4j.appender.FILE.File as needed.
2. Add the following option to the spark-submit command so that it picks up the
above log4j properties and makes the driver log to a file, as sketched below.
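One way to do this, as a minimal sketch, is to pass the properties file to the driver JVM via spark-submit's --driver-java-options; the application class and jar below are placeholders, and the properties path matches the file from step 1:

# Point the driver JVM's log4j at the custom properties file
spark-submit \
  --master yarn \
  --deploy-mode client \
  --driver-java-options "-Dlog4j.configuration=file:/tmp/driver_log4j.properties" \
  --class com.example.MyApp \
  /path/to/my-app.jar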
3. Now, once you submit the job with this new command, the Spark driver will log to the location specified by log4j.appender.FILE.File in driver_log4j.properties. In this example, that is /tmp/SparkDriver.log.
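For example, while the job is running you can follow the driver log from another terminal (path taken from the sample properties above):

# Follow the driver log as it is written
tail -f /tmp/SparkDriver.log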
Note: The executor logs can always be fetched from the Spark History Server UI, whether you are running the job in yarn-client or yarn-cluster mode:
a. Go to the Spark History Server UI
b. Click on the App ID
c. Navigate to the Executors tab
d. The Executors page will list the links to the stdout and stderr logs
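Alternatively, assuming YARN log aggregation is enabled on the cluster (an assumption not stated above), the same executor container logs can be pulled from the command line with the standard YARN CLI:

# Fetch aggregated container logs (includes executor stdout and stderr);
# replace <application_id> with the job's YARN application ID
yarn logs -applicationId <application_id>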