Created on 09-18-2017 06:52 PM
Scenario: The Spark log4j properties (Ambari > Spark > Configs) are not configured to log to a file. When a job runs in yarn-client mode, the driver logs are printed to the console. For long-running jobs, capturing the driver logs can be difficult: the user may lose the connection to the terminal, close the terminal, and so on.
The driver log is a useful artifact when investigating a job failure.
In such scenarios, it is better to have the Spark driver log to a file instead of the console.
Here are the steps:
1. Place a driver_log4j.properties file in a location of your choice (for example, /tmp) on the machine from which you will submit the job in yarn-client mode.
Contents of driver_log4j.properties:
#Set everything to be logged to the file
log4j.rootCategory=INFO,FILE
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File=/tmp/SparkDriver.log
log4j.appender.FILE.ImmediateFlush=true
log4j.appender.FILE.Threshold=debug
log4j.appender.FILE.Append=true
log4j.appender.FILE.MaxFileSize=500MB
log4j.appender.FILE.MaxBackupIndex=10
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

#Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
Change the value of log4j.appender.FILE.File as needed.
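If several jobs are submitted from the same machine with this file, they will all append to the same log. One possible variation, not part of the original steps, relies on log4j 1.x resolving ${...} placeholders from JVM system properties; the property name spark.driver.logfile below is an illustrative choice, not a standard Spark setting:

```properties
# Hypothetical variation: take the log path from a JVM system property
# (spark.driver.logfile is an illustrative name, not a built-in Spark property)
log4j.appender.FILE.File=${spark.driver.logfile}
```

With that change, each submission can pass its own path, for example: --driver-java-options "-Dlog4j.configuration=file:/tmp/driver_log4j.properties -Dspark.driver.logfile=/tmp/SparkDriver-job1.log"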
2. Add the following to the spark-submit command so that it picks up the above log4j properties and directs the driver log to a file.
--driver-java-options "-Dlog4j.configuration=file:/tmp/driver_log4j.properties"
Example:
spark-submit \
  --driver-java-options "-Dlog4j.configuration=file:/tmp/driver_log4j.properties" \
  --class org.apache.spark.examples.JavaSparkPi \
  --master yarn-client \
  --num-executors 3 \
  --driver-memory 512m \
  --executor-memory 512m \
  --executor-cores 1 \
  spark-examples*.jar 10
3. Once you submit the job with this command, the Spark driver logs to the location specified by log4j.appender.FILE.File in driver_log4j.properties; in this example, /tmp/SparkDriver.log.
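A quick way to confirm the driver is writing to the file is to tail it while the job runs. A minimal sketch, assuming the default path from driver_log4j.properties above:

```shell
#!/bin/sh
# Show the most recent driver log entries.
# /tmp/SparkDriver.log matches log4j.appender.FILE.File above;
# pass a different path as the first argument if you changed it.
LOG_FILE=${1:-/tmp/SparkDriver.log}
if [ -f "$LOG_FILE" ]; then
  tail -n 20 "$LOG_FILE"
else
  echo "No driver log found at $LOG_FILE"
fi
```

Use `tail -f` instead of `tail -n 20` to follow the log live while the job is running.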
Note:
Executor logs can always be fetched from the Spark History Server UI, whether the job runs in yarn-client or yarn-cluster mode:
a. Go to the Spark History Server UI
b. Click on the App ID
c. Navigate to the Executors tab
d. The Executors page lists links to the stdout and stderr logs