
Spark File Logger in Yarn Mode


I want to create a custom logger that writes messages from the executors to a specific folder on a cluster node.

I have edited my log4j.properties file like this:



# Changed the root logger level to WARN in order not to flood the console with messages
log4j.rootLogger=${root.logger}
root.logger=WARN,console

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n

shell.log.level=WARN
log4j.logger.org.apache.spark.repl.Main=${shell.log.level}
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.parquet=ERROR

# My logger to write useful messages to a local file
log4j.logger.jobLogger=INFO, RollingAppenderU
log4j.appender.RollingAppenderU=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppenderU.File=/var/log/sparkU.log
log4j.appender.RollingAppenderU.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppenderU.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppenderU.layout.ConversionPattern=[%p] %d %c %M - %m%n
log4j.appender.fileAppender.MaxFileSize=1MB
log4j.appender.fileAppender.MaxBackupIndex=1
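Note that, as configured, messages sent to jobLogger also propagate up to the root logger's console appender, which is why they appear in the YARN container logs as well. If only the file output were wanted, log4j additivity could be switched off for that logger with one extra property (shown here only as a sketch, in case it matters for anyone reproducing this):

```
log4j.additivity.jobLogger=false
```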


So, when using jobLogger, I want the messages saved to the file /var/log/sparkU.log.

I created a small program in Python



from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext, SparkSession
from pyspark.sql.types import *

spark = SparkSession \
        .builder \
        .master("yarn") \
        .appName("test custom logging") \
        .config("spark.some.config.option", "some-value") \
        .getOrCreate()

# Get the JVM-side log4j package through the Py4J gateway
log4jLogger = spark.sparkContext._jvm.org.apache.log4j

log = log4jLogger.LogManager.getLogger("jobLogger")

log.info("Info message")

log.warn("Warn message")

log.error("Error message")

print('Print output')


and I submit it like this



 /usr/bin/spark-submit --master yarn --deploy-mode client /mypath/
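In case it is relevant: when the custom log4j.properties is not already part of the cluster defaults, it is typically shipped with the job via --files plus the extraJavaOptions settings. A sketch of what I mean (the paths /mypath/log4j.properties and /mypath/app.py are placeholders, not my actual files):

```
# Ship the custom log4j.properties to the driver and executors
/usr/bin/spark-submit \
  --master yarn \
  --deploy-mode client \
  --files /mypath/log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/mypath/log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  /mypath/app.py
```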


When I use deploy mode client, the file is written in the desired place. When I use deploy mode cluster, the local file is not written, but the messages can be found in the YARN logs. In both modes, however, the YARN logs also contain this error (output below is from cluster mode):


log4j:ERROR setFile(null,true) call failed. /var/log/sparkU.log (Permission denied)
	at Method)
	at org.apache.log4j.FileAppender.setFile(
	at org.apache.log4j.FileAppender.activateOptions(
	at org.apache.log4j.DailyRollingFileAppender.activateOptions(
	at org.apache.log4j.config.PropertySetter.activate(
	at org.apache.log4j.config.PropertySetter.setProperties(
	at org.apache.log4j.config.PropertySetter.setProperties(
	at org.apache.log4j.PropertyConfigurator.parseAppender(
	at org.apache.log4j.PropertyConfigurator.parseCategory(
	at org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(
	at org.apache.log4j.PropertyConfigurator.doConfigure(
	at org.apache.log4j.PropertyConfigurator.doConfigure(
	at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(
	at org.apache.log4j.LogManager.<clinit>(
	at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:117)
	at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:102)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.initializeLogIfNecessary(ApplicationMaster.scala:746)
	at org.apache.spark.internal.Logging$class.log(Logging.scala:46)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.log(ApplicationMaster.scala:746)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:761)
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
log4j:ERROR Either File or DatePattern options are not set for appender [RollingAppenderU].
18/01/15 12:13:00 WARN spark.SparkContext: Support for Java 7 is deprecated as of Spark 2.0.0
18/01/15 12:13:02 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
18/01/15 12:13:04 INFO jobLogger: Info message
18/01/15 12:13:04 WARN jobLogger: Warn message
18/01/15 12:13:04 ERROR jobLogger: Error message


So I have two questions


-Why is the first error message printed? I suspect it comes from the application master's logger, but how can I stop it from printing this error? I want only the executors to use the file logger.


-Is it possible to use cluster mode and still write to a specific file on one of the machines? I was wondering if I could somehow give a path like host:port/myPath/spark.log so that all the executors would write to that one file on a single machine.


Thanks in advance for any response.