New Contributor
Posts: 5
Registered: ‎04-12-2018

Where are the logfiles for Spark2 Executors?


I have a very simple Spark Streaming application and can't figure out where its log messages are written. Here is the code:

 

import json
import logging
import os
from datetime import datetime

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils


def createContext(loggerName):
    sc = SparkContext(appName="Compute Streaming Stats")
    sc.setLogLevel("INFO")

    # get a log4j logger through the JVM gateway
    log4jLogger = sc._jvm.org.apache.log4j
    scLogger = log4jLogger.LogManager.getLogger(loggerName)

    # batch interval of 180 seconds
    ssc = StreamingContext(sc, 180)

    kafkaStream = KafkaUtils.createStream(ssc, \
        "my.server:2181", \
        "compute-streaming-stats", \
        {"test": 1})

    parsed = kafkaStream.map(lambda v: json.loads(v[1]))

    # Count the number of messages in the batch
    count_this_batch = kafkaStream.count()

    # NOTE: these run once on the driver while the graph is built
    scLogger.info(count_this_batch)
    scLogger.info(parsed)

    return ssc

#end createContext
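One thing worth noting about the two scLogger.info(...) calls above: they execute exactly once, on the driver, while createContext builds the streaming graph, so what gets logged is the repr of the DStream object, not a per-batch count. A minimal, Spark-free sketch of that behavior (FakeDStream is a hypothetical stand-in for a real DStream):

```python
import io
import logging

class FakeDStream:
    """Hypothetical stand-in for a lazy DStream object."""
    def __repr__(self):
        return "TransformedDStream[1] at count"

# capture log output in a buffer so we can inspect it
buf = io.StringIO()
logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(buf))

stream = FakeDStream()
logger.info(stream)  # logs the object's repr once, not future batch data

print(buf.getvalue().strip())
```

To log an actual value for every batch, the count would have to be materialized inside an output operation (e.g. foreachRDD) so the logging happens when each batch is processed.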
    
if __name__ == "__main__":

    ## Log setup
    loggerName = __name__
    dtmStamp = datetime.now().strftime('%Y_%m_%d_%H_%M_%S')
    logPath='/home/<username>/logs/' + os.path.basename(__file__) + '-' + dtmStamp + '.log'

    logger = logging.getLogger(loggerName)
    logger.setLevel(logging.INFO)

    # create a file handler
    handler = logging.FileHandler(logPath)
    handler.setLevel(logging.INFO)

    # create a logging format
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    handler.setFormatter(formatter)

    # add the handlers to the logger
    logger.addHandler(handler)

    ssc = StreamingContext.getOrCreate('/home/<username>/tmp/checkpoint_v0', lambda: createContext(loggerName))
    ssc.start()
    ssc.awaitTermination()

#end main if
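Also relevant to where the messages end up: logging.FileHandler writes to the local filesystem of whichever process calls it. In a YARN cluster, the handler configured in __main__ exists only in the driver process, so executor-side messages can never land in that file. A small, self-contained sketch (no Spark involved, paths are temporary):

```python
import logging
import os
import tempfile

# the handler's file lives on the local disk of *this* process
log_path = os.path.join(tempfile.mkdtemp(), "driver.log")

logger = logging.getLogger("driver-demo")
logger.setLevel(logging.INFO)
handler = logging.FileHandler(log_path)
handler.setFormatter(
    logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
logger.addHandler(handler)

logger.info("written on the driver's local disk")
handler.flush()

with open(log_path) as f:
    contents = f.read()
print(contents.strip())
```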

So where is the "count_this_batch" value written to? Where is scLogger writing to?

 

YARN is set to aggregate the logs. I have looked in the folders set in the YARN Log DIR property.

 

I have also searched for it in

 

yarn logs -applicationId application_12323123213

 

Any leads highly appreciated.

Cloudera Employee
Posts: 59
Registered: ‎11-16-2015

Re: Where are the logfiles for Spark2 Executors?

I haven't had a chance to run your code locally, but I believe it should be wherever you've defined your logPath to be:

 

logPath='/home/<username>/logs/' + os.path.basename(__file__) + '-' + dtmStamp + '.log'

 

Did you forget to replace <username> with the actual username, or was it redacted for sharing purposes?

 

New Contributor
Posts: 5
Registered: ‎04-12-2018

Re: Where are the logfiles for Spark2 Executors?

Ah, it's not as simple as it appears. I added <username> explicitly to obfuscate my name. Spark executes the code on several nodes and I want to know where the log messages are written to. It's definitely not in the log file I specified.
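Since the map/count callbacks run on the executors, any output they produce goes to each container's stdout/stderr, which YARN collects into the aggregated container logs rather than any file on the driver's node. A sketch of how one might search them from the command line (the application id is a placeholder, and note the single-dash flag; no test is attached since this needs a live cluster):

```shell
# fetch the aggregated container logs for the finished application
yarn logs -applicationId application_12323123213 > app_logs.txt

# log4j messages emitted on the executors should then be greppable here
grep -i "compute-streaming-stats" app_logs.txt
```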

 

Hope that helps clarify.

Cloudera Employee
Posts: 59
Registered: ‎11-16-2015

Re: Where are the logfiles for Spark2 Executors?

I see, thanks. Are you able to print the results on the console using a simple Spark Kafka streaming app (https://www.cloudera.com/documentation/enterprise/5-8-x/topics/spark_streaming.html)? If yes, we'd need to look at why the logging part is not working.

New Contributor
Posts: 5
Registered: ‎04-12-2018

Re: Where are the logfiles for Spark2 Executors?

Yes, pprint works fine, but I want to log the messages to a log file.
