I have a very simple Spark Streaming application and can't figure out where the log messages are. Here is the code:
import json
import logging
import os
from datetime import datetime

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils


def createContext(loggerName):
    sc = SparkContext(appName="Compute Streaming Stats")
    sc.setLogLevel("INFO")
    log4jLogger = sc._jvm.org.apache.log4j
    scLogger = log4jLogger.LogManager.getLogger(loggerName)

    # batch interval of 180 seconds
    ssc = StreamingContext(sc, 180)
    kafkaStream = KafkaUtils.createStream(ssc,
                                          "my.server:2181",
                                          "compute-streaming-stats",
                                          {"test": 1})
    parsed = kafkaStream.map(lambda v: json.loads(v[1]))

    # Count number of tweets in the batch
    count_this_batch = kafkaStream.count()
    scLogger.info(count_this_batch)
    scLogger.info(parsed)
    return ssc
#end createContext


if __name__ == "__main__":
    ## Log setup
    loggerName = __name__
    dtmStamp = datetime.now().strftime('%Y_%m_%d_%H_%M_%S')
    logPath = '/home/<username>/logs/' + os.path.basename(__file__) + '-' + dtmStamp + '.log'

    logger = logging.getLogger(loggerName)
    logger.setLevel(logging.INFO)

    # create a file handler
    handler = logging.FileHandler(logPath)
    handler.setLevel(logging.INFO)

    # create a logging format
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    handler.setFormatter(formatter)

    # add the handler to the logger
    logger.addHandler(handler)

    ssc = StreamingContext.getOrCreate('/home/<username>/tmp/checkpoint_v0',
                                       lambda: createContext(loggerName))
    ssc.start()
    ssc.awaitTermination()
#end main if
So where is the "count_this_batch" value written to? Where is scLogger writing to?
YARN is set to aggregate the logs. I have looked in the folders set in the YARN Log DIR property.
I have also searched for it in
yarn logs -applicationId application_12323123213
Any leads highly appreciated.
Created 05-11-2018 08:23 AM
I have not had a chance to run your code locally, but I believe it should be where you've defined your logPath to be:
logPath='/home/<username>/logs/' + os.path.basename(__file__) + '-' + dtmStamp + '.log'
Did you forget to replace the <username> with the actual username, or was it redacted for sharing purposes?
Created 05-11-2018 08:29 AM
Ah, it's not as simple as it appears. I added <username> explicitly to obfuscate my name. Spark executes the code on several nodes and I want to know where the log messages are written to. It's definitely not in the log file I specified.
Hope that helps clarify.
Created 05-11-2018 08:35 AM
I see, Thanks. Are you able to print the results on the console using a simple spark kafka streaming app https://www.cloudera.com/documentation/enterprise/5-8-x/topics/spark_streaming.html ? If yes, we'd need to look at why the logging part is not working.
Created 05-11-2018 08:43 AM
Yes, pprint works fine. But I want to log the messages to a log file.
@AutoIN wrote:I see, Thanks. Are you able to print the results on the console using a simple spark kafka streaming app https://www.cloudera.com/documentation/enterprise/5-8-x/topics/spark_streaming.html ? If yes, we'd need to look at why the logging part is not working.