Archives of Support Questions (Read Only)

This is an archived, read-only board kept for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

PySpark logging to HDFS instead of the local filesystem

Explorer

I would like to use Python's logging library, but I want the log output to land in HDFS instead of on the worker node's local file system. Is there a way to do that?

My code for setting up logging is below:

 

import logging

# Note: logging.basicConfig() only takes effect on the first call,
# so filename, level, and format must all be passed together.
logging.basicConfig(filename='/var/log/DataFramedriversRddConvert.log',
                    level=logging.DEBUG,
                    format='%(asctime)s %(message)s')
logging.info('++++Started DataFramedriversRddConvert++++')


2 REPLIES

Champion (accepted solution)

@aj

 

You can achieve this by giving a fully qualified path.

 

## To use an HDFS path
hdfs://<namenode-host>:8020/user/<path>

## To use a local path
file:///home/<path>
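
For reference, fully qualified URIs like these work anywhere Spark itself reads or writes data. A quick sketch (the NameNode host, port, and user paths below are placeholders, and spark is assumed to be an active SparkSession):

# Spark's own readers and writers accept fully qualified URIs
df = spark.read.csv("hdfs://<namenode-host>:8020/user/<path>/input.csv")
df.write.parquet("hdfs://<namenode-host>:8020/user/<path>/output")

# The same APIs take local paths when qualified with file://
df.write.parquet("file:///home/<path>/output")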

 

Some additional notes: it is not recommended to keep logs in HDFS, for two reasons:

1. HDFS maintains a replication factor of 3 by default, so every log line is stored three times.

2. If HDFS itself goes down, you cannot check the logs.

New Member

This is not working. Please let me know how to use the full path.
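
One likely reason the fully qualified path does not work here: Python's standard logging handlers open files through the local file-system API, so a string beginning with hdfs:// is treated as an ordinary local path, not an HDFS location. Below is a minimal sketch of a workaround that ships each log record to HDFS over WebHDFS. It assumes the third-party hdfs package (pip install hdfs) and a reachable WebHDFS endpoint; the HdfsLogHandler class name, host, port, and paths are all hypothetical placeholders.

import logging
from hdfs import InsecureClient  # third-party WebHDFS client: pip install hdfs

class HdfsLogHandler(logging.Handler):
    # Hypothetical handler that appends each formatted record to a file in HDFS.
    def __init__(self, webhdfs_url, hdfs_path):
        super().__init__()
        # WebHDFS usually listens on port 9870 (Hadoop 3) or 50070 (Hadoop 2).
        self.client = InsecureClient(webhdfs_url)
        self.hdfs_path = hdfs_path
        # Create the file on the first run so later writes can append to it.
        if self.client.status(hdfs_path, strict=False) is None:
            self.client.write(hdfs_path, data='', encoding='utf-8')

    def emit(self, record):
        try:
            line = self.format(record) + '\n'
            self.client.write(self.hdfs_path, data=line,
                              append=True, encoding='utf-8')
        except Exception:
            self.handleError(record)

logger = logging.getLogger('DataFramedriversRddConvert')
logger.setLevel(logging.DEBUG)
handler = HdfsLogHandler('http://<namenode-host>:9870',
                         '/user/<path>/DataFramedriversRddConvert.log')
handler.setFormatter(logging.Formatter('%(asctime)s %(message)s'))
logger.addHandler(handler)
logger.info('++++Started DataFramedriversRddConvert++++')

Appending one record at a time over WebHDFS is slow, so in practice it is more common to log to the local disk and rely on YARN log aggregation to collect the driver and executor logs into HDFS after the application finishes.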