Created on 03-27-2017 08:23 AM - edited 09-16-2022 04:20 AM
I would like to use Python's logging library, but I want the log output to land in HDFS instead of on the worker node's local file system. Is there a way to do that?
My code for setting up logging is below:
import logging

# Note: basicConfig only takes effect on its first call, so the filename,
# format, and level must all be set together in one call.
logging.basicConfig(filename='/var/log/DataFramedriversRddConvert.log',
                    format='%(asctime)s %(message)s',
                    level=logging.DEBUG)
logging.info('++++Started DataFramedriversRddConvert++++')
Created 03-27-2017 12:30 PM
You can achieve this by giving a fully qualified path.
## To use HDFS path
hdfs://<cluster-node>:8020/user/<path>
## To use Local path
file:///home/<path>
Some additional notes: keeping logs in HDFS is not recommended, for two reasons:
1. HDFS stores 3 replicas of every block by default, so logs consume triple the space.
2. If HDFS goes down, you cannot check the logs to diagnose the problem.
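One caveat: Python's logging module opens `filename` with an ordinary local `open()`, so passing an `hdfs://` URL to `basicConfig` will not actually write to HDFS. A common workaround is a custom `logging.Handler` that buffers formatted records and flushes them through an HDFS client call (for example, the third-party `hdfs` package's `InsecureClient.write(path, data, append=True)`). The sketch below assumes that pattern but keeps the sink pluggable, since the exact client and cluster details will vary:

```python
import logging

class BufferedSinkHandler(logging.Handler):
    """Buffer formatted log lines and flush them in batches to a sink
    callable -- in production, an HDFS client's append/write method."""

    def __init__(self, sink, capacity=10):
        super().__init__()
        self.sink = sink          # callable taking one str payload
        self.capacity = capacity
        self.buffer = []

    def emit(self, record):
        self.buffer.append(self.format(record))
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink("\n".join(self.buffer) + "\n")
            self.buffer = []
        super().flush()

# Demo sink: append chunks to an in-memory list. In production you might
# use something like (assumption, not verified against your cluster):
#   client = hdfs.InsecureClient('http://<namenode>:9870')
#   sink = lambda data: client.write('/user/<path>/app.log', data,
#                                    append=True, encoding='utf-8')
chunks = []
handler = BufferedSinkHandler(chunks.append, capacity=2)
handler.setFormatter(logging.Formatter('%(asctime)s %(message)s'))

logger = logging.getLogger('DataFramedriversRddConvert')
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

logger.info('++++Started DataFramedriversRddConvert++++')
logger.info('second message')   # reaches capacity, triggers a flush
```

Buffering matters here because HDFS appends are expensive; flushing every record individually would hammer the NameNode. Call `handler.flush()` (or `logging.shutdown()`) before the job exits so a partial buffer is not lost.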
Created 09-08-2020 10:33 AM
This is not working. Please let me know how to use the full path.