Created 01-21-2021 12:35 AM
Hi All,
I have a Spark application running in YARN cluster mode: spark-submit --master yarn --deploy-mode cluster ...
The Spark logs (driver and executor) are stored on HDFS (/user/spark/driverLogs) and are available via the Cloudera Web UI (Cloudera Web UI -> Clusters -> YARN -> Applications tab -> click on application ID -> Logs) while the Spark application is running.
However, once the application has completed or failed, the logs are removed from the HDFS /user/spark/driverLogs folder and are therefore no longer available via the Cloudera Web UI.
When running the application in YARN client mode (spark-submit --master yarn --deploy-mode client...), the logs are kept after the application completes.
spark-defaults is attached.
Any ideas about the root cause?
Created on 01-29-2021 08:34 AM - edited 01-29-2021 08:38 AM
The location of the Spark application history logs in HDFS is configured by spark.eventLog.dir, which is set to /user/spark/applicationHistory, so the logs you need can be found under this path.
The /user/spark/driverLogs path in HDFS contains only the Spark driver logs written when the Spark application runs in client mode.
Please refer to the link below for more details: https://docs.cloudera.com/documentation/enterprise/6/properties/6.3/topics/cm_props_cdh630_spark.htm...
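As a rough sketch of how to check both locations from a gateway host: the commands below list the two HDFS directories mentioned above and then try to fetch the container logs of a finished application. The application ID is a placeholder, not a real one. The `yarn logs` step is an assumption on my part and only returns output if YARN log aggregation is enabled on the cluster.

```shell
#!/bin/sh
# Placeholder application ID (hypothetical); replace it with the real one
# shown in the YARN Applications tab.
APP_ID="${APP_ID:-application_0000000000000_0001}"

# Paths taken from this thread.
EVENT_LOG_DIR="/user/spark/applicationHistory"   # value of spark.eventLog.dir
DRIVER_LOG_DIR="/user/spark/driverLogs"          # client-mode driver logs

if command -v hdfs >/dev/null 2>&1; then
  # Event (history) logs written by Spark applications.
  hdfs dfs -ls "$EVENT_LOG_DIR"
  # Driver logs persisted for client-mode runs.
  hdfs dfs -ls "$DRIVER_LOG_DIR"
else
  echo "hdfs CLI not found; run this on a cluster gateway host" >&2
fi

if command -v yarn >/dev/null 2>&1; then
  # Assumption: log aggregation is enabled, so container logs are
  # still retrievable after the application has finished.
  yarn logs -applicationId "$APP_ID" | head -n 50
else
  echo "yarn CLI not found; run this on a cluster gateway host" >&2
fi
```

If the first listing shows an entry for your application, the history is still there even though /user/spark/driverLogs is empty for the cluster-mode run.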
Created 07-10-2024 09:04 AM