Member since: 06-02-2020
Posts: 331
Kudos Received: 67
Solutions: 49
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4115 | 07-11-2024 01:55 AM |
| | 11435 | 07-09-2024 11:18 PM |
| | 8582 | 07-09-2024 04:26 AM |
| | 8619 | 07-09-2024 03:38 AM |
| | 7518 | 06-05-2024 02:03 AM |
03-30-2023
02:51 AM
Hi @Albap Based on the logs, I can see you have created a streaming application. By default, a streaming application runs 24*7; it stops only when it is killed or when some interrupting event happens at the system level. The better way to stop a Spark Streaming application is a graceful shutdown. If you need further help, please raise a Cloudera case and we will work on it.
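As a sketch of one common way to enable graceful shutdown (this applies to DStream-based streaming applications; Structured Streaming applications instead call `query.stop()` explicitly):

```properties
# Drain in-flight batches before the JVM exits, instead of stopping abruptly
# (DStreams only; has no effect on Structured Streaming)
spark.streaming.stopGracefullyOnShutdown=true
```

With this set, sending a normal termination signal to the driver lets the current batches finish before the context stops.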
03-30-2023
02:26 AM
Hi @Theoo It is not easy to upgrade for every immediate release; internally we need to test it against multiple components. Please always check the Cloudera docs for the latest releases and their supported versions.
03-30-2023
02:23 AM
1 Kudo
Hi @ComNic Please accept the above solution if it answered your question.
03-30-2023
02:22 AM
Hi @BrianChan If HDFS HA is enabled on your cluster, you will get the namespace from the hdfs-site.xml file. If HDFS HA is not enabled, you can simply specify it like below: spark.eventLog.dir=/user/spark/applicationHistory
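For illustration, the two cases look like this (the nameservice name "mycluster" below is a placeholder; take the real value from dfs.nameservices in hdfs-site.xml):

```properties
# Non-HA cluster: a plain HDFS path is enough
spark.eventLog.dir=/user/spark/applicationHistory

# HA cluster: prefix the path with the HDFS nameservice from hdfs-site.xml
spark.eventLog.dir=hdfs://mycluster/user/spark/applicationHistory
```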
03-28-2023
02:22 AM
Hi @ComNic When you run a Spark application in YARN mode, Spark launches executors/containers on different nodes, and each node stores its own logs. Once log aggregation happens, all the logs are merged and we can access them by running the following command: yarn logs -applicationId <Application_ID> Before aggregation, the container logs are under the yarn.nodemanager.log-dirs path.
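For example (the application ID below is a placeholder, and the -log_files option is only available on reasonably recent Hadoop versions):

```shell
# Fetch the full aggregated logs for an application
yarn logs -applicationId application_1672813574470_0002

# Fetch only stderr from each container, where -log_files is supported
yarn logs -applicationId application_1672813574470_0002 -log_files stderr
```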
02-27-2023
02:03 AM
Spark Rolling event log files
1. Introduction
While running a long-running Spark application (for example, a streaming application), Spark generates a single, ever-growing event log file until the application is killed or stopped. Maintaining a single huge event log file can be costly, and replaying it on every update in the Spark History Server requires a lot of resources.
To avoid creating a single huge event log file, the Spark team introduced rolling event log files.
2. Enabling the Spark Rolling Event logs in CDP
Step 1: Enable the rolling event logs and set the max file size
CM -->Spark 3 --> Configuration --> Spark 3 Client Advanced Configuration Snippet (Safety Valve) for spark3-conf/spark-defaults.conf.
spark.eventLog.rolling.enabled=true
spark.eventLog.rolling.maxFileSize=128m
The default spark.eventLog.rolling.maxFileSize value is 128MB. The minimum value is 10MB.
Step 2: Set the rolling event log max files to retain
CM -->Spark 3 --> Configuration --> History Server Advanced Configuration Snippet (Safety Valve) for spark3-conf/spark-history-server.conf
spark.history.fs.eventLog.rolling.maxFilesToRetain=2
By default, spark.history.fs.eventLog.rolling.maxFilesToRetain is infinite, meaning all event log files are retained. The minimum value is 1.
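Alternatively (a sketch, not Cloudera-specific guidance), the application-side rolling properties from Step 1 can also be set per application on the spark-submit command line instead of in the safety valve; note that spark.history.fs.eventLog.rolling.maxFilesToRetain is a History Server setting and still belongs in Step 2:

```shell
# Enable rolling event logs for a single application (SparkPi used as an example)
spark3-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --conf spark.eventLog.rolling.enabled=true \
  --conf spark.eventLog.rolling.maxFileSize=128m \
  /opt/cloudera/parcels/SPARK3/lib/spark3/examples/jars/spark-examples_*.jar 10
```

The examples-jar path above is a typical CDP location and may differ on your cluster.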
3. Verify the output
Verify the output from the Spark history server event log directory.
[root@c3543-node4 ~]# sudo -u spark hdfs dfs -ls -R /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002
-rw-rw---- 3 spark spark 0 2023-01-04 07:03 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/appstatus_application_1672813574470_0002.inprogress
-rw-rw---- 3 spark spark 10485458 2023-01-04 07:05 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_1_application_1672813574470_0002
-rw-rw---- 3 spark spark 0 2023-01-04 07:05 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_2_application_1672813574470_0002
[root@c3543-node4 ~]# sudo -u spark hdfs dfs -ls -R /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002
-rw-rw---- 3 spark spark 0 2023-01-04 07:03 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/appstatus_application_1672813574470_0002.inprogress
-rw-rw---- 3 spark spark 492014 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_1_application_1672813574470_0002.compact
-rw-rw---- 3 spark spark 10489509 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_2_application_1672813574470_0002
-rw-rw---- 3 spark spark 227068 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_3_application_1672813574470_0002
[root@c3543-node4 ~]# sudo -u spark hdfs dfs -ls -R /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002
-rw-rw---- 3 spark spark 0 2023-01-04 07:03 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/appstatus_application_1672813574470_0002.inprogress
-rw-rw---- 3 spark spark 873356 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_2_application_1672813574470_0002.compact
-rw-rw---- 3 spark spark 10484816 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_3_application_1672813574470_0002
-rw-rw---- 3 spark spark 339165 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_4_application_1672813574470_0002
References:
SPARK-28594
Applying compaction on rolling event log files
02-08-2023
11:00 PM
Hi @sat_046 I don't think we have a specific configuration parameter to retry failed task attempts with a delay. However, there are parameters to blacklist a node once a task has failed a certain number of attempts on it. References: 1. https://community.cloudera.com/t5/Community-Articles/Configuring-spark-task-maxFailures-amp-spark-blacklist-task/ta-p/335235 2. https://www.waitingforcode.com/apache-spark/failed-tasks-resubmit/read
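As a sketch of the parameters involved (names as in Spark 2.x/3.0; newer Spark releases rename the blacklist properties to spark.excludeOnFailure.*):

```properties
# Number of attempts a single task gets before the job is failed
spark.task.maxFailures=4

# Enable blacklisting of executors/nodes where tasks repeatedly fail
spark.blacklist.enabled=true

# Failed attempts of one task allowed per executor / per node
# before that executor or node is blacklisted for the task
spark.blacklist.task.maxTaskAttemptsPerExecutor=1
spark.blacklist.task.maxTaskAttemptsPerNode=2
```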
01-18-2023
01:07 AM
Hi @Nikhil44 First of all, Cloudera does not support standalone Spark installations. To access any Hive table, we need hive-site.xml and the Hadoop-related configuration files (core-site.xml, hdfs-site.xml, and yarn-site.xml).
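As a hedged sketch (the source paths below are typical client-config locations and may differ on your system), those files are usually copied into Spark's conf directory so they end up on the application classpath:

```shell
# Copy the Hive and Hadoop client configs into Spark's conf dir (paths are placeholders)
cp /etc/hive/conf/hive-site.xml "$SPARK_HOME/conf/"
cp /etc/hadoop/conf/{core-site.xml,hdfs-site.xml,yarn-site.xml} "$SPARK_HOME/conf/"
```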
12-20-2022
10:18 PM
Hi @Samie Is there any update on your testing?
12-15-2022
09:13 PM
Hi @Samie Please attach the Spark application and event logs so we can check the queue name. The easiest way to test is by running the Spark Pi example:
spark-submit \
--class org.apache.spark.examples.SparkPi \
--queue <queue_name> \
--master yarn \
--deploy-mode cluster \
--num-executors 1 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
/usr/hdp/current/spark2-client/examples/jars/spark-examples_*.jar 10
From spark-submit --help, Spark on YARN only:
--queue QUEUE_NAME The YARN queue to submit to (Default: "default").