About RangaReddy

RangaReddy · ‎03-30-2023

Hi @Albap Based on the logs, i can see you have created streaming application. By default streaming application will run 24*7, it will stop only when we kill or some interrupted event happen at the system level. Better way to kill/shutdown the spark streaming applications is by using graceful shutdown. If you need further help, please raise an cloudera case we will work on.

RangaReddy · ‎03-30-2023

Hi @pankshiv1809 Based on the above exception, we can see while calling saveAsTable() it is occurred but it is not showing what is the cause. For that one you need to check the application logs properly. If still not able to resolve the issue better raise an cloudera case we will work on the fix.

RangaReddy · ‎03-30-2023

Hi @dmharshit With the below log, difficult to provide a solution because it don't have full logs. Please create a cloudera case to check the logs and provide a solution. ERROR : FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session c-47f2-aceb-22390502b303 Error: Error while compiling statement: FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session d6d96da5-f2bc-47f2-aceb-22390502b303 (state=42000,code=30041)

RangaReddy · ‎03-30-2023

Hi @pankshiv1809 You application is failed with java.lang.OutOfMemoryError: GC overhead limit exceeded. Based on the data you are processing you need to adjust the resources (Executor memory and Driver memory and overhead). For example, --conf spark.executor.memory=10g --conf spark.driver.memoryOverhead=1g --conf spark.driver.memory=10g --conf spark.executor.memoryOverhead=1g

RangaReddy · ‎03-30-2023

Hi @gizelly Based on the above exception, spark-sql-kafka library is not added to the classpath. You need to add that jar to class path or build the fat jar. For your spark-submit command add the jar using following command: ./bin/spark-submit \ --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1 ...

RangaReddy · ‎03-30-2023

Hi @Theoo It is not easy to upgrade for every immediate release. Internally we need to test it for multiple components. Please check always cloudera docs for latest releases and its supported versions.

RangaReddy · ‎03-30-2023

Hi @ComNic Please accept above solution if you got answer for your question.

RangaReddy · ‎03-30-2023

Hi @BrianChan If your cluster is enabled HDFS HA cluster then you will get the namespace from hdfs-site.xml file. If your cluster is not enabled HDFS HA then simply you can specify like below spark.eventLog.dir=/user/spark/applicationHistory

RangaReddy · ‎03-28-2023

Hi @ComNic When you run the spark application in yarn mode, Spark application will launch executors/containers in different node. In each node, logs are stored. When log aggregation is happen all logs are merged and we can access the logs by running following command: yarn logs -applicationId <Application_ID> The container logs should be under yarn.nodemanager.log-dirs path.

RangaReddy · ‎02-27-2023

Spark Rolling event log files 1. Introduction While running a long-running spark application (for example streaming application), the spark will generate a larger/huge single event log file until the Spark application is killed or stopped. Maintaining a single event log file which may cost a lot to maintain and also requires a bunch of resources to replay per each update in the Spark History Server. To avoid creating. a single huge event log file, the spark team created a rolling event log file. 2. Enabling the Spark Rolling Event logs in CDP Step1: Enable the rolling event logs and set the max file size CM -->Spark 3 --> Configuration --> Spark 3 Client Advanced Configuration Snippet (Safety Valve) for spark3-conf/spark-defaults.conf. spark.eventLog.rolling.enabled=true spark.eventLog.rolling.maxFileSize=128m The default spark.eventLog.rolling.maxFileSize value will be 128MB. The minimum value is 10MB. Step2: Set the rolling event log max files to retain CM -->Spark 3 --> Configuration --> History Server Advanced Configuration Snippet (Safety Valve) for spark3-conf/spark-history-server.conf spark.history.fs.eventLog.rolling.maxFilesToRetain=2 By default, spark.history.fs.eventLog.rolling.maxFilesToRetain value will be infinity meaning all event log files are retained. The minimum value is 1. 3. Verify the output Verify the output from the Spark history server event log directory. [root@c3543-node4 ~]# sudo -u spark hdfs dfs -ls -R /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002 -rw-rw---- 3 spark spark 0 2023-01-04 07:03 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/appstatus_application_1672813574470_0002.inprogress -rw-rw---- 3 spark spark 10485458 2023-01-04 07:05 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_1_application_1672813574470_0002 -rw-rw---- 3 spark spark 0 2023-01-04 07:05 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_2_application_1672813574470_0002 [root@c3543-node4 ~]# sudo -u spark hdfs dfs -ls -R /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002 -rw-rw---- 3 spark spark 0 2023-01-04 07:03 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/appstatus_application_1672813574470_0002.inprogress -rw-rw---- 3 spark spark 492014 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_1_application_1672813574470_0002.compact -rw-rw---- 3 spark spark 10489509 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_2_application_1672813574470_0002 -rw-rw---- 3 spark spark 227068 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_3_application_1672813574470_0002 [root@c3543-node4 ~]# sudo -u spark hdfs dfs -ls -R /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002 -rw-rw---- 3 spark spark 0 2023-01-04 07:03 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/appstatus_application_1672813574470_0002.inprogress -rw-rw---- 3 spark spark 873356 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_2_application_1672813574470_0002.compact -rw-rw---- 3 spark spark 10484816 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_3_application_1672813574470_0002 -rw-rw---- 3 spark spark 339165 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_4_application_1672813574470_0002 References: SPARK-28594 Applying compaction on rolling event log files

Online	Offline
Last Visited	‎08-29-2024 03:41 AM

Member Since	‎06-02-2020 05:25 AM
Last Visited	‎08-29-2024 03:41 AM
Posts	331
Kudos received	68

Cloudera Community

Re: Icebreg on CDP private cloud 7.1.9

Re: How to set default time zone/local time for Sp...

Re: Load Iceberg Table on PowerBI Desktop

Re: NoClassDefFoundError due to Incompatible Spark...

Re: Creating Iceberg table

Re: Multiple spark Spark jobs failed

Re: Hadoop job got failed and getting below error ...

Re: Error "Failed to create Spark client for Spark...

Re: Hadoop-Spark Job execution Issue

Re: Error: Cannot load main class from JAR ...... ...

Re: CDP Private cloud support for java 17 and spar...

Re: where is spark application logs ? & default p...

Re: Fail to start pyspark session

Re: where is spark application logs ? & default p...

How to Spark Roll Event Log Files in CDP