About RangaReddy

RangaReddy · ‎03-30-2023

Use the following tool to estimate the number of executors: https://rangareddy.github.io/SparkConfigurationGenerator/ To size the driver memory and executor memory, start small and double the value step by step (1g, 2g, 4g, 8g, ...). Set executor-cores to a value between 3 and 5. The number of executors depends on how much data you are processing.
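As a rough illustration of the advice above, the sketch below builds spark-submit arguments for each step of the memory ladder. The helper names (memory_ladder, spark_submit_args) and the executor count of 2 are hypothetical and only for illustration; the flags themselves are standard spark-submit options. It assumes the same memory value is tried for both driver and executors, which may not suit every job.

```python
def memory_ladder(start_gb=1, max_gb=8):
    """Yield memory settings that double at each step: 1g, 2g, 4g, 8g."""
    size = start_gb
    while size <= max_gb:
        yield f"{size}g"
        size *= 2


def spark_submit_args(memory, executor_cores, num_executors):
    """Return spark-submit resource flags for one tuning attempt."""
    # The 3-5 range follows the recommendation in the post above.
    if not 3 <= executor_cores <= 5:
        raise ValueError("executor_cores is expected to be between 3 and 5")
    return [
        "--driver-memory", memory,
        "--executor-memory", memory,
        "--executor-cores", str(executor_cores),
        "--num-executors", str(num_executors),
    ]


if __name__ == "__main__":
    # Print one candidate command line per memory step (2 executors is a placeholder).
    for memory in memory_ladder():
        print("spark-submit " + " ".join(spark_submit_args(memory, 4, 2)))
```

Each printed line is one candidate to try; check the Spark UI after each run before moving up to the next memory step.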

RangaReddy · ‎03-30-2023

Hi @pankshiv1809 To make the application run faster, we need to tune the resources, the Spark code and the cluster. To solve any kind of performance issue, first go through the Spark UI and understand the jobs, stages and executors. After that, tune the resources such as driver memory, executor memory and the number of executors, and use a separate queue to process the data. If you want to know about other techniques, please raise a Cloudera case and we will help you further.

RangaReddy · ‎03-30-2023

Hi @quangbilly79 Cloudera supports the YARN and Kubernetes deployment modes; it does not support Standalone mode (in Standalone mode you can access the Spark Master on port 7077). To check on which node the driver was launched and on which nodes the executors were launched, go to the Spark UI or the Spark History Server UI of that application and open the Executors tab. There you can see the list of executors. In the second table you will find the executor ID: the row whose executor ID is 'driver' is the driver node, and all the remaining rows are executors.
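The same information shown in the Executors tab is also exposed by the Spark monitoring REST API, so the check can be scripted. The sketch below is only an illustration: the function names and the base URL passed to fetch_executors are hypothetical, it assumes an unsecured endpoint (a Kerberos-enabled cluster would need authenticated requests), and on YARN cluster mode the path may also need an attempt ID.

```python
import json
from urllib.request import urlopen


def fetch_executors(base_url, app_id):
    """Fetch executor summaries for one application from the Spark REST API.

    base_url is an assumption, e.g. the Spark History Server address.
    """
    url = f"{base_url}/api/v1/applications/{app_id}/allexecutors"
    with urlopen(url) as response:
        return json.load(response)


def split_driver_and_executors(executor_summaries):
    """Separate the driver host from the executor hosts.

    The entry whose id is 'driver' is the driver; all others are executors.
    """
    driver_hosts = []
    executor_hosts = {}
    for summary in executor_summaries:
        # hostPort looks like "host:port"; keep only the host part.
        host = summary["hostPort"].rsplit(":", 1)[0]
        if summary["id"] == "driver":
            driver_hosts.append(host)
        else:
            executor_hosts[summary["id"]] = host
    return driver_hosts, executor_hosts
```

A call such as split_driver_and_executors(fetch_executors(base_url, app_id)) would then return the driver host list and a mapping of executor ID to host.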

RangaReddy · ‎03-30-2023

Hi @Albap Based on the logs, I can see that you have created a streaming application. By default, a streaming application runs 24x7; it stops only when it is killed or when an interrupting event happens at the system level. A better way to stop a Spark streaming application is to use a graceful shutdown. If you need further help, please raise a Cloudera case and we will work on it.
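One common way to do a graceful shutdown is to poll for an external stop signal and then stop the context gracefully. The sketch below is an assumption-laden illustration, not the method used in the thread: it assumes the DStream API (a PySpark StreamingContext passed in as ssc), and run_until_marker and marker_exists are hypothetical names. For DStream jobs, setting spark.streaming.stopGracefullyOnShutdown=true is an alternative that applies when the JVM receives a shutdown signal.

```python
def run_until_marker(ssc, marker_exists, poll_seconds=10):
    """Run a streaming job until marker_exists() returns True, then stop gracefully.

    ssc is expected to behave like a PySpark StreamingContext.
    marker_exists is any zero-argument callable, for example a check
    for a marker file that an operator creates to request shutdown.
    """
    ssc.start()
    while True:
        # Returns True if the context has already stopped for another reason.
        if ssc.awaitTerminationOrTimeout(poll_seconds):
            return "terminated"
        if marker_exists():
            # PySpark spells this keyword 'stopGraceFully'; in-flight batches
            # are allowed to finish before the context stops.
            ssc.stop(stopSparkContext=True, stopGraceFully=True)
            return "stopped_gracefully"
```

The marker check is deliberately injected as a callable so the same loop works with a local file, an HDFS path, or any other signal.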

RangaReddy · ‎02-27-2023

Spark Rolling event log files

1. Introduction

While running a long-running Spark application (for example, a streaming application), Spark generates a single, very large event log file until the application is killed or stopped. A single event log file can be costly to maintain, and replaying it on every update in the Spark History Server requires a lot of resources. To avoid creating a single huge event log file, the Spark community introduced rolling event log files.

2. Enabling the Spark Rolling Event logs in CDP

Step1: Enable the rolling event logs and set the max file size

CM --> Spark 3 --> Configuration --> Spark 3 Client Advanced Configuration Snippet (Safety Valve) for spark3-conf/spark-defaults.conf

    spark.eventLog.rolling.enabled=true
    spark.eventLog.rolling.maxFileSize=128m

The default value of spark.eventLog.rolling.maxFileSize is 128 MB. The minimum value is 10 MB.

Step2: Set the rolling event log max files to retain

CM --> Spark 3 --> Configuration --> History Server Advanced Configuration Snippet (Safety Valve) for spark3-conf/spark-history-server.conf

    spark.history.fs.eventLog.rolling.maxFilesToRetain=2

By default, spark.history.fs.eventLog.rolling.maxFilesToRetain is effectively unlimited, meaning all event log files are retained. The minimum value is 1.

3. Verify the output

Verify the output from the Spark History Server event log directory. Running the same listing several times shows new event files being created and older ones being compacted.

    [root@c3543-node4 ~]# sudo -u spark hdfs dfs -ls -R /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002
    -rw-rw---- 3 spark spark 0 2023-01-04 07:03 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/appstatus_application_1672813574470_0002.inprogress
    -rw-rw---- 3 spark spark 10485458 2023-01-04 07:05 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_1_application_1672813574470_0002
    -rw-rw---- 3 spark spark 0 2023-01-04 07:05 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_2_application_1672813574470_0002

    [root@c3543-node4 ~]# sudo -u spark hdfs dfs -ls -R /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002
    -rw-rw---- 3 spark spark 0 2023-01-04 07:03 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/appstatus_application_1672813574470_0002.inprogress
    -rw-rw---- 3 spark spark 492014 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_1_application_1672813574470_0002.compact
    -rw-rw---- 3 spark spark 10489509 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_2_application_1672813574470_0002
    -rw-rw---- 3 spark spark 227068 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_3_application_1672813574470_0002

    [root@c3543-node4 ~]# sudo -u spark hdfs dfs -ls -R /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002
    -rw-rw---- 3 spark spark 0 2023-01-04 07:03 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/appstatus_application_1672813574470_0002.inprogress
    -rw-rw---- 3 spark spark 873356 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_2_application_1672813574470_0002.compact
    -rw-rw---- 3 spark spark 10484816 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_3_application_1672813574470_0002
    -rw-rw---- 3 spark spark 339165 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_4_application_1672813574470_0002

References:
SPARK-28594 Applying compaction on rolling event log files
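To keep the two safety-valve snippets consistent with the limits stated above, the property lines can be generated and validated before pasting them into Cloudera Manager. The following is a minimal sketch; the function name rolling_event_log_conf and its parameters are hypothetical, while the property names and minimum values are the ones given in the article.

```python
MIN_FILE_SIZE_MB = 10      # minimum for spark.eventLog.rolling.maxFileSize
MIN_FILES_TO_RETAIN = 1    # minimum for maxFilesToRetain


def rolling_event_log_conf(max_file_size_mb=128, max_files_to_retain=None):
    """Return (client_lines, history_server_lines) for rolling event logs.

    max_files_to_retain=None leaves the History Server default in place,
    which retains all event log files.
    """
    if max_file_size_mb < MIN_FILE_SIZE_MB:
        raise ValueError(f"maxFileSize must be at least {MIN_FILE_SIZE_MB} MB")

    # Lines for spark3-conf/spark-defaults.conf (client side).
    client_lines = [
        "spark.eventLog.rolling.enabled=true",
        f"spark.eventLog.rolling.maxFileSize={max_file_size_mb}m",
    ]

    # Lines for spark3-conf/spark-history-server.conf (History Server side).
    history_lines = []
    if max_files_to_retain is not None:
        if max_files_to_retain < MIN_FILES_TO_RETAIN:
            raise ValueError(
                f"maxFilesToRetain must be at least {MIN_FILES_TO_RETAIN}"
            )
        history_lines.append(
            "spark.history.fs.eventLog.rolling.maxFilesToRetain="
            f"{max_files_to_retain}"
        )
    return client_lines, history_lines
```

Calling rolling_event_log_conf(128, 2) reproduces the three property lines used in Step1 and Step2.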

vi1 · ‎02-03-2023

After following the above steps, I'm still not able to start HiveServer2.

RangaReddy · ‎01-18-2023

Hi @Nikhil44 First of all, Cloudera does not support a Standalone Spark installation. To access any Hive table, Spark needs hive-site.xml and the Hadoop-related configuration files (core-site.xml, hdfs-site.xml and yarn-site.xml).
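A quick way to confirm that these files are in place is to check the configuration directory that Spark reads. The sketch below is only an illustration: the function name missing_conf_files is hypothetical, and the directory to pass in depends on your installation, so it is left as a parameter.

```python
from pathlib import Path

# File names taken from the reply above.
REQUIRED_CONF_FILES = (
    "hive-site.xml",
    "core-site.xml",
    "hdfs-site.xml",
    "yarn-site.xml",
)


def missing_conf_files(conf_dir):
    """Return the required configuration files that are absent from conf_dir."""
    conf_path = Path(conf_dir)
    return [
        name for name in REQUIRED_CONF_FILES
        if not (conf_path / name).is_file()
    ]
```

An empty list means all four files were found; any names returned are the ones to copy over from the cluster.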

VidyaSargur · ‎01-02-2023

@Samie, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.

RangaReddy · ‎12-08-2022

Hi @quangbilly79 You have used the CDP jar hbase-spark-1.0.0.7.2.15.0-147.jar instead of the CDH one. There is no guarantee that the latest CDP jar will work in CDH. Luckily, it worked for you.

Last Visited	‎08-29-2024 03:41 AM

Member Since	‎06-02-2020 05:25 AM
Posts	331
Kudos received	68

Cloudera Community

Re: Icebreg on CDP private cloud 7.1.9

Re: How to set default time zone/local time for Sp...

Re: Load Iceberg Table on PowerBI Desktop

Re: NoClassDefFoundError due to Incompatible Spark...

Re: Creating Iceberg table

Re: How to tune spark job on (execution time wise ...

Re: Apache Spark Job Is slow and wanted to Make It...

Re: How to know which Node is Driver Node, which N...

Re: Reading CSV File Spark - Issue with Backslash

Re: Multiple spark Spark jobs failed

How to Spark Roll Event Log Files in CDP

Re: hive server restarted failed with error stoppi...

Re: Read hive table from spark

Re: All spark-submit are routed to the same yarn q...

Re: Problems using hbase-spark on CDH