Member since
06-02-2020
331
Posts
64
Kudos Received
49
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 830 | 07-11-2024 01:55 AM |
| | 2297 | 07-09-2024 11:18 PM |
| | 2133 | 07-09-2024 04:26 AM |
| | 1578 | 07-09-2024 03:38 AM |
| | 1806 | 06-05-2024 02:03 AM |
10-17-2023
10:27 PM
Apache Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Java, Scala, Python, and R, and an optimized engine that supports general execution graphs.
The table below provides links to the documentation for each Spark 3 release:
| Spark Version | Documentation Link | Release Date |
|---|---|---|
| Spark 3.0.0 | https://spark.apache.org/docs/3.0.0/ | 2020-06-16 |
| Spark 3.0.1 | https://spark.apache.org/docs/3.0.1/ | 2020-11-05 |
| Spark 3.0.2 | https://spark.apache.org/docs/3.0.2/ | 2021-02-19 |
| Spark 3.0.3 | https://spark.apache.org/docs/3.0.3/ | 2022-06-17 |
| Spark 3.1.1 | https://spark.apache.org/docs/3.1.1/ | 2021-03-02 |
| Spark 3.1.2 | https://spark.apache.org/docs/3.1.2/ | 2022-06-17 |
| Spark 3.1.3 | https://spark.apache.org/docs/3.1.3/ | 2022-06-17 |
| Spark 3.2.0 | https://spark.apache.org/docs/3.2.0/ | 2021-10-13 |
| Spark 3.2.1 | https://spark.apache.org/docs/3.2.1/ | 2022-06-17 |
| Spark 3.2.2 | https://spark.apache.org/docs/3.2.2/ | 2022-07-15 |
| Spark 3.2.3 | https://spark.apache.org/docs/3.2.3/ | 2022-11-28 |
| Spark 3.2.4 | https://spark.apache.org/docs/3.2.4/ | 2023-04-13 |
| Spark 3.3.0 | https://spark.apache.org/docs/3.3.0/ | 2022-06-17 |
| Spark 3.3.1 | https://spark.apache.org/docs/3.3.1/ | 2022-10-25 |
| Spark 3.3.2 | https://spark.apache.org/docs/3.3.2/ | 2023-02-15 |
| Spark 3.3.3 | https://spark.apache.org/docs/3.3.3/ | 2023-08-21 |
| Spark 3.4.0 | https://spark.apache.org/docs/3.4.0/ | 2023-04-13 |
| Spark 3.4.1 | https://spark.apache.org/docs/3.4.1/ | 2023-06-23 |
| Spark 3.5.0 | https://spark.apache.org/docs/3.5.0/ | 2023-09-13 |
10-05-2023
02:09 AM
When you submit a Spark application through YARN, it runs based on the resources YARN can allocate. In your case, you need to add more YARN NodeManager nodes so the application can run with more resources; once those nodes are in place, YARN will distribute processing across all of them. Adding new nodes alone, without the NodeManager role running on them, will not increase processing capacity.
10-05-2023
01:54 AM
Hi @hegdemahendra From CDP onwards, the Spark Thrift Server is not supported. You can try the following link; it may be useful for you: https://stackoverflow.com/questions/29227949/how-to-implement-spark-sql-pagination-query
10-05-2023
01:44 AM
Hi Team, Livy3 integration with Zeppelin is not yet supported. To use Spark 3, you need to install Python 3 and add the following parameters: PYSPARK3_PYTHON and spark.yarn.appMasterEnv.PYSPARK3_PYTHON. Reference: https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/running-spark-applications/topics/spark-python-path-variables-livy.html
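As a hedged sketch of how those two parameters could be set when submitting a PySpark job (the /usr/bin/python3 path and the script name your_app.py are assumptions; use the Python 3 location on your own nodes):

```shell
# Point the client-side driver at Python 3 (assumed path).
export PYSPARK3_PYTHON=/usr/bin/python3

# Also point the YARN application master at the same interpreter.
spark3-submit \
  --master yarn \
  --conf spark.yarn.appMasterEnv.PYSPARK3_PYTHON=/usr/bin/python3 \
  your_app.py
```

The same interpreter must exist at the same path on every NodeManager host, since executors and the application master resolve it locally.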
10-02-2023
11:13 PM
Hi @pranav007 The required setup is a little bit complex. You can try copying the core-site.xml, hdfs-site.xml, yarn-site.xml, hive-site.xml, mapred-site.xml, and krb5 configuration files to the resources folder. In the Spark code, you need to add two parameters, spark.driver.extraJavaOptions and spark.executor.extraJavaOptions, specifying the krb5 file location: --conf spark.driver.extraJavaOptions="-Djava.security.krb5.conf=KRB5_PATH" \
--conf spark.executor.extraJavaOptions="-Djava.security.krb5.conf=KRB5_PATH" \
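To show how those two options fit into a complete submission, here is a hedged sketch (the /etc/krb5.conf path, app.jar, and the class name com.example.MyApp are placeholders, not values from the original question):

```shell
# Sketch: submitting against a Kerberized cluster with an explicit krb5.conf.
# All paths and names below are placeholders -- substitute your own.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.driver.extraJavaOptions="-Djava.security.krb5.conf=/etc/krb5.conf" \
  --conf spark.executor.extraJavaOptions="-Djava.security.krb5.conf=/etc/krb5.conf" \
  --class com.example.MyApp \
  app.jar
```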
10-02-2023
10:58 PM
There are two solutions you can try: 1. Create one more shell operator, perform kinit, and after that submit your Spark job. 2. Pass the keytab and principal to spark-submit.
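A hedged sketch of the first option (the keytab path, principal, and script name are placeholders I have introduced for illustration):

```shell
# Option 1: obtain a Kerberos ticket first, then submit the job.
# Keytab path and principal are placeholders -- use your own values.
kinit -kt /path/to/user.keytab user@EXAMPLE.COM

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  your_app.py
```

Note that a ticket obtained via kinit expires, so the second option (--keytab/--principal, which lets Spark renew the ticket itself) is usually better for long-running jobs.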
09-14-2023
06:09 PM
Hi @Emanuel_MXN It is generally not recommended to keep event logs older than a few days or months; in your case, you are keeping logs for years. To avoid accumulating old logs, add the following parameters to the spark-defaults.conf file and delete old event logs based on your needs: spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.maxAge 7d
spark.history.fs.cleaner.interval 1h I don't have any handy script to delete files from HDFS for a specific date/year. If I find one, I will definitely share it here.
09-03-2023
09:53 PM
Hi @imule You can follow the steps below to generate the keytab; if you don't have permission, please check with your admin team. https://docs.cloudera.com/data-hub/cloud/access-clusters/topics/dh-retrieving-keytabs.html
08-31-2023
01:20 AM
Hi @imule In step 3, could you please pass --keytab <key_tab_path> --principal <principal_name> to the spark-submit command? Note: In CDP, Airflow integration is not yet supported.
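A hedged sketch of what the adjusted step 3 command might look like (the keytab path, principal, and script name are placeholders standing in for the <key_tab_path> and <principal_name> values above):

```shell
# Let Spark manage the Kerberos ticket itself via keytab and principal.
# All values below are placeholders -- substitute your own.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --keytab /path/to/user.keytab \
  --principal user@EXAMPLE.COM \
  your_app.py
```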
08-20-2023
09:03 AM
Hi @Rohan44 Could you please test the above application by specifying only the keytab and principal, removing the other security-related parameters from spark-submit?