Member since
06-02-2020
331
Posts
64
Kudos Received
49
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 830 | 07-11-2024 01:55 AM |
| | 2297 | 07-09-2024 11:18 PM |
| | 2133 | 07-09-2024 04:26 AM |
| | 1578 | 07-09-2024 03:38 AM |
| | 1806 | 06-05-2024 02:03 AM |
10-17-2023
10:27 PM
Apache Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Java, Scala, Python, and R, and an optimized engine that supports general execution graphs.
The table below provides links to the documentation for each Spark 3 release:
| Spark Version | Documentation Link | Release Date |
|---|---|---|
| Spark 3.0.0 | https://spark.apache.org/docs/3.0.0/ | 2020-06-16 |
| Spark 3.0.1 | https://spark.apache.org/docs/3.0.1/ | 2020-11-05 |
| Spark 3.0.2 | https://spark.apache.org/docs/3.0.2/ | 2021-02-19 |
| Spark 3.0.3 | https://spark.apache.org/docs/3.0.3/ | 2022-06-17 |
| Spark 3.1.1 | https://spark.apache.org/docs/3.1.1/ | 2021-03-02 |
| Spark 3.1.2 | https://spark.apache.org/docs/3.1.2/ | 2022-06-17 |
| Spark 3.1.3 | https://spark.apache.org/docs/3.1.3/ | 2022-06-17 |
| Spark 3.2.0 | https://spark.apache.org/docs/3.2.0/ | 2021-10-13 |
| Spark 3.2.1 | https://spark.apache.org/docs/3.2.1/ | 2022-06-17 |
| Spark 3.2.2 | https://spark.apache.org/docs/3.2.2/ | 2022-07-15 |
| Spark 3.2.3 | https://spark.apache.org/docs/3.2.3/ | 2022-11-28 |
| Spark 3.2.4 | https://spark.apache.org/docs/3.2.4/ | 2023-04-13 |
| Spark 3.3.0 | https://spark.apache.org/docs/3.3.0/ | 2022-06-17 |
| Spark 3.3.1 | https://spark.apache.org/docs/3.3.1/ | 2022-10-25 |
| Spark 3.3.2 | https://spark.apache.org/docs/3.3.2/ | 2023-02-15 |
| Spark 3.3.3 | https://spark.apache.org/docs/3.3.3/ | 2023-08-21 |
| Spark 3.4.0 | https://spark.apache.org/docs/3.4.0/ | 2023-04-13 |
| Spark 3.4.1 | https://spark.apache.org/docs/3.4.1/ | 2023-06-23 |
| Spark 3.5.0 | https://spark.apache.org/docs/3.5.0/ | 2023-09-13 |
10-05-2023
02:09 AM
When you submit a Spark application through YARN, it runs based on the resources YARN can allocate. In your case, you need to add more YARN NodeManager nodes so the application can run with more resources; once those nodes are in place, YARN will distribute processing across all of them. Adding new nodes alone, without the NodeManager role running on them, will not increase processing capacity.
10-05-2023
01:54 AM
Hi @hegdemahendra From CDP onwards, the Spark Thrift Server is not supported. You can try the following link; it may be useful for you: https://stackoverflow.com/questions/29227949/how-to-implement-spark-sql-pagination-query
10-05-2023
01:44 AM
Hi Team, Livy3 integration with Zeppelin is not yet supported. To use Spark 3, you need to install Python 3 and add the following parameters: PYSPARK3_PYTHON and spark.yarn.appMasterEnv.PYSPARK3_PYTHON. Reference: https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/running-spark-applications/topics/spark-python-path-variables-livy.html
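As a hedged sketch of how those two parameters could be set when submitting a PySpark job (the /usr/bin/python3 path and the script name your_app.py are assumptions; use the Python 3 location on your own nodes):

```shell
# Point the client-side driver at Python 3 (assumed path).
export PYSPARK3_PYTHON=/usr/bin/python3

# Also point the YARN application master at the same interpreter.
spark3-submit \
  --master yarn \
  --conf spark.yarn.appMasterEnv.PYSPARK3_PYTHON=/usr/bin/python3 \
  your_app.py
```

The same interpreter must exist at the same path on every NodeManager host, since executors and the application master resolve it locally.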
10-02-2023
11:13 PM
Hi @pranav007 The required setup is a little bit complex. You can try copying the core-site.xml, hdfs-site.xml, yarn-site.xml, hive-site.xml, mapred-site.xml, and krb5 configuration files to the resources folder. In the Spark code, you need to add two parameters, spark.driver.extraJavaOptions and spark.executor.extraJavaOptions, specifying the krb5 file location: --conf spark.driver.extraJavaOptions="-Djava.security.krb5.conf=KRB5_PATH" \
--conf spark.executor.extraJavaOptions="-Djava.security.krb5.conf=KRB5_PATH" \
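To show how those two options fit into a complete submission, here is a hedged sketch (the /etc/krb5.conf path, app.jar, and the class name com.example.MyApp are placeholders, not values from the original question):

```shell
# Sketch: submitting against a Kerberized cluster with an explicit krb5.conf.
# All paths and names below are placeholders -- substitute your own.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.driver.extraJavaOptions="-Djava.security.krb5.conf=/etc/krb5.conf" \
  --conf spark.executor.extraJavaOptions="-Djava.security.krb5.conf=/etc/krb5.conf" \
  --class com.example.MyApp \
  app.jar
```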
10-02-2023
10:58 PM
There are two solutions you can try: 1. Create one more shell operator, perform kinit, and after that submit your Spark job. 2. Pass the keytab and principal to spark-submit.
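A hedged sketch of the first option (the keytab path, principal, and script name are placeholders I have introduced for illustration):

```shell
# Option 1: obtain a Kerberos ticket first, then submit the job.
# Keytab path and principal are placeholders -- use your own values.
kinit -kt /path/to/user.keytab user@EXAMPLE.COM

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  your_app.py
```

Note that a ticket obtained via kinit expires, so the second option (--keytab/--principal, which lets Spark renew the ticket itself) is usually better for long-running jobs.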
09-14-2023
06:09 PM
Hi @Emanuel_MXN It is generally not recommended to keep event logs older than a few days or months; in your case, you are keeping logs for years. To avoid accumulating old logs, add the following parameters to the spark-defaults.conf file and delete old event logs based on your needs: spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.maxAge 7d
spark.history.fs.cleaner.interval 1h I don't have any handy script to delete files from HDFS for a specific date/year. If I find one, I will definitely share it here.
09-03-2023
09:53 PM
Hi @imule You can follow the steps below to generate the keytab; if you don't have permission, please check with your admin team. https://docs.cloudera.com/data-hub/cloud/access-clusters/topics/dh-retrieving-keytabs.html
08-31-2023
01:20 AM
Hi @imule In step 3, could you please pass --keytab <key_tab_path> --principal <principal_name> to the spark-submit command? Note: In CDP, Airflow integration is not yet supported.
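A hedged sketch of what the adjusted step 3 command might look like (the keytab path, principal, and script name are placeholders standing in for the <key_tab_path> and <principal_name> values above):

```shell
# Let Spark manage the Kerberos ticket itself via keytab and principal.
# All values below are placeholders -- substitute your own.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --keytab /path/to/user.keytab \
  --principal user@EXAMPLE.COM \
  your_app.py
```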
08-20-2023
09:03 AM
Hi @Rohan44 Could you please test the above application by specifying only the keytab and principal, removing the other security-related parameters from spark-submit?