About RangaReddy

RangaReddy · 08-31-2022

Hi @AZIMKBC Please try running the SparkPi example and check the logs for any errors: https://rangareddy.github.io/SparkPiExample/ If the issue is still not resolved and you are a Cloudera customer, please raise a support case and we will work on it internally.
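SparkPi estimates Pi by Monte Carlo sampling. It distributes random points in the square [-1, 1] x [-1, 1] across executors and counts how many land inside the unit circle. It then prints a line starting with "Pi is roughly" in the driver output, which is the driver container log in YARN cluster mode. Below is a minimal Spark-free sketch of the same computation. It is useful for knowing what a healthy run should report. `LocalPi` and `estimatePi` are hypothetical names, not part of Spark or the linked example.

```scala
import scala.util.Random

// Hypothetical, Spark-free sketch of the Monte Carlo computation that
// SparkPi distributes across executors (not Spark's own code).
object LocalPi {
  // Sample `samples` random points in [-1, 1] x [-1, 1] and return
  // 4 * (fraction of points that fall inside the unit circle).
  def estimatePi(samples: Int, seed: Long = 42L): Double = {
    val rnd = new Random(seed)
    val inside = (1 to samples).count { _ =>
      val x = rnd.nextDouble() * 2 - 1
      val y = rnd.nextDouble() * 2 - 1
      x * x + y * y <= 1
    }
    4.0 * inside / samples
  }

  def main(args: Array[String]): Unit = {
    val samples = if (args.nonEmpty) args(0).toInt else 100000
    println(s"Pi is roughly ${estimatePi(samples)}")
  }
}
```

If SparkPi fails on the cluster or reports a value far from 3.14, the logs from that run are the place to look.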

RangaReddy · 08-31-2022

Hi @shraddha Could you please check whether you have set the master to local while creating the SparkSession in your code? The following sample code runs both locally and on the cluster without changing the master value:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val appName = "MySparkApp"

// Creating the SparkConf object; "local[2]" is used only when no master is
// supplied externally (for example, by spark-submit --master yarn)
val sparkConf = new SparkConf().setAppName(appName).setIfMissing("spark.master", "local[2]")

// Creating the SparkSession object
val spark: SparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
```

Please also check the complete logs again for any other errors.

RangaReddy · 08-31-2022

Hi @Yosieam Thanks for sharing the code. You did not share the spark-submit/pyspark command. Please check what executor and driver memory is passed to spark-submit. Could you also confirm whether the file is on the local file system or in HDFS?
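The executor and driver memory passed to spark-submit is not the whole YARN container, because Spark adds a memory overhead on top of it. The sketch below is a minimal estimate. It assumes Spark's default overhead rule of max(10% of the memory setting, 384 MiB) and ignores off-heap memory, PySpark worker memory, and YARN's rounding to its allocation increment. `ExecutorContainerSize` is a hypothetical helper, not a Spark API.

```scala
// Hypothetical helper: approximate YARN container request for a Spark
// executor (or a driver in cluster mode). Assumes the default overhead rule
// max(10% of memory, 384 MiB); ignores off-heap, PySpark memory and YARN rounding.
object ExecutorContainerSize {
  val OverheadFactor = 0.10
  val MinOverheadMiB = 384L

  // Overhead added on top of the heap size given by --executor-memory.
  def overheadMiB(executorMemoryMiB: Long): Long =
    math.max((OverheadFactor * executorMemoryMiB).toLong, MinOverheadMiB)

  // Heap plus overhead: roughly what is requested from YARN per executor.
  def containerMiB(executorMemoryMiB: Long): Long =
    executorMemoryMiB + overheadMiB(executorMemoryMiB)

  def main(args: Array[String]): Unit = {
    for (mem <- Seq(1024L, 4096L, 8192L))
      println(s"--executor-memory ${mem}m -> container of about ${containerMiB(mem)} MiB")
  }
}
```

For example, `--executor-memory 4g` leads to a request of roughly 4505 MiB per executor before YARN rounding.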

RangaReddy · 08-31-2022

Hi @nvelraj The PySpark job works locally because the pandas library is installed on your local system. When you run it in the cluster, the pandas module is not available on the cluster nodes, so you get the following error:

ModuleNotFoundError: No module named 'pandas'

To solve the issue, install the pandas library on all the machines in the cluster, or use a virtual environment.

RangaReddy · 08-31-2022

Hi @dmharshit As you know, Cloudera provides a hybrid data platform, so you can install CDP on-premises, in the public cloud, or both. CDP Private Cloud Base is supported only for on-premises clusters. CDP Public Cloud is supported on public clouds such as AWS, Azure, and GCP. @fzsombor has already shared references on how to install CDP Private Cloud and Spark3. Please let me know if you still need any further information.

RangaReddy · 08-31-2022

Hi @Camilo When sharing an exception, please include more details; this helps us provide a solution faster.

1. How are you launching the Spark job?
2. If you built the application with a build tool such as Maven or sbt, have you specified the spark-hive dependency and its version? For example:

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.12</artifactId>
    <version>2.4.8</version>
    <scope>provided</scope>
</dependency>
```

References:
1. https://stackoverflow.com/questions/39444493/how-to-create-sparksession-with-hive-support-fails-with-hive-classes-are-not-f
2. https://mvnrepository.com/artifact/org.apache.spark/spark-hive
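For sbt builds, the following is a sketch of the equivalent declaration. It assumes the same Spark 2.4.8 / Scala 2.12 combination as the Maven example above; align both with the versions your cluster actually runs. The session itself must also be created with `enableHiveSupport()` for Spark to use the Hive classes.

```scala
// build.sbt sketch: sbt equivalent of the spark-hive Maven dependency.
// Assumption: Spark 2.4.8 built for Scala 2.12 (any 2.12.x release works here).
scalaVersion := "2.12.10"

libraryDependencies ++= Seq(
  // "provided": the cluster supplies Spark's jars at runtime
  "org.apache.spark" %% "spark-sql"  % "2.4.8" % "provided",
  "org.apache.spark" %% "spark-hive" % "2.4.8" % "provided"
)
```

With `%%`, sbt appends the Scala binary suffix, so this resolves to the same `spark-hive_2.12` artifact.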

RangaReddy · 08-30-2022

What is the HDP version? If it is HDP 3.x, you need to use the Hive Warehouse Connector (HWC).

RangaReddy · 08-30-2022

Hi @somant Please don't use the open-source libraries; use the Spark/Kafka versions supported by your cluster. Check the following example code: https://community.cloudera.com/t5/Community-Articles/Running-DirectKafkaWordCount-example-in-CDP/ta-p/340402
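In an sbt build, one way to follow this advice is to compile against the Spark artifacts that Cloudera publishes for the cluster instead of the upstream open-source ones. The sketch below rests on two assumptions. First, the resolver points at Cloudera's public repository. Second, `sparkVersion` is a hypothetical placeholder for the exact Spark version of your CDP parcel. The artifacts shown are the ones the DirectKafkaWordCount example relies on: Spark Streaming and its Kafka 0-10 integration.

```scala
// build.sbt sketch. Assumptions: the Cloudera repository URL below, and a
// hypothetical placeholder for the cluster's Spark version (replace it with
// the exact version string of the Spark parcel installed on your cluster).
val sparkVersion = "<cluster-spark-version>"

// Also set scalaVersion to the Scala version your cluster's Spark was built with.

resolvers += "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

libraryDependencies ++= Seq(
  // Spark Streaming is already on the cluster, so it is not bundled
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  // Kafka 0-10 integration used by DirectKafkaWordCount; mark it "provided"
  // too if your cluster already puts this jar on Spark's classpath
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion
)
```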

RangaReddy · 08-30-2022

Hi @Asim- You can check the following link for Spark and dbt integration: https://community.cloudera.com/t5/Innovation-Blog/Running-dbt-core-with-adapters-for-Hive-Spark-and-Impala/ba-p/350384

VidyaSargur · 08-08-2022

@Suhas_Ganorkar, we have reached out to you via PM with more details.

Last Visited	08-29-2024 03:41 AM

Member Since	06-02-2020 05:25 AM
Posts	331
Kudos received	68

Cloudera Community

Re: Icebreg on CDP private cloud 7.1.9

Re: How to set default time zone/local time for Sp...

Re: Load Iceberg Table on PowerBI Desktop

Re: NoClassDefFoundError due to Incompatible Spark...

Re: Creating Iceberg table

Re: I submit a Spark task in YARN mode, but the me...

Re: Issue on running spark application in Yarn-clu...

Re: Error spark job input is too large to fit in a...

Re: pyspark toPandas() works locally but fails in ...

Re: Can we install CDP 7.1.7 on premise physical m...

Re: Unable to instantiate SparkSession with Hive s...

Re: Spark cannot read hive orc table

Re: Spark Streaming job not reading data from Kafk...

Re: DBT with Spark HWC

Re: Unable to create new Cloudera case