Member since: 06-02-2020
Posts: 331
Kudos Received: 67
Solutions: 49
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4097 | 07-11-2024 01:55 AM |
|  | 11362 | 07-09-2024 11:18 PM |
|  | 8558 | 07-09-2024 04:26 AM |
|  | 8574 | 07-09-2024 03:38 AM |
|  | 7503 | 06-05-2024 02:03 AM |
08-31-2022
09:33 PM
Hi @AZIMKBC Please try to run the SparkPi example and check whether there are any errors in the logs: https://rangareddy.github.io/SparkPiExample/ If the issue is still not resolved and you are a Cloudera customer, please raise a support case and we will work on it internally.
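For reference, a typical SparkPi run looks like the following; the example jar path is an assumption based on a standard Cloudera parcel layout and may differ on your cluster.

```bash
# Run the built-in SparkPi example on YARN.
# The jar path below is an assumption for a Cloudera parcel layout;
# adjust it to match your installation.
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  /opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-examples_*.jar 10
```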
08-31-2022
09:29 PM
Hi @shraddha Could you please check whether you have set the master to local while creating the SparkSession in your code? Use the following sample code to run both locally and on a cluster without updating the master value.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val appName = "MySparkApp"

// Create the SparkConf object. setIfMissing keeps any master set by
// spark-submit and only falls back to local[2] when none is provided.
val sparkConf = new SparkConf().setAppName(appName).setIfMissing("spark.master", "local[2]")

// Create the SparkSession object
val spark: SparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
```

Verify the full logs once more to check whether there are any other errors.
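Because setIfMissing only applies when no master is already set, the same jar runs unchanged under spark-submit; a sketch of a cluster submission (the class and jar names here are hypothetical):

```bash
# spark-submit supplies spark.master here, so the local[2]
# fallback in the code is ignored on the cluster.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MySparkApp \
  my-spark-app.jar
```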
08-31-2022
09:20 PM
Hi @Yosieam Thanks for sharing the code. You forgot to share the spark-submit/pyspark command. Please check what executor/driver memory is being passed to spark-submit. Could you also confirm whether the file is on the local filesystem or in HDFS?
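For reference, driver and executor memory are passed to spark-submit as shown below; the values and script name are placeholders, not recommendations.

```bash
# Placeholder memory settings; tune them to your data size and cluster.
spark-submit \
  --master yarn \
  --driver-memory 4g \
  --executor-memory 8g \
  --num-executors 4 \
  your_script.py
```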
08-31-2022
09:15 PM
Hi @nvelraj The PySpark job works locally because the pandas library is installed on your local machine. When you run it on the cluster, the pandas library/module is not available on the worker nodes, so you get the following error: ModuleNotFoundError: No module named 'pandas' To solve the issue, you need to install the pandas library/module on all machines or use a virtual environment.
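If installing pandas on every node is not an option, shipping a packed virtual environment with the job also works, following the upstream PySpark packaging approach; a sketch (archive and script names are arbitrary):

```bash
# Build a virtualenv containing pandas, pack it, and ship it with the job.
# Names like pyspark_env.tar.gz and your_script.py are arbitrary choices.
python -m venv pyspark_env
source pyspark_env/bin/activate
pip install pandas venv-pack
venv-pack -o pyspark_env.tar.gz

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --archives pyspark_env.tar.gz#environment \
  --conf spark.pyspark.python=./environment/bin/python \
  your_script.py
```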
08-31-2022
08:59 PM
Hi @Camilo When sharing an exception, please include more details; that helps us provide a solution faster. 1. How are you launching the Spark job? 2. If you built the application with Maven or sbt, have you specified the spark-hive jar version? For example:

```xml
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.12</artifactId>
    <version>2.4.8</version>
    <scope>provided</scope>
</dependency>
```

References: 1. https://stackoverflow.com/questions/39444493/how-to-create-sparksession-with-hive-support-fails-with-hive-classes-are-not-f 2. https://mvnrepository.com/artifact/org.apache.spark/spark-hive
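For context, the failure discussed in the Stack Overflow thread above is usually thrown from this call when spark-hive is missing from the classpath:

```scala
import org.apache.spark.sql.SparkSession

// Without spark-hive on the classpath this fails with:
// "Unable to instantiate SparkSession with Hive support because
//  Hive classes are not found."
val spark = SparkSession.builder()
  .appName("HiveSupportExample")
  .enableHiveSupport()
  .getOrCreate()
```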
08-30-2022
11:18 PM
What is the HDP version? If it is HDP 3.x, then you need to use the Hive Warehouse Connector (HWC).
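For illustration, a minimal HWC read in Scala looks roughly like this; it assumes the HWC jar and the HWC configurations from the HDP documentation are in place, and the table name is a placeholder.

```scala
import com.hortonworks.hwc.HiveWarehouseSession

// Assumes an existing SparkSession `spark` configured for HWC;
// mydb.mytable is a placeholder table name.
val hive = HiveWarehouseSession.session(spark).build()
hive.executeQuery("SELECT * FROM mydb.mytable").show()
```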
08-30-2022
04:31 AM
Hi @somant Please don't mix in upstream open-source libraries; use the Spark/Kafka versions supported by your cluster. Check the following example code: https://community.cloudera.com/t5/Community-Articles/Running-DirectKafkaWordCount-example-in-CDP/ta-p/340402
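One way to build against the cluster's versions is to declare the Kafka integration with provided scope so the cluster's own jars are used at runtime; a sketch, where the version placeholder must be replaced with the Spark version shipped in your cluster:

```xml
<!-- Match the spark-streaming-kafka artifact version shipped with
     your cluster; CLUSTER_SPARK_VERSION below is a placeholder. -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.12</artifactId>
    <version>CLUSTER_SPARK_VERSION</version>
    <scope>provided</scope>
</dependency>
```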
08-08-2022
03:42 AM
@Suhas_Ganorkar, we have reached out to you via PM with more details.
08-08-2022
02:39 AM
@ssuja, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
07-26-2022
04:17 AM
1 Kudo
Hi Team, CDP uses the "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol" OutputCommitter, which does not support dynamicPartitionOverwrite. You can set the following parameters in your Spark job.

Code level:

```scala
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark.conf.set("spark.sql.parquet.output.committer.class", "org.apache.parquet.hadoop.ParquetOutputCommitter")
spark.conf.set("spark.sql.sources.commitProtocolClass", "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol")
```

spark-submit/spark-shell:

```
--conf spark.sql.sources.partitionOverwriteMode=dynamic \
--conf spark.sql.parquet.output.committer.class=org.apache.parquet.hadoop.ParquetOutputCommitter \
--conf spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
```

Note: If you are using S3, you can disable the Cloudera S3 committers by specifying the spark.cloudera.s3_committers.enabled parameter: --conf spark.cloudera.s3_committers.enabled=false