Member since: 06-02-2020
Posts: 331
Kudos Received: 67
Solutions: 49
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4105 | 07-11-2024 01:55 AM |
| | 11418 | 07-09-2024 11:18 PM |
| | 8573 | 07-09-2024 04:26 AM |
| | 8611 | 07-09-2024 03:38 AM |
| | 7513 | 06-05-2024 02:03 AM |
08-30-2022
04:33 AM
Hi @mala_etl You didn't mention whether you are running the application on CDH, HDP, or CDP. Could you please share your Hive script and confirm that you are using the Hive catalog instead of the in-memory catalog?
08-30-2022
04:31 AM
Hi @somant Please don't use external open-source libraries; use the Spark/Kafka versions supported by your cluster instead. Check the following example code: https://community.cloudera.com/t5/Community-Articles/Running-DirectKafkaWordCount-example-in-CDP/ta-p/340402
08-30-2022
04:25 AM
Hi @MikeCC Spark 3.3 is not yet supported in CDP. We plan to release Spark 3.3 in CDP 7.1.8 or a later version. As per the support matrix below, Java 17 is not yet supported either: https://supportmatrix.cloudera.com/ I hope this answers your question. If yes, please accept it as a solution.
08-08-2022
03:40 AM
Hi @Suhas_Ganorkar We no longer support CDH 5.x issues. As mentioned in the error, CDH 5.x has reached End of Support (EOS). Please use a different, supported CDH version.
08-03-2022
10:53 PM
Hi @ssuja In the screenshot above, we can clearly see that the Java path cannot be found. Please verify and update the Java path correctly on all nodes.
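As a quick check, the following sketch shows what a node currently resolves for Java (run it on each node; paths vary by environment, so adjust as needed):

```shell
# Sketch: verify the Java installation this node resolves.
# Prints the java binary on PATH, or a message if none is found.
command -v java || echo "java is not on PATH"
# Prints JAVA_HOME, or a message if it is not set.
echo "${JAVA_HOME:-JAVA_HOME is not set}"
```

If either check fails, fix PATH/JAVA_HOME in the node's environment before retrying the job.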
08-03-2022
03:56 AM
Hi @ssuja Please try running the SparkPi example and download the application logs. Once downloaded, check whether the logs contain any exceptions. spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--num-executors 1 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
/usr/hdp/current/spark2-client/examples/jars/spark-examples_*.jar 10
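Once the job finishes, you can fetch the aggregated YARN logs from the command line. A sketch (the application ID below is a hypothetical placeholder; use the one printed by spark-submit or shown in the YARN UI):

```shell
# Hypothetical application ID; replace with your own.
APP_ID="application_1660000000000_0001"
# Download the aggregated YARN logs for the application to a local file.
yarn logs -applicationId "${APP_ID}" > "${APP_ID}.log"
# Search the downloaded logs for exceptions and errors.
grep -iE "exception|error" "${APP_ID}.log"
```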
07-26-2022
04:17 AM
1 Kudo
Hi Team, CDP uses the "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol" OutputCommitter, which does not support dynamicPartitionOverwrite. You can set the following parameters in your Spark job.
Code level:
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark.conf.set("spark.sql.parquet.output.committer.class", "org.apache.parquet.hadoop.ParquetOutputCommitter")
spark.conf.set("spark.sql.sources.commitProtocolClass", "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol")
spark-submit/spark-shell:
--conf spark.sql.sources.partitionOverwriteMode=dynamic
--conf spark.sql.parquet.output.committer.class=org.apache.parquet.hadoop.ParquetOutputCommitter
--conf spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
Note: If you are using S3, you can disable the cloud committer by setting the spark.cloudera.s3_committers.enabled parameter:
--conf spark.cloudera.s3_committers.enabled=false
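For reference, a sketch of a single spark-submit invocation combining the three settings above (the class name and jar path below are hypothetical placeholders for your own job):

```shell
# Sketch: pass all three overrides on the command line for one job.
spark-submit \
  --class com.example.MyPartitionedWriteJob \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.sql.sources.partitionOverwriteMode=dynamic \
  --conf spark.sql.parquet.output.committer.class=org.apache.parquet.hadoop.ParquetOutputCommitter \
  --conf spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol \
  /path/to/my-job.jar
```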
04-29-2022
05:29 AM
Hi @JoeR Spark supports reading files in multiple formats, such as Parquet, ORC, JSON, XML, Avro, and CSV. I don't think there is a direct mechanism to read the data from the payload. If I find a different solution, I will share it with you.
04-05-2022
12:46 AM
1 Kudo
In this post, we will learn how to create a Kafka topic and produce and consume messages from it. After testing the basic producer and consumer example, we will test it with Spark using the spark-examples.jar file.
Creating a Kafka topic:
# kafka bootstrap server
KAFKA_BROKERS="localhost:9092"
# kafka topic name
TOPIC_NAME="word_count_topic"
# group name
GROUP_NAME="spark-kafka-group"
# creating a topic
/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-topics.sh --create --topic ${TOPIC_NAME} --bootstrap-server ${KAFKA_BROKERS}
# describing a topic
/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-topics.sh --describe --topic ${TOPIC_NAME} --bootstrap-server ${KAFKA_BROKERS}
Producing messages to Kafka topic:
# producing kafka messages
/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-console-producer.sh --topic ${TOPIC_NAME} --broker-list ${KAFKA_BROKERS}
Consuming messages from Kafka topic:
# consuming kafka messages
/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-console-consumer.sh --bootstrap-server ${KAFKA_BROKERS} --group ${GROUP_NAME} --topic ${TOPIC_NAME} --from-beginning
Submitting the Spark KafkaWordCount example:
spark-submit \
--master yarn \
--deploy-mode client \
--packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.7.7.1.7.0-551 \
--repositories https://repository.cloudera.com/artifactory/cloudera-repos/ \
--class org.apache.spark.examples.streaming.DirectKafkaWordCount \
/opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-examples_*.jar ${KAFKA_BROKERS} ${GROUP_NAME} ${TOPIC_NAME}
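As a variation, you can feed sample messages to the console producer non-interactively by piping them in. A sketch, reusing the variables defined above (the sample lines are arbitrary test data):

```shell
# Pipe a few sample lines into the console producer instead of typing them.
printf 'hello world\nhello kafka\n' | \
  /opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-console-producer.sh \
    --topic "${TOPIC_NAME}" --broker-list "${KAFKA_BROKERS}"
```

This is handy for scripting a quick end-to-end check before submitting the Spark job.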
02-22-2022
10:35 PM
Hi @Rajeshhadoop I don't think it is the right way to ask a set of questions in a single community article. Please create a new thread for each question.