Member since
06-02-2020
331
Posts
67
Kudos Received
49
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4098 | 07-11-2024 01:55 AM |
|  | 11365 | 07-09-2024 11:18 PM |
|  | 8564 | 07-09-2024 04:26 AM |
|  | 8585 | 07-09-2024 03:38 AM |
|  | 7507 | 06-05-2024 02:03 AM |
08-08-2021
11:43 PM
1 Kudo
In this article, we will learn to pass the atlas-application.properties configuration file from a different location in the spark-submit command.
When the Atlas service is enabled in CDP and we run a Spark application, the atlas-application.properties file is by default picked up from the /etc/spark/conf.cloudera.spark_on_yarn/ directory.
Let's test with the SparkPi example:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 10
We can see the following output in the application log.
21/08/23 06:12:03 INFO atlas.ApplicationProperties: Looking for atlas-application.properties in classpath
21/08/23 06:12:03 INFO atlas.ApplicationProperties: Loading atlas-application.properties from file:/etc/spark/conf.cloudera.spark_on_yarn/atlas-application.properties
If we want to pass the atlas-application.properties configuration file from a different location, for example the /tmp directory, copy atlas-application.properties from /etc/spark/conf.cloudera.spark_on_yarn to /tmp and point to it with the -Datlas.conf=/tmp/ JVM option in spark-submit.
Let's test with the same SparkPi example, adding the --driver-java-options="-Datlas.conf=/tmp/" option to spark-submit.
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --driver-java-options="-Datlas.conf=/tmp/" /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 10
We can see the following output in the application log.
21/08/05 14:36:24 INFO atlas.ApplicationProperties: Looking for atlas-application.properties in classpath
21/08/05 14:36:24 INFO atlas.ApplicationProperties: Loading atlas-application.properties from file:/tmp/atlas-application.properties
To run the same SparkPi example in cluster mode, we need to place the atlas-application.properties file in the /tmp directory on all nodes and run the Spark application as follows:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster \
--files /tmp/atlas-application.properties#atlas-application.properties --driver-java-options="-Datlas.conf=/tmp/" \
/opt/cloudera/parcels/CDH/jars/spark-examples*.jar 10
or,
sudo -u spark spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster \
--files /tmp/atlas-application.properties --conf spark.driver.extraJavaOptions="-Datlas.conf=./" \
/opt/cloudera/parcels/CDH/jars/spark-examples*.jar 10
We can see the following output:
21/08/23 06:12:07 INFO atlas.ApplicationProperties: Loading atlas-application.properties from file:/data1/tmp/usercache/spark/appcache/application_1629693759177_0016/container_e74_1629693759177_0016_01_000001/./atlas-application.properties
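To confirm which atlas-application.properties file the driver actually loaded, you can filter the captured driver output for the ApplicationProperties lines. A minimal sketch: here driver.log is a stand-in file containing the log lines shown above (in a real run, redirect the spark-submit output to a file, or fetch it with yarn logs for the application):

```shell
# Stand-in for the captured spark-submit / YARN driver output.
cat > driver.log <<'EOF'
21/08/23 06:12:03 INFO atlas.ApplicationProperties: Looking for atlas-application.properties in classpath
21/08/23 06:12:03 INFO atlas.ApplicationProperties: Loading atlas-application.properties from file:/tmp/atlas-application.properties
EOF

# Extract the location the properties were actually loaded from.
grep -o 'from file:[^ ]*' driver.log
```

If the override took effect, the extracted path points at /tmp rather than /etc/spark/conf.cloudera.spark_on_yarn.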
07-30-2021
08:57 AM
Hi @RonyA You haven't shared your dataset size. Apart from the data, you need to tune a few Spark parameters:
spark = (SparkSession
    .builder.master("yarn")
    .config("spark.executor.cores", "5")  # you have mentioned 12
    .config("spark.executor.instances", "10")
    .config("spark.executor.memory", "10G")
    .config("spark.executor.memoryOverhead", "2G")  # typically 10-20% of executor memory
    .config("spark.driver.memory", "10G")
    .config("spark.driver.memoryOverhead", "2G")  # typically 10-20% of driver memory
    .config("spark.sql.hive.convertMetastoreOrc", "true")
    .config("spark.executor.heartbeatInterval", "60s")  # default 10s
    .config("spark.network.timeout", "600s")  # default 120s
    .config("spark.driver.maxResultSize", "2g")
    .config("spark.driver.cores", "4")
    .config("spark.executor.extraJavaOptions", "-Dhdp.version=current")
    .config("spark.debug.maxToStringFields", "200")
    .config("spark.sql.catalogImplementation", "hive")
    .config("spark.memory.fraction", "0.8")
    .config("spark.memory.storageFraction", "0.2")
    .config("spark.sql.hive.filesourcePartitionFileCacheSize", "0")
    .config("spark.yarn.maxAppAttempts", "10")
    .appName(app_name)
    .enableHiveSupport()
    .getOrCreate())
Apart from the above, if you are doing any kind of wide operation, a shuffle is involved. To size the shuffle, use the following calculation:
spark.sql.shuffle.partitions = shuffle input size / HDFS block size
For example, if the shuffle input size is 10 GB and the HDFS block size is 128 MB, then 10 GB / 128 MB = 80 partitions.
Also check whether you have enabled dynamic allocation: open the Spark UI --> select the application --> go to the Environment page --> find the spark.dynamicAllocation.enabled property.
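The shuffle-partition rule of thumb above can be sketched in a few lines of Python; shuffle_partitions is a hypothetical helper name, and the sizes match the 10 GB / 128 MB example from the answer:

```python
def shuffle_partitions(shuffle_input_bytes: int, block_size_bytes: int) -> int:
    """Rule of thumb: shuffle input size / HDFS block size,
    rounded up so a trailing partial block still gets its own partition."""
    return -(-shuffle_input_bytes // block_size_bytes)  # ceiling division

GB = 1024 ** 3
MB = 1024 ** 2

# Example from the answer: 10 GB shuffle input, 128 MB HDFS block size.
print(shuffle_partitions(10 * GB, 128 * MB))  # → 80
```

The result would then be set as spark.sql.shuffle.partitions on the session.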
07-30-2021
08:47 AM
Hi @BabaHer From CDP onward, to support Spark with HBase, Cloudera recommends using the hbase-spark jar. https://mvnrepository.com/artifact/org.apache.hbase.connectors.spark/hbase-spark?repo=cloudera-repos The latest hbase-spark jar version is 1.0.0.7.2.10.0-148. To integrate Spark3 with HBase, you can find a sample example here: https://kontext.tech/column/spark/628/spark-connect-to-hbase
07-15-2021
06:50 AM
1 Kudo
Hi @PrernaU
1. By default, CDP uses PAM authentication, so we can remove the following two properties:
pamRealm=org.apache.zeppelin.realm.PamRealm
pamRealm.service=sshd
2. Then configure `admin=admin, admins` under `zeppelin.shiro.user.block`.
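For reference, a sketch of what the resulting Shiro `[users]` entry would look like. In Shiro's ini format the line means: user `admin`, password `admin`, role `admins`; the password shown here is only a placeholder and should be changed:

```ini
[users]
admin = admin, admins
```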
06-28-2021
07:22 PM
Hi @javidshaik Yes, per the Cloudera documentation, running multiple Spark versions under the same Cloudera Manager Server is not supported.
06-28-2021
04:04 AM
Hi @javidshaik I have checked with the internal team. We can migrate from Spark 2.3 to Spark 2.4; the details are in the document below:
Spark 2.4 Release 2 - CDH 5.10 and any higher CDH 5.x versions
Spark 2.4 Release 1 - CDH 5.10 and any higher CDH 5.x versions
https://docs.cloudera.com/documentation/spark2/latest/topics/spark2_requirements.html
But a Spark 2.3 -> 2.4 version change has a higher potential for risk. If you are satisfied with my answer, please Accept as Solution.
06-28-2021
02:39 AM
Hi @javidshaik CDH 5.x and HDP 2.x clusters have reached end of support. It is better to upgrade your cluster to CDH 6.x or CDP 7.x; both will support Spark 2.4. Please refer to the following documentation: https://www.cloudera.com/legal/policies/support-lifecycle-policy.html
06-27-2021
09:04 AM
Hi @roshanbi Please find the difference:
val textFileDF : Dataset[String] = spark.read.textFile("/path") // returns a Dataset[String]
val textFileRDD : RDD[String] = spark.sparkContext.textFile("/path") // returns an RDD[String]
If you are satisfied, please Accept as Solution.
06-25-2021
04:28 PM
1 Kudo
Hi @roshanbi
val ds = Seq(1, 2, 3).toDS()
This creates a sequence of numbers and then converts it into a Dataset. There are multiple ways to create a Dataset; the above is one of them. If you have created a DataFrame with a case class and want to convert it into a Dataset, you can use dataframe.as[ClassName]. Here you can find different ways of creating a Dataset: https://www.educba.com/spark-dataset/ Please let me know if there are any doubts, and Accept as Solution once you are satisfied with the answer.
06-24-2021
08:23 AM
Hi @roshanbi If you are satisfied with my answer, please Accept as Solution.