Member since: 06-02-2020
Posts: 331
Kudos Received: 67
Solutions: 49
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2794 | 07-11-2024 01:55 AM |
|  | 7855 | 07-09-2024 11:18 PM |
|  | 6566 | 07-09-2024 04:26 AM |
|  | 5901 | 07-09-2024 03:38 AM |
|  | 5595 | 06-05-2024 02:03 AM |
08-08-2022
02:39 AM
@ssuja, has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
07-26-2022
04:17 AM
1 Kudo
Hi Team,
CDP uses the "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol" OutputCommitter, which does not support dynamicPartitionOverwrite. You can set the following parameters in your Spark job.
At the code level:
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark.conf.set("spark.sql.parquet.output.committer.class", "org.apache.parquet.hadoop.ParquetOutputCommitter")
spark.conf.set("spark.sql.sources.commitProtocolClass", "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol")
With spark-submit/spark-shell:
--conf spark.sql.sources.partitionOverwriteMode=dynamic
--conf spark.sql.parquet.output.committer.class=org.apache.parquet.hadoop.ParquetOutputCommitter
--conf spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
Note: If you are using S3, you can disable the S3 committers by setting the spark.cloudera.s3_committers.enabled parameter:
--conf spark.cloudera.s3_committers.enabled=false
04-29-2022
05:29 AM
Hi @JoeR, Spark supports reading files in multiple formats such as Parquet, ORC, JSON, XML, Avro, CSV, etc. I don't think there is a direct mechanism to read the data from the payload. If I find a different solution, I will share it with you.
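For reference, a minimal sketch of reading a few of these formats with a SparkSession named spark; the paths are hypothetical, and Avro/XML may require additional packages (e.g. spark-avro, spark-xml) depending on the Spark version:
// Minimal sketch; the input paths are hypothetical placeholders.
val parquetDf = spark.read.parquet("/data/input.parquet")
val orcDf     = spark.read.orc("/data/input.orc")
val jsonDf    = spark.read.json("/data/input.json")
val csvDf     = spark.read
  .option("header", "true")
  .option("sep", ";")
  .csv("/data/input.csv")
// Avro and XML readers may need extra packages depending on the Spark build.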
04-22-2022
06:24 AM
@Rekasri, have you resolved your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
04-06-2022
09:06 AM
Hi @RangaReddy, how are you? Even though this is a community for asking questions and asking for help, can't you help if CDH is EOL?
04-05-2022
12:46 AM
1 Kudo
In this post, we will learn how to create a Kafka topic and how to produce and consume messages from it. After testing the basic producer and consumer example, we will test it with Spark using the spark-examples.jar file.
Creating a Kafka topic:
# kafka bootstrap server
KAFKA_BROKERS="localhost:9092"
# kafka topic name
TOPIC_NAME="word_count_topic"
# group name
GROUP_NAME="spark-kafka-group"
# creating a topic
/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-topics.sh --create --topic ${TOPIC_NAME} --bootstrap-server ${KAFKA_BROKERS}
# describing a topic
/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-topics.sh --describe --topic ${TOPIC_NAME} --bootstrap-server ${KAFKA_BROKERS}
Producing messages to Kafka topic:
# producing kafka messages
/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-console-producer.sh --topic ${TOPIC_NAME} --broker-list ${KAFKA_BROKERS}
Consuming messages from Kafka topic:
# consuming kafka messages
/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-console-consumer.sh --bootstrap-server ${KAFKA_BROKERS} --group ${GROUP_NAME} --topic ${TOPIC_NAME} --from-beginning
Submitting the Spark KafkaWordCount example:
spark-submit \
--master yarn \
--deploy-mode client \
--packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.7.7.1.7.0-551 \
--repositories https://repository.cloudera.com/artifactory/cloudera-repos/ \
--class org.apache.spark.examples.streaming.DirectKafkaWordCount \
/opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-examples_*.jar ${KAFKA_BROKERS} ${GROUP_NAME} ${TOPIC_NAME}
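For context, the DirectKafkaWordCount example is roughly equivalent to the following Spark Streaming sketch. This is a simplified illustration, not the exact source shipped in spark-examples.jar; the broker, group, and topic values mirror the variables defined above.
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

// Simplified direct-stream word count over the topic created above.
val conf = new SparkConf().setAppName("DirectKafkaWordCount")
val ssc = new StreamingContext(conf, Seconds(5))

val kafkaParams = Map[String, Object](
  ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "localhost:9092",
  ConsumerConfig.GROUP_ID_CONFIG -> "spark-kafka-group",
  ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
  ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer]
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Set("word_count_topic"), kafkaParams)
)

// Count words in each micro-batch and print the counts to the driver log.
stream.map(_.value)
  .flatMap(_.split(" "))
  .map(word => (word, 1L))
  .reduceByKey(_ + _)
  .print()

ssc.start()
ssc.awaitTermination()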
02-22-2022
10:35 PM
Hi @Rajeshhadoop, I don't think it is right to ask a set of questions in a single community article. Please create a new thread for each question.
02-08-2022
02:44 PM
1 Kudo
Looking at the serialized data, that looks like the Java binary serialization protocol. It seems to me that the producer is simply writing the HashMap Java object directly to Kafka rather than using a proper serializer (Avro, JSON, String, etc.). You should look into modifying your producer so that you can properly deserialize the data that you're reading from Kafka.
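As a sketch of the idea, the producer would configure an explicit serializer and write a plain string (here JSON) instead of a Java object. The broker address, topic name, and payload below are hypothetical placeholders:
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

// Hypothetical producer that serializes the payload as a JSON string
// instead of writing a Java HashMap object directly to Kafka.
val props = new Properties()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)

val producer = new KafkaProducer[String, String](props)
// The value is an explicit JSON string, so any consumer with a
// StringDeserializer can read it back without Java deserialization.
producer.send(new ProducerRecord[String, String]("my_topic", """{"key1":"value1"}"""))
producer.close()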
02-08-2022
05:16 AM
Hello, yes, I tested without HWC and I'm getting a correct date value. And yes, HWC is not required for an external table, but as we are creating a common lib for all Hive tables, we preferred to use HWC.
Here are the steps to reproduce the problem:
1. Create a CSV file file.csv with these values:
a;1
b;2
c;3
and save it in this path: /mypath/db/test/dt=2021-02-10
2. Create an external table:
CREATE EXTERNAL TABLE db.test (
  col1 string,
  col2 string)
PARTITIONED BY (dt date)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\073'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/mypath/db/test';
3. Start a spark-shell with HWC:
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.table("db.test").select("dt").distinct().show()
+----------+
|        dt|
+----------+
|2021-02-09|
+----------+
02-08-2022
04:29 AM
Hi @kanikach, I don't think we have a mechanism to list all the changes between the current release and the previous release, other than approaching the engineering team. If you want more detailed change information, it is better to raise a Cloudera case, and we will check with the engineering team and get back to you.