Member since: 06-02-2020
Posts: 331
Kudos Received: 67
Solutions: 49
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4105 | 07-11-2024 01:55 AM |
| | 11418 | 07-09-2024 11:18 PM |
| | 8573 | 07-09-2024 04:26 AM |
| | 8611 | 07-09-2024 03:38 AM |
| | 7513 | 06-05-2024 02:03 AM |
08-30-2022
04:33 AM
Hi @mala_etl You didn't mention whether you are running the application on CDH, HDP, or CDP. Could you please share your Hive script and confirm that you are using the Hive catalog instead of the in-memory catalog?
08-30-2022
04:31 AM
Hi @somant Please don't use external open-source libraries; use the Spark/Kafka versions supported by your cluster instead. Check the following example code: https://community.cloudera.com/t5/Community-Articles/Running-DirectKafkaWordCount-example-in-CDP/ta-p/340402
08-30-2022
04:25 AM
Hi @MikeCC Spark 3.3 is not yet supported in CDP. We plan to release Spark 3.3 in CDP 7.1.8 or a later version. As per the support matrix below, Java 17 is not yet supported either: https://supportmatrix.cloudera.com/ I hope this answers your question. If yes, please accept it as a solution.
08-08-2022
03:40 AM
Hi @Suhas_Ganorkar We no longer support CDH 5.x issues. As mentioned in the error, CDH 5.x has reached End of Support (EOS). Please use a different, supported CDH version.
08-03-2022
10:53 PM
Hi @ssuja In the screenshot above, we can clearly see that the Java path cannot be found. Please verify and update the Java path correctly on all nodes.
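As a quick check, the following sketch shows what a node currently resolves for Java (run it on each node; paths vary by environment, so adjust as needed):

```shell
# Sketch: verify the Java installation this node resolves.
# Prints the java binary on PATH, or a message if none is found.
command -v java || echo "java is not on PATH"
# Prints JAVA_HOME, or a message if it is not set.
echo "${JAVA_HOME:-JAVA_HOME is not set}"
```

If either check fails, fix PATH/JAVA_HOME in the node's environment before retrying the job.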
08-03-2022
03:56 AM
Hi @ssuja Please try running the SparkPi example and download the application logs. Once downloaded, check whether the logs contain any exceptions. spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--num-executors 1 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
/usr/hdp/current/spark2-client/examples/jars/spark-examples_*.jar 10
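Once the job finishes, you can fetch the aggregated YARN logs from the command line. A sketch (the application ID below is a hypothetical placeholder; use the one printed by spark-submit or shown in the YARN UI):

```shell
# Hypothetical application ID; replace with your own.
APP_ID="application_1660000000000_0001"
# Download the aggregated YARN logs for the application to a local file.
yarn logs -applicationId "${APP_ID}" > "${APP_ID}.log"
# Search the downloaded logs for exceptions and errors.
grep -iE "exception|error" "${APP_ID}.log"
```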
07-26-2022
04:17 AM
1 Kudo
Hi Team, CDP uses the "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol" OutputCommitter, which does not support dynamicPartitionOverwrite. You can set the following parameters in your Spark job.
Code level:
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark.conf.set("spark.sql.parquet.output.committer.class", "org.apache.parquet.hadoop.ParquetOutputCommitter")
spark.conf.set("spark.sql.sources.commitProtocolClass", "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol")
spark-submit/spark-shell:
--conf spark.sql.sources.partitionOverwriteMode=dynamic
--conf spark.sql.parquet.output.committer.class=org.apache.parquet.hadoop.ParquetOutputCommitter
--conf spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
Note: If you are using S3, you can disable the cloud committer by setting the spark.cloudera.s3_committers.enabled parameter:
--conf spark.cloudera.s3_committers.enabled=false
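For reference, a sketch of a single spark-submit invocation combining the three settings above (the class name and jar path below are hypothetical placeholders for your own job):

```shell
# Sketch: pass all three overrides on the command line for one job.
spark-submit \
  --class com.example.MyPartitionedWriteJob \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.sql.sources.partitionOverwriteMode=dynamic \
  --conf spark.sql.parquet.output.committer.class=org.apache.parquet.hadoop.ParquetOutputCommitter \
  --conf spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol \
  /path/to/my-job.jar
```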
04-29-2022
05:29 AM
Hi @JoeR Spark supports reading files in multiple formats, such as Parquet, ORC, JSON, XML, Avro, and CSV. I don't think there is a direct mechanism to read the data from the payload. If I find a different solution, I will share it with you.
04-05-2022
12:46 AM
1 Kudo
In this post, we will learn how to create a Kafka topic and produce and consume messages from it. After testing the basic producer and consumer example, we will test it with Spark using the spark-examples.jar file.
Creating a Kafka topic:
# kafka bootstrap server
KAFKA_BROKERS="localhost:9092"
# kafka topic name
TOPIC_NAME="word_count_topic"
# group name
GROUP_NAME="spark-kafka-group"
# creating a topic
/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-topics.sh --create --topic ${TOPIC_NAME} --bootstrap-server ${KAFKA_BROKERS}
# describing a topic
/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-topics.sh --describe --topic ${TOPIC_NAME} --bootstrap-server ${KAFKA_BROKERS}
Producing messages to Kafka topic:
# producing kafka messages
/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-console-producer.sh --topic ${TOPIC_NAME} --broker-list ${KAFKA_BROKERS}
Consuming messages from Kafka topic:
# consuming kafka messages
/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-console-consumer.sh --bootstrap-server ${KAFKA_BROKERS} --group ${GROUP_NAME} --topic ${TOPIC_NAME} --from-beginning
Submitting the Spark KafkaWordCount example:
spark-submit \
--master yarn \
--deploy-mode client \
--packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.7.7.1.7.0-551 \
--repositories https://repository.cloudera.com/artifactory/cloudera-repos/ \
--class org.apache.spark.examples.streaming.DirectKafkaWordCount \
/opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-examples_*.jar ${KAFKA_BROKERS} ${GROUP_NAME} ${TOPIC_NAME}
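As a variation, you can feed sample messages to the console producer non-interactively by piping them in. A sketch, reusing the variables defined above (the sample lines are arbitrary test data):

```shell
# Pipe a few sample lines into the console producer instead of typing them.
printf 'hello world\nhello kafka\n' | \
  /opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-console-producer.sh \
    --topic "${TOPIC_NAME}" --broker-list "${KAFKA_BROKERS}"
```

This is handy for scripting a quick end-to-end check before submitting the Spark job.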
02-22-2022
10:35 PM
Hi @Rajeshhadoop I don't think it is the right way to ask a set of questions in a single community article. Please create a new thread for each question.