Member since: 06-02-2020
Posts: 331
Kudos Received: 64
Solutions: 49
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 977 | 07-11-2024 01:55 AM |
 | 2720 | 07-09-2024 11:18 PM |
 | 2414 | 07-09-2024 04:26 AM |
 | 1789 | 07-09-2024 03:38 AM |
 | 2082 | 06-05-2024 02:03 AM |
08-30-2022
04:10 AM
Hi @Asim You can check the following link for Spark and dbt integration: https://community.cloudera.com/t5/Innovation-Blog/Running-dbt-core-with-adapters-for-Hive-Spark-and-Impala/ba-p/350384
08-08-2022
03:40 AM
Hi @Suhas_Ganorkar We no longer support CDH 5.x issues; as the error mentions, it has reached End of Support (EOS). Please specify a different CDH version.
08-03-2022
10:53 PM
Hi @ssuja In the above screenshot we can clearly see that the Java path cannot be found. Please verify and update the Java path on all nodes.
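For example, one common fix is to point JAVA_HOME at a valid JDK on every node. This is only a sketch assuming the standard spark-env.sh mechanism, and the JDK path below is hypothetical:
# in spark-env.sh on each node; replace the path with your actual JDK location
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk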
08-03-2022
03:56 AM
Hi @ssuja Please try to run the SparkPi example below. Once it finishes, download the application logs and check whether they contain any exceptions.
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--num-executors 1 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
/usr/hdp/current/spark2-client/examples/jars/spark-examples_*.jar 10
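Since the example runs in cluster mode on YARN, the application logs can usually be collected with the YARN CLI, using the application ID that spark-submit prints:
yarn logs -applicationId <application_id>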
07-26-2022
04:17 AM
1 Kudo
Hi Team, CDP uses the "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol" OutputCommitter, which does not support dynamicPartitionOverwrite. You can set the following parameters in your Spark job.
Code level:
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark.conf.set("spark.sql.parquet.output.committer.class", "org.apache.parquet.hadoop.ParquetOutputCommitter")
spark.conf.set("spark.sql.sources.commitProtocolClass", "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol")
spark-submit/spark-shell:
--conf spark.sql.sources.partitionOverwriteMode=dynamic \
--conf spark.sql.parquet.output.committer.class=org.apache.parquet.hadoop.ParquetOutputCommitter \
--conf spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
Note: If you are using S3, you can disable the Cloudera S3 committers by setting the spark.cloudera.s3_committers.enabled parameter:
--conf spark.cloudera.s3_committers.enabled=false
04-29-2022
05:29 AM
Hi @JoeR Spark supports reading files in multiple formats such as Parquet, ORC, JSON, XML, Avro, CSV, etc. I think there is no direct mechanism to read the data from the payload. If I find a different solution, I will share it with you.
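For reference, a minimal PySpark sketch of the built-in readers (paths are hypothetical; XML additionally requires the external spark-xml package):
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("readers-demo").getOrCreate()

parquet_df = spark.read.parquet("/data/input.parquet")
orc_df = spark.read.orc("/data/input.orc")
json_df = spark.read.json("/data/input.json")
csv_df = spark.read.option("header", "true").csv("/data/input.csv")
# Avro ships as a separate module in upstream Spark (spark-avro); it is bundled in CDP/CDH
avro_df = spark.read.format("avro").load("/data/input.avro")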
04-06-2022
03:32 AM
1 Kudo
Hi @yagoaparecidoti Based on the exception, it looks like a Kerberos issue. Because CDH 5.x and CDH 6.x clusters have reached End of Life (EOL), we are not able to provide any solutions for them. You can test your scenario on a CDP cluster, which supports both Spark2 and Spark3.
04-05-2022
12:46 AM
1 Kudo
In this post, we will learn how to create a Kafka topic and produce and consume messages from it. After testing the basic producer and consumer example, we will test Kafka with Spark using the spark-examples jar file.
Creating a Kafka topic:
# kafka bootstrap server
KAFKA_BROKERS="localhost:9092"
# kafka topic name
TOPIC_NAME="word_count_topic"
# group name
GROUP_NAME="spark-kafka-group"
# creating a topic
/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-topics.sh --create --topic ${TOPIC_NAME} --bootstrap-server ${KAFKA_BROKERS}
# describing a topic
/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-topics.sh --describe --topic ${TOPIC_NAME} --bootstrap-server ${KAFKA_BROKERS}
Producing messages to Kafka topic:
# producing kafka messages
/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-console-producer.sh --topic ${TOPIC_NAME} --broker-list ${KAFKA_BROKERS}
Consuming messages from Kafka topic:
# consuming kafka messages
/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-console-consumer.sh --bootstrap-server ${KAFKA_BROKERS} --group ${GROUP_NAME} --topic ${TOPIC_NAME} --from-beginning
Submitting the Spark KafkaWordCount example:
spark-submit \
--master yarn \
--deploy-mode client \
--packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.7.7.1.7.0-551 \
--repositories https://repository.cloudera.com/artifactory/cloudera-repos/ \
--class org.apache.spark.examples.streaming.DirectKafkaWordCount \
/opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-examples_*.jar ${KAFKA_BROKERS} ${GROUP_NAME} ${TOPIC_NAME}
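Optionally, you can verify the committed offsets and lag for the ${GROUP_NAME} consumer group (the console consumer above commits its offsets) with the standard consumer-groups tool:
/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-consumer-groups.sh --bootstrap-server ${KAFKA_BROKERS} --describe --group ${GROUP_NAME}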
02-22-2022
10:35 PM
Hi @Rajeshhadoop I think it is not right to ask a set of questions in a single community article. Please create a new thread for each question.
02-21-2022
07:39 PM
Please go through the following article: https://community.cloudera.com/t5/Community-Articles/Spark-Memory-Management/ta-p/317794 The Unified Memory Manager was introduced in Spark 1.6. There have not been many changes to it since then, although Spark 3 includes some changes in memory management.
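For reference, the unified region is controlled mainly by two settings, shown here as a sketch with their defaults per the Spark documentation; tune them with care:
--conf spark.memory.fraction=0.6 \
--conf spark.memory.storageFraction=0.5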