Created on 08-29-2017 10:05 AM - edited 09-16-2022 05:10 AM
Hi,
How do I store Spark streaming data into:
1. HDFS
2. Kudu
I am following the example below:
https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/kafka_wordcount.py
I am using Spark 2.2 (Spark 1.6 is also installed). I am using Spark Streaming with Kafka, where Spark Streaming acts as the consumer.
Can you please tell me how to store Spark Streaming data into HDFS using:
1. Spark Streaming
2. Structured Streaming
I am using PySpark.
Thank you.
Created 08-31-2017 05:08 AM
I don't think there is Kudu support in PySpark yet; see KUDU-1603.
Created 09-07-2017 11:01 AM
How do I store Spark Structured Streaming data into HDFS?
Created 09-07-2017 03:16 PM
Hi,
With the DStream API, each micro-batch arrives as an RDD, and you can write that RDD to HDFS inside foreachRDD. Sample code in Java is as follows:
records.foreachRDD(new VoidFunction2<JavaRDD<String>, Time>() {
    private static final long serialVersionUID = 1L;

    @Override
    public void call(JavaRDD<String> rdd, Time time) throws Exception {
        // Skip empty batches so that no empty directories are written.
        if (!rdd.isEmpty()) {
            // One directory per batch, named by the batch time in milliseconds.
            rdd.saveAsTextFile(outputPath + "/" + time.milliseconds());
        }
    }
});
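Since you mentioned pyspark, here is a rough Python sketch of the same idea. It is only a sketch under a few assumptions: OUTPUT_PATH, batch_dir and save_batch are names I made up, the HDFS path is a placeholder, and I am assuming you apply it to the `counts` DStream from the kafka_wordcount.py example. In PySpark, a two-argument function passed to foreachRDD receives the batch time as a datetime followed by the RDD, so the directory name is built with strftime here rather than from milliseconds.

```python
from datetime import datetime

# Hypothetical output location; replace with a directory on your cluster.
OUTPUT_PATH = "hdfs:///tmp/streaming_output"


def batch_dir(base_path, batch_time):
    """Build a per-batch directory name from the batch time (a datetime)."""
    stamp = batch_time.strftime("%Y%m%d-%H%M%S")
    return "{}/{}".format(base_path.rstrip("/"), stamp)


def save_batch(batch_time, rdd):
    """Write one micro-batch to its own directory, skipping empty batches."""
    if not rdd.isEmpty():
        rdd.saveAsTextFile(batch_dir(OUTPUT_PATH, batch_time))


# Usage, assuming `counts` is the DStream built in kafka_wordcount.py:
#   counts.foreachRDD(save_batch)
```

Each batch lands in its own directory because saveAsTextFile fails if the target directory already exists. Note that the second-resolution timestamp assumes a batch interval of at least one second.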
Hope this helps.
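For the Structured Streaming part of the question, the closest equivalent is probably a file sink on the streaming DataFrame. The sketch below is in Python and makes some assumptions: start_hdfs_sink is a helper name I made up, the paths and the Kafka settings in the usage comment are placeholders, and Parquet is just one possible output format. The file sink needs a checkpointLocation and only supports append output mode, and reading from Kafka this way requires the spark-sql-kafka-0-10 package on the classpath.

```python
def start_hdfs_sink(stream_df, output_path, checkpoint_path):
    """Start a streaming query that appends each micro-batch as Parquet files."""
    return (stream_df.writeStream
            .format("parquet")                              # file sink
            .option("path", output_path)                    # where data files go
            .option("checkpointLocation", checkpoint_path)  # required by file sinks
            .outputMode("append")                           # only mode file sinks support
            .start())


# Usage (all hosts, topics and paths below are placeholders):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("KafkaToHdfs").getOrCreate()
#   lines = (spark.readStream
#            .format("kafka")
#            .option("kafka.bootstrap.servers", "broker1:9092")
#            .option("subscribe", "my_topic")
#            .load()
#            .selectExpr("CAST(value AS STRING) AS value"))
#   query = start_hdfs_sink(lines,
#                           "hdfs:///tmp/structured_output",
#                           "hdfs:///tmp/structured_checkpoint")
#   query.awaitTermination()
```

Because the sink is append-only, an aggregation such as the word count would need a watermark before it can be written this way; writing the raw lines as shown avoids that.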
Thanks,
Ravi
Created on 11-30-2017 09:18 AM - edited 11-30-2017 09:19 AM
God bless you, munna143