Storing Spark Streaming Data in HDFS and/or Kudu

Explorer

Hi,

How do I store Spark Streaming data into:

1. HDFS

2. Kudu

I am following the example below:

https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/kafka_wordcount.py

I am using Spark 2.2 (I also have Spark 1.6 installed). I am using Spark Streaming with Kafka, where Spark Streaming acts as the consumer.

Can you please tell me how to store Spark Streaming data into HDFS using:

1. Spark Streaming

2. Structured Streaming

I am using PySpark.

Thank you.

Explorer

I don't think PySpark supports Kudu yet. See KUDU-1603.

Explorer

How do I store Spark Structured Streaming data into HDFS?

Explorer

Hi,

With Spark Streaming, you need to take the RDD generated for each batch and write it to HDFS. Sample code in Java is as follows:

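import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.VoidFunction2;
import org.apache.spark.streaming.Time;

// Assumes `records` is a JavaDStream<String> and `outputPath` is an HDFS directory.
records.foreachRDD(new VoidFunction2<JavaRDD<String>, Time>() {
    private static final long serialVersionUID = 1L;

    @Override
    public void call(JavaRDD<String> rdd, Time time) throws Exception {
        // Skip empty batches; isEmpty() is cheaper than count() > 0.
        if (!rdd.isEmpty()) {
            // One output directory per batch, named after the batch time in ms.
            rdd.saveAsTextFile(outputPath + "/" + time.milliseconds());
        }
    }
});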
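
Since the question mentions pyspark, the same per-batch pattern might look roughly like the sketch below in Python. It is only a sketch under a few assumptions: `dstream` is whatever DStream you already have (for example the lines from the Kafka word count example), the helper names `batch_path`, `save_batch` and `save_stream_to_hdfs` and the output path are made up for illustration, and it assumes Python 3 because it relies on `datetime.timestamp()`. In PySpark, a two-argument function passed to `foreachRDD` receives the batch time as a `datetime` followed by the RDD.

```python
from datetime import datetime, timezone


def batch_path(output_path, batch_time):
    """Build a per-batch directory name from the batch time (hypothetical helper)."""
    millis = int(round(batch_time.timestamp() * 1000))
    return "{}/{}".format(output_path.rstrip("/"), millis)


def save_batch(batch_time, rdd, output_path):
    """Write one batch to its own directory, skipping empty batches."""
    if not rdd.isEmpty():
        rdd.saveAsTextFile(batch_path(output_path, batch_time))


def save_stream_to_hdfs(dstream, output_path):
    """Register the per-batch writer on an existing DStream."""
    # The two-argument lambda makes PySpark pass (batch time, rdd).
    dstream.foreachRDD(lambda t, rdd: save_batch(t, rdd, output_path))


# Quick look at the path helper without a Spark cluster (example path only).
example = batch_path("hdfs:///user/demo/out/", datetime(2018, 1, 1, tzinfo=timezone.utc))
```

You would call `save_stream_to_hdfs(lines, "hdfs:///some/dir")` before `ssc.start()`. If you do not need the empty-batch check, `dstream.saveAsTextFiles(prefix)` is a shorter built-in alternative that also writes one directory per batch.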

Hope this helps.
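
For the Structured Streaming part of the question, there is no RDD to handle yourself; you read Kafka as a streaming DataFrame and attach a file sink. The sketch below is a rough illustration, not tested against your cluster. It assumes an existing `SparkSession` named `spark`, that the `spark-sql-kafka-0-10` package is on the classpath, and that your brokers are Kafka 0.10 or newer. The function names, broker address, topic and paths are placeholders I made up.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Options for the Kafka source (hypothetical helper)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
    }


def start_kafka_to_hdfs(spark, bootstrap_servers, topic, output_path, checkpoint_path):
    """Start a streaming query that copies Kafka message values to Parquet files."""
    lines = (
        spark.readStream
        .format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
        # Kafka values arrive as bytes; cast them to strings.
        .selectExpr("CAST(value AS STRING) AS value")
    )
    return (
        lines.writeStream
        .format("parquet")
        .option("path", output_path)
        # The file sink needs a checkpoint directory to track progress.
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")
        .start()
    )
```

A call might look like `query = start_kafka_to_hdfs(spark, "broker1:9092", "mytopic", "hdfs:///user/demo/out", "hdfs:///user/demo/chk")` followed by `query.awaitTermination()`.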

Thanks,

Ravi

New Contributor

God bless you, munna143