Save Kafka-Spark Streaming messages into single file

Hi All,

As per my requirement, I have developed an application that consumes messages from Kafka using Spark Streaming.

Once the data is received it will be saved into HDFS.

The streaming data is saved as multiple files.

My requirement is to append the data to an existing file in HDFS until the file reaches the block size.
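
To make the append-until-full behaviour concrete, here is a minimal sketch of the roll-over logic. It is only an illustration against the local filesystem using java.nio: the class name RollingAppender, the data-NNNNN.txt naming scheme and the maxBytes limit (standing in for the HDFS block size) are all assumptions, and on HDFS the same logic would have to go through Hadoop's FileSystem API instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: appends batches to one file and starts a new file
// once the next batch would push the current file past a size limit.
class RollingAppender {
    private final Path dir;
    private final long maxBytes; // assumption: stands in for the HDFS block size
    private int index = 0;

    RollingAppender(Path dir, long maxBytes) {
        this.dir = dir;
        this.maxBytes = maxBytes;
    }

    // Appends one batch and returns the file it was written to.
    Path append(byte[] batch) throws IOException {
        Path current = currentFile();
        long size = Files.exists(current) ? Files.size(current) : 0L;
        // Roll over only if the file already holds data and the batch would not fit.
        if (size > 0 && size + batch.length > maxBytes) {
            index++;
            current = currentFile();
        }
        Files.write(current, batch, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return current;
    }

    private Path currentFile() {
        return dir.resolve(String.format("data-%05d.txt", index));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("stream-out");
        RollingAppender appender = new RollingAppender(dir, 16);
        for (String msg : new String[] {"msg-1\n", "msg-2\n", "msg-3\n", "msg-4\n"}) {
            System.out.println(appender.append(msg.getBytes()).getFileName());
        }
    }
}
```

In a streaming job, something like append would be called once per micro-batch from a single writer; concurrent writers appending to the same file would need extra coordination that this sketch does not attempt.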

Please help me with the solution.

Thanks in advance

Regards,

Vijay

3 REPLIES

Re: Save Kafka-Spark Streaming messages into single file

Hi Vijay,

Did you try dataframe.write().mode(SaveMode.Append)?

This should append the data rather than writing a new file every time.

Hope this works for you.

Re: Save Kafka-Spark Streaming messages into single file

@Ned Shawa

I did try that but no luck.

Re: Save Kafka-Spark Streaming messages into single file

Hi Vijay, try the following:

1. coalesce(1) together with mode(SaveMode.Append) might help, but I am not sure whether you can use it for huge amounts of data.

2. copyMerge also works for writing the different files into a single file, but you might hit performance issues if you are getting huge amounts of data from Spark Streaming.
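
The merge idea in point 2 boils down to concatenating the part files of a batch, in name order, onto one target file. Below is a rough sketch of that step, assuming the local filesystem and java.nio rather than HDFS; the class name PartFileMerger and the part-* file pattern are assumptions for illustration. On a Hadoop 2.x cluster, FileUtil.copyMerge plays this role.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: merges the part files of one output directory into a single file.
class PartFileMerger {

    // Appends every file matching part-* in partDir, in name order, to target.
    // Returns the number of part files merged.
    static int mergeParts(Path partDir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(partDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        Collections.sort(parts); // name order keeps the partitions in sequence

        try (OutputStream out = Files.newOutputStream(target,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (Path part : parts) {
                Files.copy(part, out); // stream the part file's bytes into the target
            }
        }
        return parts.size();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("batch");
        Files.write(dir.resolve("part-00000"), "a\n".getBytes());
        Files.write(dir.resolve("part-00001"), "b\n".getBytes());
        Path merged = Files.createTempFile("merged", ".txt");
        mergeParts(dir, merged);
        System.out.print(new String(Files.readAllBytes(merged)));
    }
}
```

Because every byte is copied a second time, the cost grows with the data volume, which is the performance concern raised above.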

I don't think there is a feasible solution for this.