
Save Kafka-Spark Streaming messages into single file

Hi All,

As per my requirement, I have developed an application that consumes messages using Kafka-Spark Streaming.

Once the data is received, it is saved into HDFS.

The streaming data is currently saved as many separate files.

My requirement is to append the data to an existing file in HDFS until the HDFS block size is exceeded.

Please help me with the solution.

Thanks in advance





Hi Vijay,

Did you try dataframe.write.mode(SaveMode.Append)?

This should allow the data to be appended rather than writing a new file every time.

Hope this works for you.
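A minimal sketch of the suggestion above, assuming a Spark Streaming job with a DStream of strings and an active SparkSession (the names `stream`, `spark`, and the HDFS path are illustrative, not from the thread). Note that SaveMode.Append appends new part files to the output directory on each batch; it does not grow one existing file, which may be why the original problem persists.

```scala
// Sketch only: assumes a running Spark Streaming application.
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.streaming.dstream.DStream

def saveToHdfs(spark: SparkSession, stream: DStream[String]): Unit = {
  import spark.implicits._
  stream.foreachRDD { rdd =>
    if (!rdd.isEmpty()) {
      rdd.toDF("value")
        .write
        .mode(SaveMode.Append)              // append to the output directory
        .text("hdfs:///data/kafka/messages") // illustrative path
    }
  }
}
```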

@Ned Shawa

I did try that, but no luck.

Hi Vijay, try:

1. coalesce(1) with mode(SaveMode.Append). It might help, but I am not sure whether it will work well for huge amounts of data, since coalesce(1) forces all output through a single task.

2. FileUtil.copyMerge also works to combine the different part files into a single file, but you might run into performance issues if you have huge amounts of data coming from Spark Streaming.
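A sketch of both options above, assuming a DataFrame `df` and a Hadoop 2.x classpath (the paths are illustrative; FileUtil.copyMerge was removed in Hadoop 3.x):

```scala
// Sketch only: assumes an active SparkSession and Hadoop 2.x.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
import org.apache.spark.sql.{DataFrame, SaveMode}

// Option 1: force a single output file per batch, appended to the directory.
def writeSingleFile(df: DataFrame): Unit =
  df.coalesce(1)
    .write
    .mode(SaveMode.Append)
    .text("hdfs:///data/kafka/messages") // illustrative path

// Option 2: merge the accumulated part files into one destination file.
def mergeParts(): Unit = {
  val conf = new Configuration()
  val fs   = FileSystem.get(conf)
  FileUtil.copyMerge(
    fs, new Path("hdfs:///data/kafka/messages"),   // source directory
    fs, new Path("hdfs:///data/kafka/merged.txt"), // destination file
    false,                                          // keep the source files
    conf, null)
}
```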

I don't think there is a feasible solution for this.