I have developed an application that consumes messages using a Kafka-Spark Streaming process.
Once the data is received, it is saved into HDFS.
The streaming data is saved as multiple files.
My requirement is to append the data to an existing file in HDFS until the block size is exceeded.
Please help me with a solution.
Thanks in advance.
Did you try dataframe.write().mode(SaveMode.Append)?
This should allow the data to be appended rather than written to a new file every time.
Hope this works for you.
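For reference, a minimal sketch of that approach (assuming each micro-batch arrives as a `Dataset<Row>`; the class name `BatchWriter`, the helper `saveBatch`, and the output path are all placeholders, not anything from the original question):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

// Hypothetical helper: append one micro-batch to an HDFS directory as Parquet.
// The path is a placeholder -- substitute your own HDFS location, e.g.
// "hdfs://namenode:8020/data/events".
public class BatchWriter {
    public static void saveBatch(Dataset<Row> batch, String path) {
        batch.write()
             .mode(SaveMode.Append)  // add new data instead of overwriting or failing
             .parquet(path);
    }
}
```

One caveat: SaveMode.Append adds new part files under the output directory; it does not grow a single existing file, so this does not by itself satisfy the "append to one file until the block size is exceeded" requirement.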
1. coalesce(1) with mode(SaveMode.Append) might help, but I am not sure whether it scales to huge amounts of data.
2. CopyMerge also works for combining the different files into a single file, but you might hit a performance issue if you have huge amounts of data coming from Spark Streaming.
I don't think there is a fully feasible solution for this.
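For completeness, here is a sketch of option 2 using Hadoop's `FileUtil.copyMerge` (available in Hadoop 2.x; it was removed in Hadoop 3). The class name `PartMerger`, the method name, and the paths are illustrative placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import java.io.IOException;

// Hypothetical example: merge the many small part files written by Spark
// Streaming under srcDir into a single destination file.
public class PartMerger {
    public static boolean mergeParts(String srcDir, String dstFile) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        return FileUtil.copyMerge(
                fs, new Path(srcDir),   // source directory of part files
                fs, new Path(dstFile),  // single destination file
                false,                  // keep the source files
                conf, null);            // no separator string between files
    }
}
```

Combining both suggestions: calling `df.coalesce(1).write().mode(SaveMode.Append)` first reduces each batch to one part file (at the cost of a single-task write), which makes the later merge cheaper.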