How to append to a single file when writing to HDFS from Spark?

This is my code:

JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(jssc, String.class, String.class, StringDecoder.class, StringDecoder.class, kafkaParams, topicsSet);

JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {
    @Override
    public String call(Tuple2<String, String> tuple2) {
        return tuple2._2();
    }
});

lines.dstream().saveAsTextFiles(pathtohdfs);

This generates different files in HDFS every time. I need to append all of the output into one file. How can I do that?

1 REPLY

@Apoorva Teja Vanam: There doesn't appear to be a straightforward way to do this. Have you checked: http://stackoverflow.com/questions/37017366/how-can-i-make-spark1-6-saveastextfile-to-append-existin...
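
One possible workaround, offered as a rough sketch rather than a tested solution: instead of calling saveAsTextFiles, open a single output file for append in each batch and write that batch's lines to it. The helper below shows the pattern against an ordinary java.io.OutputStream, with a local file standing in for HDFS so that it runs without a cluster. BatchAppender, appendBatch, and appendToLocalFile are hypothetical names made up for this illustration. On HDFS the stream would instead come from Hadoop's FileSystem.append(Path) when the file already exists, or FileSystem.create(Path) when it does not, and this assumes append is supported on the cluster.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of Spark or Hadoop.
class BatchAppender {

    // Writes one batch to an already-open stream, one record per line.
    // Assumption: on HDFS this stream would be obtained from
    // FileSystem.append(path) or FileSystem.create(path).
    static void appendBatch(OutputStream out, List<String> batch) throws IOException {
        BufferedWriter writer =
                new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        for (String line : batch) {
            writer.write(line);
            writer.write('\n');
        }
        // Flush only; the caller owns the stream and closes it.
        writer.flush();
    }

    // Local-filesystem stand-in for the HDFS "open for append" step.
    static void appendToLocalFile(String path, List<String> batch) throws IOException {
        // The second argument 'true' opens the file in append mode,
        // creating it first if it does not exist.
        try (OutputStream out = new FileOutputStream(path, true)) {
            appendBatch(out, batch);
        }
    }
}
```

In the streaming job this would be called from lines.foreachRDD(...) on the driver with the result of rdd.collect(), which is only reasonable when each batch is small. HDFS allows a single writer per file, so the append should not be attempted from several executors at once.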