I have been streaming data from a Kafka topic with Spark and writing it to HDFS. The problem is that the job creates empty partitions, and I am not sure how to avoid it.
My code:
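// Direct (receiver-less) Kafka stream of (key, value) pairs
JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(jssc, String.class, String.class,
        StringDecoder.class, StringDecoder.class, kafkaParams, topicsSet);
// Keep only the message value
JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {
    @Override
    public String call(Tuple2<String, String> tuple2) {
        return tuple2._2();
    }
});
lines.print();
// Writes one output directory per batch; the Scala default suffix is not visible from Java, so pass it explicitly
lines.dstream().saveAsTextFiles("pathtohdfs", "");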