Member since: 03-12-2017 | Posts: 2 | Kudos Received: 1 | Solutions: 0
03-16-2017 03:25 AM
This is my code:

    JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(jssc, String.class, String.class,
            StringDecoder.class, StringDecoder.class, kafkaParams, topicsSet);

    JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {
        @Override
        public String call(Tuple2<String, String> tuple2) {
            return tuple2._2();
        }
    });

    lines.dstream().saveAsTextFiles(pathtohdfs);

This generates a different set of files in HDFS on every batch interval. I need to append all of the output into one file. How can I do that?
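One possible approach, sketched below under the assumption of a Spark 1.6+ Java API matching the direct-stream code above: replace saveAsTextFiles with a foreachRDD that appends each micro-batch to a single HDFS file through the Hadoop FileSystem API. The output path /user/example/stream/output.txt is a hypothetical placeholder, and collecting to the driver is only reasonable for small batches.

    import java.nio.charset.StandardCharsets;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.VoidFunction;

    // Hypothetical single output file; adjust to your cluster layout.
    final String outputPath = "/user/example/stream/output.txt";

    lines.foreachRDD(new VoidFunction<JavaRDD<String>>() {
        @Override
        public void call(JavaRDD<String> rdd) throws Exception {
            if (rdd.isEmpty()) {
                return; // nothing to append for this batch
            }
            // Collect the batch to the driver; acceptable only for small batches.
            List<String> records = rdd.collect();
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path out = new Path(outputPath);
            // Append if the file exists (requires HDFS append support), otherwise create it.
            FSDataOutputStream stream = fs.exists(out) ? fs.append(out) : fs.create(out);
            try {
                for (String record : records) {
                    stream.write((record + "\n").getBytes(StandardCharsets.UTF_8));
                }
            } finally {
                stream.close();
            }
        }
    });

Alternatively, rdd.coalesce(1).saveAsTextFile(...) per batch still produces one directory per interval, so a periodic merge job (for example, hdfs dfs -getmerge) is another common route.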
Labels:
- Apache Hadoop
- Apache Spark
03-12-2017 01:33 AM
1 Kudo
I have been streaming data from a Kafka topic with Spark and writing it to HDFS, but the problem is it's creating empty partitions, and I'm not sure how to avoid that. My code:

    JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(jssc, String.class, String.class,
            StringDecoder.class, StringDecoder.class, kafkaParams, topicsSet);

    JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {
        @Override
        public String call(Tuple2<String, String> tuple2) {
            return tuple2._2();
        }
    });

    lines.print();
    lines.dstream().saveAsTextFiles("pathtohdfs");
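A minimal sketch of one possible fix (my assumption, not confirmed against the poster's setup): wrap the save in foreachRDD and skip empty micro-batches with isEmpty(), reproducing the time-suffixed naming of saveAsTextFiles only when a batch actually has data. The "pathtohdfs" prefix is kept as the poster's own placeholder.

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.VoidFunction2;
    import org.apache.spark.streaming.Time;

    lines.foreachRDD(new VoidFunction2<JavaRDD<String>, Time>() {
        @Override
        public void call(JavaRDD<String> rdd, Time time) {
            if (!rdd.isEmpty()) {
                // Write one time-suffixed directory per non-empty batch,
                // mirroring what saveAsTextFiles would have produced.
                rdd.saveAsTextFile("pathtohdfs-" + time.milliseconds());
            }
        }
    });

Coalescing with rdd.coalesce(1) before the save would also reduce the number of part files per batch, at the cost of write parallelism.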
Labels:
- Apache Hadoop
- Apache Kafka
- Apache Spark