Archives of Support Questions (Read Only)

raviteja_gogata · 12-01-2016

raviteja_gogata · 12-04-2016

@Vijay Kumar J Any ideas? Thanks in advance.

raviteja_gogata · 12-15-2016

@Greg Keys Any ideas? Thanks in advance.

gkeys · 12-15-2016

I suggest looking at the merge and saveAsTextFile functions, as described in the bottom post here: http://stackoverflow.com/questions/31666361/process-spark-streaming-rdd-and-store-to-single-hdfs-fil...
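The save-then-merge idea can be sketched locally. This is a minimal Python sketch, assuming each micro-batch has already been written with saveAsTextFile, which produces one directory per batch holding part-NNNNN files plus an empty _SUCCESS marker. The local filesystem stands in for HDFS, and `write_batch`, `merge_parts`, and the `batch-NNNNN` directory names are hypothetical (Spark Streaming's own `saveAsTextFiles` names each batch directory `prefix-TIME_IN_MS[.suffix]`). On a real cluster the same concatenation is what `hdfs dfs -getmerge` or Hadoop 2.x `FileUtil.copyMerge` perform.

```python
import os
import shutil
import tempfile


def write_batch(out_dir, batch_id, partitions):
    """Hypothetical stand-in for saveAsTextFile: one directory per batch,
    one part-NNNNN file per partition, plus an empty _SUCCESS marker."""
    batch_dir = os.path.join(out_dir, f"batch-{batch_id:05d}")
    os.makedirs(batch_dir)
    for i, lines in enumerate(partitions):
        with open(os.path.join(batch_dir, f"part-{i:05d}"), "w", encoding="utf-8") as f:
            for line in lines:
                f.write(line + "\n")
    open(os.path.join(batch_dir, "_SUCCESS"), "w").close()
    return batch_dir


def merge_parts(out_dir, target):
    """Concatenate every part file, batch by batch, into one target file.

    Zero-padded names sort in write order. shutil.copyfileobj copies in
    fixed-size chunks, so memory use does not grow with the data size.
    The target must live outside out_dir so it is not merged into itself.
    """
    with open(target, "wb") as dst:
        for batch in sorted(os.listdir(out_dir)):
            batch_dir = os.path.join(out_dir, batch)
            if not os.path.isdir(batch_dir):
                continue
            for name in sorted(os.listdir(batch_dir)):
                if not name.startswith("part-"):
                    continue  # skip _SUCCESS and other marker files
                with open(os.path.join(batch_dir, name), "rb") as src:
                    shutil.copyfileobj(src, dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = os.path.join(tmp, "stream-out")
        os.makedirs(out_dir)
        write_batch(out_dir, 0, [["a", "b"], ["c"]])  # batch 0, two partitions
        write_batch(out_dir, 1, [["d"]])              # batch 1, one partition
        target = os.path.join(tmp, "merged.txt")
        merge_parts(out_dir, target)
        with open(target, encoding="utf-8") as f:
            print(f.read().splitlines())
```

Merging after the fact keeps the Spark write itself parallel. The alternative of calling coalesce(1) before saveAsTextFile yields a single part file per batch, but it routes the whole batch through one task, which becomes a concern at the gigabyte scale discussed in the replies.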

raviteja_gogata · 12-15-2016

Hi Greg Keys, thanks for the reply. I was using a similar approach, but I'm wondering whether it still works when Spark Streaming is processing gigabytes of data.

gkeys · 12-16-2016

That is really an issue of scaling (how many nodes you have and how much memory each one has) and multitenancy (which other jobs will run at the same time, particularly Spark or other memory-intensive jobs). The more nodes and the less memory contention, the more data you can process in Spark.

raviteja_gogata · 12-16-2016

I am working on a 12-node cluster: 4 nodes have 126 GB of memory each and 8 have 252 GB each.
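For a rough sense of scale, the cluster described above adds up as follows. This counts raw physical memory only; the share actually available to Spark executors will be lower once the operating system, cluster daemons, and other tenants' jobs are accounted for.

$$4 \times 126\,\text{GB} + 8 \times 252\,\text{GB} = 504\,\text{GB} + 2016\,\text{GB} = 2520\,\text{GB}$$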

gkeys · 12-16-2016

What is the largest load (MBs or GBs) you have run your use case on?

Cloudera Community

Is it possible to write the Spark Streaming output to a single file in HDFS, where Spark Streaming gets the logs from Kafka topics?
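Besides merging afterwards, another pattern for the question above is to append each micro-batch to the same file as it arrives. A minimal Python sketch of that pattern follows, using a local file as a stand-in for HDFS; `append_batch` and the file name are hypothetical. It assumes a single writer: HDFS grants one client at a time the lease to write a file (Hadoop exposes appends through `FileSystem.append`), so in Spark this would have to run in one process such as the driver inside `foreachRDD`, not in parallel executor tasks. That funnels every batch through one process, so it suits small batches rather than gigabyte-scale ones.

```python
import os
import tempfile


def append_batch(target, records):
    """Append one micro-batch's records to a single output file.

    Mode "a" creates the file on the first batch and writes every later
    batch at the end, so all batches accumulate in one file. Only one
    writer should call this for a given target at a time.
    """
    with open(target, "a", encoding="utf-8") as f:
        for record in records:
            f.write(record + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "kafka-logs.txt")
        # Two hypothetical micro-batches of log lines read from Kafka.
        append_batch(target, ["log line 1", "log line 2"])
        append_batch(target, ["log line 3"])
        with open(target, encoding="utf-8") as f:
            print(f.read().splitlines())  # ['log line 1', 'log line 2', 'log line 3']
```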