Created 12-01-2016 10:15 PM
Created 12-04-2016 12:30 AM
@Vijay Kumar J any idea? Thanks in advance.
Created 12-15-2016 04:48 PM
@Greg Keys any idea? Thanks in advance.
Created 12-15-2016 09:23 PM
I suggest looking at the merge and saveAsTextFile functions, as per the bottom post here: http://stackoverflow.com/questions/31666361/process-spark-streaming-rdd-and-store-to-single-hdfs-fil...
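For reference, a minimal sketch of that approach in Scala. It assumes Hadoop 2.x (FileUtil.copyMerge was removed in Hadoop 3) and uses a socket text stream as a stand-in source; the paths, port, and batch interval are placeholders, not values from this thread:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MergePerBatch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("merge-per-batch"), Seconds(10))
    val lines = ssc.socketTextStream("localhost", 9999) // placeholder source

    lines.foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) {
        // saveAsTextFile writes one part-* file per partition into outDir
        val outDir = s"/tmp/out-${time.milliseconds}" // hypothetical path
        rdd.saveAsTextFile(outDir)

        val hadoopConf = new Configuration()
        val fs = FileSystem.get(hadoopConf)
        // Merge the part-* files into a single HDFS file, deleting the source dir
        FileUtil.copyMerge(fs, new Path(outDir), fs,
          new Path(s"/tmp/merged-${time.milliseconds}.txt"), true, hadoopConf, null)
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

Note that copyMerge streams the part files into the destination file rather than loading them into memory, so the merge cost scales with HDFS I/O rather than with driver RAM.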
Created 12-15-2016 09:58 PM
Hi Greg Keys, thanks for the reply. I was using a similar approach, but I'm wondering whether it still works when Spark Streaming is processing gigabytes of data?
Created 12-16-2016 01:47 PM
That is really an issue of scaling (how many nodes and how much memory per node you have) and multitenancy (which other jobs will run at the same time, particularly Spark or other memory-intensive jobs). The more nodes you have and the less memory contention there is, the more data you can process in Spark.
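For example, on a multitenant cluster you would typically cap the job's footprint explicitly so it coexists with other jobs. A rough illustration in Scala; the numbers are assumptions for illustration only, not a sizing recommendation for your cluster:

import org.apache.spark.SparkConf

// Illustrative executor sizing only; actual values depend on node memory,
// YARN overhead, and what else runs on the cluster.
val conf = new SparkConf()
  .setAppName("streaming-merge-job")
  .set("spark.executor.instances", "24") // e.g. ~2 executors per node on a 12-node cluster
  .set("spark.executor.cores", "5")
  .set("spark.executor.memory", "20g")   // leave headroom for co-tenant memory-intensive jobs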
Created 12-16-2016 06:07 PM
I am working on a 12-node cluster: 4 nodes have 126 GB of memory and 8 have 252 GB.
Created 12-16-2016 06:09 PM
What is the largest load (MBs or GBs) you have run your use case on?