09-14-2015 09:53 AM
In your Spark UI, do you see the job running with a large number of partitions (i.e. a large number of tasks)? If you have only a few partitions, you could be loading all 70 GB into memory at once. Alternatively, you may have one huge partition holding 99% of the data alongside lots of small ones; when Spark processes that huge partition, it will load all of it into memory. This can happen if you are mapping to a tuple, e.g. (x, y), and the key (x) is the same for 99% of the records. Have a look at the task sizes in your Spark UI: you will likely see either a small number of tasks, or one huge task and many small ones.
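As a rough complement to the Spark UI, here is a minimal sketch of how you might check the partition count and per-partition record counts yourself and repartition if the data is too coarse or skewed. The object name, input path, and the repartition count of 400 are all illustrative assumptions, not from the original job:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("partition-check"))

    // Hypothetical input path; replace with your ~70 GB dataset.
    val rdd = sc.textFile("hdfs:///data/input")

    // How many partitions (and therefore tasks) will Spark use?
    println(s"Partitions: ${rdd.partitions.length}")

    // Per-partition record counts: one huge partition with 99% of the
    // records (a skewed key) shows up immediately here.
    val counts = rdd
      .mapPartitionsWithIndex((idx, iter) => Iterator((idx, iter.size)))
      .collect()
      .sortBy(-_._2)
    counts.take(5).foreach { case (idx, n) =>
      println(s"partition $idx -> $n records")
    }

    // If there are too few partitions, spread the data out before the heavy
    // work so no single task has to hold most of it in memory.
    val spread = rdd.repartition(400) // illustrative value; tune for your cluster

    sc.stop()
  }
}
```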
09-14-2015 09:01 AM
As mentioned in existing posts, you can run Spark 1.4 and 1.5 on Cloudera 5.4 and it will mostly (if not completely) work. What is Cloudera's stance on supporting this? Will Cloudera provide any Spark support to a company that runs a newer Spark version on Cloudera 5.4?
Labels: Apache Spark