
LesterMartin · 11-01-2017

Unfortunately, it is a bit more complicated than all of that. In general, Spark evaluates lazily, so depending on what you do, even the "temp view" tables/DataFrames (Datasets) may not stay around from one DAG to the next. A temp view is just a name for a query plan, not stored data. There is an explicit cache method you can use on a DataFrame/Dataset, but even then you may be trying to cache something that simply won't fit in memory. No worries: Spark is designed for DataFrame/Dataset/RDD collections that won't fit in memory, and it handles this on its own by spilling to disk or recomputing partitions as needed.

I'm NOT trying to sell you on anything, but some deeper learning could probably help you. I'm a trainer here at Hortonworks (and again, not really trying to sell you something, just pointing to a resource/opportunity), and we spend several days building up this knowledge in our https://hortonworks.com/services/training/class/hdp-developer-enterprise-spark/ class. Again, apologies for sounding like a salesperson, but my general thought was that there's still a bit more for you to learn about Spark internals, and that may take some more interactive ways of building up that knowledge.

Cloudera Community

Re: Can I use SparkSQL to do complex joins and sor...