Support Questions

heta_desai · 03-23-2017

I want to know which computing engine is better in which situation.

Thanks.

kbadani · 03-23-2017

Yes, it depends heavily on your specific use case. But if you want to know the general pros and cons of each of these frameworks, here is a good Quora thread:

https://www.quora.com/What-is-the-difference-between-Apache-Spark-and-Apache-Hadoop-Map-Reduce

And also, of course, the Stack Overflow thread:

http://stackoverflow.com/questions/22167684/mapreduce-or-spark

bpgergo · 03-23-2017

This question is too broad in this form.

You need to understand this: if you want advice on which solution (computing engine) to choose, you should first describe what you are trying to accomplish, what kind of problem you are trying to solve, and what the nature of your workload is.

heta_desai · 03-24-2017

I am new to Hadoop. I want to know how MapReduce and Spark work internally, and what difference between them makes Spark execute faster than MR.

shivkumar82015 · 03-24-2017

1) MR is for batch processing and is best suited for loading data, but it is slower compared to Spark.

2) Spark is for in-memory processing; it is faster because it processes data in memory where possible (spilling to disk only when the data does not fit).
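A toy sketch of one commonly cited reason for the speed gap, assuming a multi-step (iterative) job: in classic MapReduce each step is a separate job that reads its input from HDFS and writes its output back, while Spark can keep the working set in memory (for example, a cached RDD) across steps. Local JSON files stand in for HDFS here, and the names `step`, `mr_style`, and `spark_style` are hypothetical. The sketch does not model shuffles, whose intermediate data both engines write to local disk.

```python
import json
import os
import tempfile


def step(values):
    # One iteration of a toy iterative algorithm: halve every value.
    return [v / 2 for v in values]


def mr_style(input_path, iterations, workdir):
    # Hypothetical MR-style chain: each iteration is a separate "job" that
    # reads its input from disk and writes its output back to disk
    # (local JSON files stand in for HDFS).
    path, disk_reads = input_path, 0
    for i in range(iterations):
        with open(path) as f:
            values = json.load(f)
        disk_reads += 1
        path = os.path.join(workdir, f"iter_{i}.json")
        with open(path, "w") as f:
            json.dump(step(values), f)
    # The final answer is read back from the last job's output.
    with open(path) as f:
        result = json.load(f)
    return result, disk_reads + 1


def spark_style(input_path, iterations):
    # Hypothetical Spark-style run: load once, keep the data in memory
    # (like a cached RDD) and apply every iteration to the in-memory copy.
    with open(input_path) as f:
        values = json.load(f)
    for _ in range(iterations):
        values = step(values)
    return values, 1


with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "input.json")
    with open(src, "w") as f:
        json.dump([8.0, 16.0, 32.0], f)
    mr_result, mr_reads = mr_style(src, 3, workdir)
    spark_result, spark_reads = spark_style(src, 3)

print(mr_result, mr_reads)        # [1.0, 2.0, 4.0] 4
print(spark_result, spark_reads)  # [1.0, 2.0, 4.0] 1
```

Both versions compute the same result; the MR-style chain goes back to disk on every iteration (four reads here versus one), and that repeated I/O is where much of the extra cost comes from on iterative workloads.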

heta_desai · 03-24-2017

When I execute SELECT * FROM <table> ORDER BY <column name> using Spark as the computing engine, where does it perform the ORDER BY? The data is distributed across the cluster, so does it first combine all the selected data in one place, or does it perform the ORDER BY on multiple nodes? And in which memory?

kbadani · 03-30-2017

@heta desai This slide deck explains Spark internals in a very simple way:

https://spark-summit.org/2014/wp-content/uploads/2014/07/A-Deeper-Understanding-of-Spark-Internals-A...

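Based on this, my understanding is that when you do an ORDER BY, Spark first range-partitions the data: it samples the sort column to choose partition boundaries and shuffles each row to the partition that covers its key range. Each partition is then sorted locally, and because the ranges don't overlap, reading the partitions in order gives a global order. Spark won't accumulate all the data in one place, because that's not possible if the data is huge. Spark tries to perform all operations in memory, spilling to disk when the data doesn't fit.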

Corresponding Stack Overflow answer:

http://stackoverflow.com/questions/32887595/how-does-spark-achieve-sort-order
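A toy, single-process sketch of range partitioning, one way a distributed engine can produce a global order without collecting every row on one node, assuming the table is already split into partitions held as Python lists. It samples the sort key to choose range boundaries, routes ("shuffles") each row to the output partition covering its key, then sorts each output partition independently. The function name `range_partition_sort` and its parameters are hypothetical. In Spark SQL, `df.orderBy(...).explain()` typically shows this pattern as an `Exchange rangepartitioning(...)` step followed by a `Sort`.

```python
import bisect
import random


def range_partition_sort(partitions, num_output, sample_size=20, seed=0):
    # Hypothetical helper: distributed-style ORDER BY on in-memory lists.
    rng = random.Random(seed)

    # 1) Sample keys from every input partition; only this small sample,
    #    not the data itself, is gathered in one place.
    sample = []
    for part in partitions:
        sample.extend(rng.sample(part, min(sample_size, len(part))))
    if not sample:
        return [[] for _ in range(num_output)]
    sample.sort()

    # 2) Pick num_output - 1 boundaries spread evenly through the sorted sample.
    bounds = [sample[len(sample) * i // num_output] for i in range(1, num_output)]

    # 3) Shuffle: route each row to the output partition whose key range covers it.
    outputs = [[] for _ in range(num_output)]
    for part in partitions:
        for key in part:
            outputs[bisect.bisect_left(bounds, key)].append(key)

    # 4) Sort each output partition locally (in a cluster, on its own executor).
    return [sorted(out) for out in outputs]


data = [[5, 3, 9, 1], [8, 2, 7], [6, 4, 0]]
result = range_partition_sort(data, num_output=3)
flat = [key for part in result for key in part]
print(flat)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

A skewed key distribution or a small sample can make the output partitions uneven in size, but the result stays correctly ordered because every row still lands in exactly one non-overlapping range.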

Cloudera Community

How can I decide whether to use Spark or MapReduce?