How can I decide whether to use Spark or MapReduce?

avatar
Expert Contributor

I want to know which computing engine is better in which situation.

Thanks.

1 ACCEPTED SOLUTION

avatar

Yes, it depends heavily on your specific use case. But if you want to know the general pros and cons of each of these frameworks, here is a good Quora thread:

https://www.quora.com/What-is-the-difference-between-Apache-Spark-and-Apache-Hadoop-Map-Reduce

And also, of course, the Stack Overflow thread:

http://stackoverflow.com/questions/22167684/mapreduce-or-spark

7 REPLIES 7

avatar
Rising Star

This question is too broad in this form.

You need to understand this: if you want advice on which solution (computing engine) to choose, you should first describe what you are trying to accomplish, what kind of problem you are trying to solve, and what the nature of your workload is.

avatar
Expert Contributor

I am new to Hadoop. I want to know how MapReduce and Spark work internally, and what the difference between them is that makes Spark execution faster than MR.

avatar
Expert Contributor

1) MR is best suited for batch processing and for loading data, but it is slower compared to Spark.

2) Spark is for in-memory processing. It is faster because it keeps working data in memory instead of writing intermediate results to disk between steps.

avatar
Expert Contributor

When I execute SELECT * FROM <table> ORDER BY <column name> using Spark as the computing engine, where does it perform the ORDER BY? The data is distributed across the cluster, so does it first combine all the selected data in one place, or does it perform the ORDER BY on multiple nodes? And in which memory?

avatar

@heta desai This slide deck explains the Spark internals in a very simple way:

https://spark-summit.org/2014/wp-content/uploads/2014/07/A-Deeper-Understanding-of-Spark-Internals-A...

Based on this, what I think happens when you do an ORDER BY is the following: first, the data in each partition is ordered. Then, to achieve a global order, the ordering among partitions is carried out. Spark won't accumulate all the data in one place, because that's not possible if the data is huge. Spark tries to perform all operations in memory.

Corresponding Stack overflow answer:

http://stackoverflow.com/questions/32887595/how-does-spark-achieve-sort-order
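
To make the idea of sorting without collecting everything in one place more concrete, here is a minimal sketch in plain Python (not Spark code). It assumes a range-partitioning approach: sample some keys, pick boundaries from the sample, send each record to the partition that owns its range, then sort each partition on its own. As far as I know this is roughly what Spark does for a global sort, but treat it as a toy model. The function name `range_partition_sort` and its parameters are made up for illustration, and the lists stand in for partitions that would live on different nodes.

```python
import bisect
import random


def range_partition_sort(partitions, num_output_partitions, sample_size=20, seed=0):
    """Toy global sort: sample -> pick boundaries -> shuffle by range -> local sort.

    Hypothetical helper for illustration only; this is not a Spark API.
    """
    rng = random.Random(seed)

    # 1. Sample keys from every input partition to estimate the key distribution.
    sample = []
    for part in partitions:
        k = min(sample_size, len(part))
        sample.extend(rng.sample(part, k))
    sample.sort()

    # 2. Pick (num_output_partitions - 1) boundaries from the sorted sample.
    boundaries = [
        sample[(i * len(sample)) // num_output_partitions]
        for i in range(1, num_output_partitions)
    ] if sample else []

    # 3. "Shuffle": route each record to the output partition that owns its range.
    outputs = [[] for _ in range(num_output_partitions)]
    for part in partitions:
        for value in part:
            outputs[bisect.bisect_right(boundaries, value)].append(value)

    # 4. Sort each output partition independently (one node each in a real cluster).
    return [sorted(out) for out in outputs]


# Three input "partitions", as if they were stored on three different nodes.
input_partitions = [[42, 7, 19], [3, 88, 7], [61, 15, 1]]
sorted_partitions = range_partition_sort(input_partitions, num_output_partitions=3)

# Reading the output partitions in order gives the globally sorted result.
flat = [v for part in sorted_partitions for v in part]
print(sorted_partitions)
print(flat)
```

The point of the sketch is that no single node ever has to hold or sort the whole dataset. Every value in output partition 0 is smaller than or equal to every value in partition 1, and so on, so the global order comes from the partition ranges plus the local sorts.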
