Created 03-23-2017 05:30 AM
I want to know which computing engine is better in which situation.
Thanks.
Created 03-23-2017 06:01 PM
Yes, it depends heavily on your specific use case. But if you want to know the general pros and cons of each of these frameworks, here is a good Quora thread:
https://www.quora.com/What-is-the-difference-between-Apache-Spark-and-Apache-Hadoop-Map-Reduce
And also, of course, the Stack Overflow thread:
http://stackoverflow.com/questions/22167684/mapreduce-or-spark
Created 03-23-2017 03:54 PM
This question is too broad in this form.
You need to understand this: if you want advice on which solution (computing engine) to choose, you should first describe what you are trying to accomplish, what kind of problem you are trying to solve, and what the nature of your workload is.
Created 03-24-2017 07:03 AM
I am new to Hadoop. I want to know how MapReduce and Spark work internally, and what difference between them makes Spark execution faster than MR.
Created 03-24-2017 12:59 PM
1) MR is best suited to batch processing and bulk data loading, but it is slower than Spark.
2) Spark does in-memory processing; it is faster because it keeps intermediate data in memory where it can, instead of writing it to disk between steps.
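To make the difference concrete, here is a toy sketch in plain Python. It is not Hadoop or Spark code; the function names and the JSON temp file are made up for illustration. It runs the same two-step word count twice: once writing the intermediate result to disk between steps (roughly how chained MR jobs hand data to each other), and once passing it along in memory (roughly what Spark tries to do).

```python
import json
import os
import tempfile


def tokenize(lines):
    """Step 1: split lines into lowercase words."""
    return [word.lower() for line in lines for word in line.split()]


def count(words):
    """Step 2: count how often each word occurs."""
    totals = {}
    for word in words:
        totals[word] = totals.get(word, 0) + 1
    return totals


def run_disk_between_steps(lines):
    """Toy MR-style chaining: step 1 output goes to disk, step 2 reads it back."""
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(tokenize(lines), f)   # intermediate result written to disk
        with open(path) as f:
            return count(json.load(f))      # next step re-reads it from disk
    finally:
        os.remove(path)


def run_in_memory(lines):
    """Toy Spark-style chaining: the intermediate result never leaves memory."""
    return count(tokenize(lines))


lines = ["Spark and MapReduce", "spark is fast"]
disk_result = run_disk_between_steps(lines)
memory_result = run_in_memory(lines)
```

Both runs give the same counts; the only difference is the extra disk round trip, which is where much of the time goes when many steps are chained on large data.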
Created 03-24-2017 07:12 AM
When I execute SELECT * FROM <table> ORDER BY <column name> using Spark as the computing engine, where is the ORDER BY performed? The data is distributed across the cluster, so does Spark first combine all the selected data in one place, or does it perform the ORDER BY on multiple nodes? And in which memory?
Created 03-30-2017 08:41 PM
@heta desai This slide deck explains Spark internals in a very simple way.
Based on this, what I think is that when you do an ORDER BY, the data in each partition is ordered first. Then, to achieve a global order, the ordering among partitions is carried out. Spark won't accumulate all the data in one place, because that is not possible if the data is huge. Spark tries to perform all operations in memory where it can.
Corresponding Stack Overflow answer:
http://stackoverflow.com/questions/32887595/how-does-spark-achieve-sort-order
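One way to picture a sort that never gathers everything on one node, as far as I understand Spark's range-partitioned sort: sample some keys to pick range boundaries, shuffle each record to the partition that owns its key range, then sort each partition locally. The sketch below is plain Python, not Spark code; `pick_boundaries` and `distributed_sort` are hypothetical names, and the lists stand in for partitions on different nodes.

```python
import random
from bisect import bisect_left


def pick_boundaries(partitions, num_partitions, per_partition_sample=20, seed=0):
    """Sample a few keys from every partition and pick split points.

    Only the small sample is gathered in one place, never the full data set.
    """
    rng = random.Random(seed)
    sample = []
    for part in partitions:
        sample.extend(rng.sample(part, min(per_partition_sample, len(part))))
    sample.sort()
    if not sample:
        return []
    # num_partitions - 1 evenly spaced split points taken from the sorted sample
    return [sample[(i * len(sample)) // num_partitions]
            for i in range(1, num_partitions)]


def distributed_sort(partitions, num_partitions=3):
    """Toy range-partitioned sort: shuffle by key range, then sort locally."""
    bounds = pick_boundaries(partitions, num_partitions)
    shuffled = [[] for _ in range(num_partitions)]
    for part in partitions:
        for key in part:
            # bisect_left finds the range this key falls into
            shuffled[bisect_left(bounds, key)].append(key)
    # each output partition is sorted on its own "node"
    return [sorted(part) for part in shuffled]


input_partitions = [[9, 3, 7], [1, 8, 2], [6, 5, 4, 0]]
sorted_partitions = distributed_sort(input_partitions, num_partitions=3)
flattened = [key for part in sorted_partitions for key in part]
```

Because every key in one output partition is less than or equal to every key in the next, reading the partitions in order yields the global order, and no single node ever had to hold the whole table.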