Support Questions
Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Innovation Accelerator group hub.

MAPREDUE code VS sequential code?

Explorer

I have developed a program code that executes an X task in sequential and in MAPREDUCE model. But I see that the running time taken by the mapreduce is bigger than the sequential one( It must be the opposite!!). I'm running mapreduce in a single node, so at least I should get the same running time?

1 ACCEPTED SOLUTION

Expert Contributor

If you are running MapReduce on a single node, it will take more time than a sequential application due to the job creation overhead that MapReduce must undertake. There is extra time taken in the case of MapReduce to submit the job, copy the code and dependencies into a YARN container, and start the job. As you scale out to several nodes and more, you will see the performance benefits of MapReduce.

In general however, MapReduce is used less often now on the platform - Hive runs on Tez now rather than MapReduce and I've only seen MapReduce of late being used for things like bulk loading data into HBase/Druid. In-memory processing, the likes of which both Hive/LLAP and Spark provide, can net you a significant performance boost depending on what you're trying to accomplish and the tool best suited for the job.

View solution in original post

1 REPLY 1

Expert Contributor

If you are running MapReduce on a single node, it will take more time than a sequential application due to the job creation overhead that MapReduce must undertake. There is extra time taken in the case of MapReduce to submit the job, copy the code and dependencies into a YARN container, and start the job. As you scale out to several nodes and more, you will see the performance benefits of MapReduce.

In general however, MapReduce is used less often now on the platform - Hive runs on Tez now rather than MapReduce and I've only seen MapReduce of late being used for things like bulk loading data into HBase/Druid. In-memory processing, the likes of which both Hive/LLAP and Spark provide, can net you a significant performance boost depending on what you're trying to accomplish and the tool best suited for the job.