Support Questions
Find answers, ask questions, and share your expertise

MapReduce code vs. sequential code?

Solved

Explorer

I have developed a program that performs task X both sequentially and with the MapReduce model. But I see that the running time of the MapReduce version is longer than the sequential one (it should be the opposite!). I'm running MapReduce on a single node, so shouldn't I at least get the same running time?

1 ACCEPTED SOLUTION

Re: MapReduce code vs. sequential code?

Expert Contributor

If you are running MapReduce on a single node, it will take more time than a sequential application because of the job-creation overhead MapReduce incurs: extra time is spent submitting the job, copying the code and its dependencies into a YARN container, and starting the job. As you scale out to several nodes or more, you will see the performance benefits of MapReduce.
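The effect above can be sketched with a simple cost model: MapReduce pays a roughly fixed job-setup cost before any useful work starts, while the per-record work itself is divided across nodes. The overhead and per-record timings below are made-up illustrative numbers, not measurements of any real cluster.

```python
# Toy cost model: fixed job-setup overhead vs. parallel speedup.
# All numbers here are illustrative assumptions, not benchmarks.

def sequential_time(records, per_record_s=1e-4):
    """Time to process every record in a single sequential process."""
    return records * per_record_s

def mapreduce_time(records, nodes, per_record_s=1e-4, job_overhead_s=20.0):
    """Fixed setup cost (job submission, container launch) plus the
    per-record work split evenly across the nodes."""
    return job_overhead_s + (records * per_record_s) / nodes

records = 1_000_000  # 100 s of sequential work at 0.1 ms per record

# On one node, MapReduce does the same work plus the fixed overhead,
# so it is strictly slower than the sequential program.
assert mapreduce_time(records, nodes=1) > sequential_time(records)

# With enough nodes, the parallel speedup outweighs the overhead.
assert mapreduce_time(records, nodes=10) < sequential_time(records)
```

The crossover point depends on how large the job is relative to the fixed overhead: tiny jobs on a single node never recover the setup cost, which matches what you are observing.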

In general, however, MapReduce is used less often on the platform now: Hive runs on Tez rather than MapReduce, and lately I've only seen MapReduce used for things like bulk-loading data into HBase/Druid. In-memory processing, which both Hive LLAP and Spark provide, can give you a significant performance boost, depending on what you're trying to accomplish and the tool best suited for the job.

