
YARN v/s MapReduce?

rushikeshdeshmu — Thu, 18 Feb 2016 21:19:00 GMT

Hi,

What are the advantages of YARN over MapReduce? Why was YARN required instead of MapReduce?

Re: YARN v/s MapReduce?

aervits — Thu, 18 Feb 2016 21:22:21 GMT

@Rushikesh Deshmukh they are not the same thing. I suggest you read Arun's book for the best explanation: http://www.amazon.com/Apache-Hadoop-YARN-MapReduce-Processing/dp/B0108CTDB6%3FSubscriptionId%3DAKIAILSHYYTFIVPWUY6Q%26tag%3Dduckduckgo-d-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3DB0108CTDB6

Re: YARN v/s MapReduce?

nsabharwal — Sun, 18 Aug 2019 13:18:30 GMT

@Rushikesh Deshmukh

YARN provides true multi-tenancy: it lets you run multiple jobs at the same time. YARN is the data operating system.

The overall architecture is different.

YARN

MapReduce

Another link for you

Source 1 2

"You say "Differences between MapReduce and YARN". MapReduce and YARN definitely different. MapReduce is Programming Model, YARN is architecture for distribution cluster. Hadoop 2 using YARN for resource management. Besides that, hadoop support programming model which support parallel processing that we known as MapReduce. Before hadoop 2, hadoop already support MapReduce. In short, MapReduce run above YARN Architecture. Sorry, i don't mention in part of straggler problem.

"when MRmaster asks resource manger for resources?" when user submit MapReduce Job. After MapReduce job has done, resource will be back to free.

"resource manger will give MRmaster all resources it needs or it is according to cluster computing capabilities" I don't get this question point. Obviously, the resources manager will give all resource it needs no matter what cluster computing capabilities. Cluster computing capabilities will influence on processing time."

and

MRv1 uses the JobTracker to create tasks and assign them to the TaskTrackers on the worker nodes. The JobTracker can become a resource bottleneck when the cluster scales out far enough (usually around 4,000 nodes).

MRv2 runs on YARN ("Yet Another Resource Negotiator"), which has one Resource Manager per cluster and a Node Manager on each worker node. For each job, a container on one of the worker nodes runs the Application Master, which monitors that job's resources, tasks, etc.
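To make the Resource Manager / Node Manager split a bit more concrete, here is a minimal toy sketch in Python. It is not the YARN API: the class names, the `allocate` and `release` methods, and the memory-only accounting are all assumptions made for illustration. It only shows the idea that one central component hands out containers from per-node capacity, that several kinds of applications draw from the same pool, and that resources return to the pool when a job finishes.

```python
from dataclasses import dataclass


@dataclass
class NodeManager:
    """Hypothetical stand-in for a per-node agent; tracks free memory only."""
    name: str
    free_mb: int


@dataclass
class ResourceManager:
    """Hypothetical stand-in for the cluster-wide resource manager."""
    nodes: list

    def allocate(self, app_id, container_mb, count):
        """Grant up to `count` containers of `container_mb` MB each.

        Returns (app_id, node_name) pairs. The list is shorter than
        `count` when the cluster does not have enough free memory.
        """
        granted = []
        for node in self.nodes:
            while len(granted) < count and node.free_mb >= container_mb:
                node.free_mb -= container_mb
                granted.append((app_id, node.name))
        return granted

    def release(self, node_name, container_mb):
        """Return one container's memory to the node it ran on."""
        for node in self.nodes:
            if node.name == node_name:
                node.free_mb += container_mb


rm = ResourceManager([NodeManager("node1", 4096), NodeManager("node2", 2048)])

# A MapReduce application master asks for four 1 GB containers.
mr_containers = rm.allocate("mapreduce-job", 1024, 4)

# A second application of a different type asks for four more,
# but only 2 GB is still free in this toy cluster.
spark_containers = rm.allocate("spark-job", 1024, 4)

# When the MapReduce job finishes, its containers go back to the pool
# and can be granted to whoever asks next.
for _, node_name in mr_containers:
    rm.release(node_name, 1024)
more_spark = rm.allocate("spark-job", 1024, 2)

print(len(mr_containers), len(spark_containers), len(more_spark))
```

In this toy run the second application gets fewer containers than it asked for until the first one releases its share. A real scheduler also considers queues, CPU, and data locality, which this sketch leaves out.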

Re: YARN v/s MapReduce?

rushikeshdeshmu — Thu, 18 Feb 2016 21:25:12 GMT

@Artem Ervits, thanks for suggestion and quick reply.

Re: YARN v/s MapReduce?

bleonhardi — Thu, 18 Feb 2016 21:31:03 GMT

Yarn is a work scheduler that can run different types of workloads.

- Spark

- MapReduce2

- Storm

- Tez

...

While MapReduce is a core feature, and most likely still the majority of the workloads, it's not the only one anymore. Hive and Pig use Tez, and Spark and Storm are big as well. This is the biggest advantage.

Other advantages include better scalability (local NodeManagers instead of a single bottleneck), lots of convenience features, and so on.

Re: YARN v/s MapReduce?

rushikeshdeshmu — Sat, 20 Feb 2016 20:58:10 GMT

@Benjamin Leonhardi, thanks for sharing this useful information.

Re: YARN v/s MapReduce?

shivanageshch — Mon, 11 Jul 2016 18:09:52 GMT

YARN has many advantages over MapReduce (MRv1).

1) Scalability - The load on the Resource Manager (RM) is reduced by delegating the handling of tasks running on the worker nodes to the Application Master. The RM can therefore handle more requests than the JobTracker could, which makes it easier to add more nodes.

2) Unlike MRv1, which is tightly coupled to MapReduce, YARN supports many kinds of applications, such as MR2, Tez, Storm, Spark, etc.

3) Optimized resource allocation - Unlike MRv1, YARN has no fixed number of slots allocated separately for mappers and reducers, so the available capacity of a node can be used for any task that needs resources.

4) When the Resource Manager fails, the jobs running on the cluster do not need to be restarted after the Resource Manager recovers.

5) The failover mechanism is implemented with ZooKeeper (ZK) support that is already part of the Resource Manager, so we don't need to run another daemon.
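Point 3 is easier to see with numbers. The following is only a back-of-the-envelope sketch: the function names and the slot and container counts are made up, and it assumes every task needs one equally sized container, which real YARN does not require.

```python
def mrv1_running(pending_maps, pending_reduces, map_slots, reduce_slots):
    """MRv1-style: map tasks may only use map slots, reduce tasks only reduce slots."""
    return min(pending_maps, map_slots) + min(pending_reduces, reduce_slots)


def yarn_running(pending_maps, pending_reduces, containers):
    """YARN-style: any pending task may use any free container."""
    return min(pending_maps + pending_reduces, containers)


# Hypothetical node with room for 10 concurrent tasks.
# Under MRv1 that room is pre-split into 6 map slots and 4 reduce slots.
# During a map-heavy phase there are 10 pending maps and no reduces yet.
mrv1_busy = mrv1_running(pending_maps=10, pending_reduces=0, map_slots=6, reduce_slots=4)
yarn_busy = yarn_running(pending_maps=10, pending_reduces=0, containers=10)

print(mrv1_busy, yarn_busy)
```

With these made-up numbers the slot model leaves the four reduce slots idle during the map phase, while the container model can use the whole node.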

Re: YARN v/s MapReduce?

sandralynn319 — Wed, 01 Feb 2017 18:14:45 GMT

YARN is the framework responsible for cluster resource management.

Cluster resource management means managing the resources of the Hadoop cluster, where resources means memory, CPU, etc. YARN took over this task from MapReduce, and MapReduce is now streamlined to perform only data processing, which is what it does best.

YARN has a central Resource Manager component that manages resources and allocates them to applications. Multiple applications can run on Hadoop via YARN, and all of them share common resource management.

Advantages of YARN:

YARN makes efficient use of resources: There are no more fixed map-reduce slots. YARN provides a central resource manager. With YARN, you can now run multiple applications in Hadoop, all sharing common resource management.
YARN can even run applications that do not follow the MapReduce model: YARN decouples MapReduce's resource management and scheduling capabilities from the data processing component, enabling Hadoop to support more varied processing approaches and a broader array of applications. For example, Hadoop clusters can now run interactive querying and streaming data applications simultaneously with MapReduce batch jobs. This also streamlines MapReduce to do what it does best: process data.

A few important notes about YARN:

YARN is backward compatible: This means that existing MapReduce jobs can run on Hadoop 2.0 without any change.
No more JobTracker and TaskTracker needed in Hadoop 2.0: The JobTracker and TaskTracker have disappeared entirely. YARN splits the two major functions of the JobTracker, i.e. resource management and job scheduling/monitoring, into separate daemons (components).
- Resource Manager (global, handles resource management)
- Application Master (per application, handles job scheduling/monitoring)
A node-specific Node Manager on each machine launches and monitors the containers. The central Resource Manager and the node-specific Node Managers together constitute the YARN framework.

Re: YARN v/s MapReduce?

manish555111 — Tue, 01 Aug 2017 01:56:18 GMT

@Neeraj Sabharwal Can reducers communicate with each other?

Re: YARN v/s MapReduce?

sandeepksaini — Fri, 17 Nov 2017 19:24:33 GMT

Nope, reducers don't communicate with each other, and neither do the mappers. Each of them runs in a separate JVM container and has no information about the others. The AppMaster is the daemon that takes care of and manages these JVM-based containers (mappers/reducers).
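A small word-count sketch in plain Python (not Hadoop code) may help show why reducers never need to talk to each other: the shuffle step routes every value for a given key to exactly one reducer, so each reducer can finish its work from its own bucket alone. The function names and the byte-sum partitioner here are illustrative stand-ins, not what Hadoop actually uses.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def partition(key, num_reducers):
    """Pick the reducer that owns a key (deterministic stand-in partitioner)."""
    return sum(key.encode("utf-8")) % num_reducers


def shuffle(mapped_pairs, num_reducers):
    """Framework step: group values by key and route each key to one reducer."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in mapped_pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def reduce_phase(bucket):
    """Reducer: works only on its own bucket, never on another reducer's data."""
    return {key: sum(values) for key, values in bucket.items()}


lines = ["yarn runs mapreduce", "yarn runs spark"]

# Each line is mapped independently of the others.
mapped = [pair for line in lines for pair in map_phase(line)]

# The shuffle is the only place where data from different mappers meets.
buckets = shuffle(mapped, num_reducers=2)

# Each reducer processes its bucket in isolation.
results = [reduce_phase(bucket) for bucket in buckets]

# Because every key lives in exactly one bucket, merging is a plain union.
merged = {key: count for result in results for key, count in result.items()}
print(merged)
```

Since no key appears in more than one bucket, the per-reducer outputs can simply be concatenated, which is why the framework can run the reducers in separate containers with no channel between them.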