
In how many ways can we run Spark over Hadoop?



Re: In how many ways can we run Spark over Hadoop?


There are several options for the Spark run mode, e.g. local, Mesos, and YARN.

You can refer to this link.
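Each run mode corresponds to a value of spark-submit's `--master` option. The sketch below maps a mode name to the master URL format Spark documents for it. The function name `master_url` and the host name are hypothetical, and the ports used (7077 for a standalone master, 5050 for a Mesos master) are the usual defaults, which a given cluster may override.

```python
# scheme and default port for the cluster managers that need a master host
DEFAULT_PORTS = {
    "standalone": ("spark", 7077),  # Spark's own standalone master
    "mesos": ("mesos", 5050),       # Mesos master
}


def master_url(mode: str, host: str = "", port: int = 0, threads: str = "*") -> str:
    """Return a spark-submit --master value for a run mode (illustrative sketch)."""
    if mode == "local":
        # local[N] runs N worker threads in one JVM; local[*] uses every logical core
        return f"local[{threads}]"
    if mode == "yarn":
        # The ResourceManager address is read from the Hadoop configuration
        # (HADOOP_CONF_DIR / YARN_CONF_DIR), so the master is just "yarn"
        return "yarn"
    if mode in DEFAULT_PORTS:
        if not host:
            raise ValueError(f"{mode} mode needs the master host name")
        scheme, default_port = DEFAULT_PORTS[mode]
        return f"{scheme}://{host}:{port or default_port}"
    raise ValueError(f"unknown run mode: {mode}")


for mode in ("local", "standalone", "mesos", "yarn"):
    print(mode, "->", master_url(mode, host="master.example.com"))
```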



Re: In how many ways can we run Spark over Hadoop?

Spark can run either in local mode or in a distributed manner on a cluster.

1. Local mode: There is no resource manager in local mode. This mode is used to test Spark applications in a test environment where we do not want to consume cluster resources and want applications to run quickly.
Here everything runs in a single JVM.
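In local mode the degree of parallelism comes from the master string itself: `local` means one worker thread, `local[N]` means N threads, and `local[*]` means one thread per logical core. A small sketch of that rule follows; `local_threads` is a hypothetical helper, not a Spark API, and it deliberately ignores the `local[N,F]` form (which also sets a task retry count).

```python
import os
import re


def local_threads(master: str) -> int:
    """Number of worker threads a local-mode master string asks for (sketch)."""
    if master == "local":
        return 1  # plain "local" runs a single worker thread
    m = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if m is None:
        raise ValueError(f"not a supported local-mode master: {master}")
    if m.group(1) == "*":
        # "*" means one thread per logical core on this machine
        return os.cpu_count() or 1
    return int(m.group(1))


print(local_threads("local"), local_threads("local[4]"), local_threads("local[*]"))
```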

2. Distributed / cluster modes:

We can run Spark in a distributed manner using a master-slave architecture. Each cluster has multiple worker nodes, and the cluster manager allocates resources to each worker node.
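To make the master/worker split concrete, here is a toy model of a cluster manager placing fixed-size executors onto worker nodes that each advertise free cores and memory. It is not Spark's actual scheduling logic: the `Worker` class, the executor size, and the round-robin placement are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    free_cores: int
    free_mem_gb: int


def allocate(workers, num_executors, cores, mem_gb):
    """Toy cluster manager: place executors round-robin on workers that have room."""
    placements = []
    i = 0
    misses = 0  # consecutive workers that could not fit one more executor
    while len(placements) < num_executors and misses < len(workers):
        w = workers[i % len(workers)]
        i += 1
        if w.free_cores >= cores and w.free_mem_gb >= mem_gb:
            # Reserve this executor's share of the worker's CPU and memory
            w.free_cores -= cores
            w.free_mem_gb -= mem_gb
            placements.append(w.name)
            misses = 0
        else:
            misses += 1
    return placements


workers = [Worker("w1", free_cores=4, free_mem_gb=8), Worker("w2", free_cores=2, free_mem_gb=4)]
placed = allocate(workers, num_executors=4, cores=2, mem_gb=2)
print(placed)
```

Only three of the four requested executors fit here, because after three placements no worker has two free cores left; the toy simply stops once a full pass over the workers finds no room.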

Spark can be deployed on a distributed cluster in three ways:

1. Standalone mode:


In standalone mode, Spark itself handles resource allocation; there is no separate cluster manager. Spark allocates CPU and memory to worker nodes based on
resource availability.
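With the standalone cluster manager, an application is submitted to the Spark master's `spark://` URL, and resource limits are passed as spark-submit options such as `--total-executor-cores` and `--executor-memory`. The sketch below only builds the command line and does not run it; the host name, application file, and resource values are placeholders, and `standalone_submit` is a hypothetical helper.

```python
import shlex


def standalone_submit(master_host, app, total_cores, executor_mem, port=7077):
    """Build a spark-submit command for a Spark standalone cluster (sketch)."""
    return [
        "spark-submit",
        "--master", f"spark://{master_host}:{port}",  # standalone master URL
        "--total-executor-cores", str(total_cores),  # cap on cores across all executors
        "--executor-memory", executor_mem,            # memory per executor, e.g. "2g"
        app,
    ]


cmd = standalone_submit("master.example.com", "my_app.py", 8, "2g")
print(shlex.join(cmd))
```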


2. YARN:

Here, YARN is used as the cluster manager. YARN is mainly used when Spark runs alongside other Hadoop components such as MapReduce, for example in the Cloudera or Hortonworks distributions.
YARN consists of a ResourceManager and NodeManagers.
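On YARN the master is simply `yarn`: spark-submit finds the ResourceManager through the Hadoop configuration directory named by `HADOOP_CONF_DIR` (or `YARN_CONF_DIR`). The `--deploy-mode` option chooses whether the driver runs inside the cluster (`cluster`) or on the submitting machine (`client`), and `--num-executors` sets how many executors to request. This sketch only builds the command; the config path, application file, and resource values are assumptions, and `yarn_submit` is a hypothetical helper.

```python
import os
import shlex


def yarn_submit(app, num_executors, executor_mem, deploy_mode="cluster"):
    """Build a spark-submit command for YARN (sketch; does not execute it)."""
    if deploy_mode not in ("cluster", "client"):
        raise ValueError("deploy_mode must be 'cluster' or 'client'")
    return [
        "spark-submit",
        "--master", "yarn",               # ResourceManager address comes from the Hadoop config
        "--deploy-mode", deploy_mode,     # "cluster": the driver runs in YARN's ApplicationMaster
        "--num-executors", str(num_executors),
        "--executor-memory", executor_mem,
        app,
    ]


# Environment in which spark-submit would look up the cluster; the path is an example
env = dict(os.environ, HADOOP_CONF_DIR="/etc/hadoop/conf")
cmd = yarn_submit("my_app.py", 4, "2g")
print(shlex.join(cmd))
# On a machine with Spark installed: subprocess.run(cmd, env=env, check=True)
```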

The ResourceManager has two components: the Scheduler and the ApplicationsManager.
Scheduler: allocates resources to the various running applications.
ApplicationsManager: manages all applications across all nodes.

Each NodeManager launches and monitors containers on its node, and each application's ApplicationMaster runs inside one of these containers.
The container is where the actual work happens.
The ApplicationMaster negotiates resources from the ResourceManager.
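The interaction between these components can be sketched as a toy model: each NodeManager has some free memory, the ResourceManager's scheduler grants container requests against that capacity, and the ApplicationMaster asks for the containers its application needs. This is a simplification for illustration only (first-fit placement, memory as the only resource); the class and method names are hypothetical and are not the YARN API.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    name: str
    free_mb: int
    containers: list = field(default_factory=list)


class ResourceManager:
    def __init__(self, nodes):
        self.nodes = nodes  # NodeManagers that have registered their capacity

    def allocate(self, app_id, mem_mb):
        """Scheduler role: grant one container on the first node with room (first fit)."""
        for node in self.nodes:
            if node.free_mb >= mem_mb:
                node.free_mb -= mem_mb
                container = f"{app_id}-c{len(node.containers)}@{node.name}"
                node.containers.append(container)
                return container
        return None  # no capacity left for this request


class ApplicationMaster:
    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm

    def request(self, count, mem_mb):
        """Negotiate up to `count` containers of `mem_mb` MB each from the ResourceManager."""
        granted = []
        for _ in range(count):
            container = self.rm.allocate(self.app_id, mem_mb)
            if container is None:
                break
            granted.append(container)
        return granted


rm = ResourceManager([NodeManager("nm1", 4096), NodeManager("nm2", 2048)])
am = ApplicationMaster("app1", rm)
granted = am.request(4, 2048)
print(granted)
```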

3. Mesos:

Mesos is used in large-scale production deployments. In a Mesos deployment, all the resources available across all nodes in the cluster are pooled together and
shared dynamically.
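The Mesos master, slave (called agent in newer Mesos versions), and framework are the three components of Mesos.
master: manages the slaves and offers their resources to frameworks (it can be run with standby masters for fault tolerance)
slave: runs on each node, reports its available resources to the master, and executes the frameworks' tasks
framework: helps the application request resources by accepting or declining the master's resource offers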
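Mesos shares resources through resource offers: the master tells a framework what each agent (slave) currently has free, and the framework decides which offers to accept and how many tasks to launch on them. Below is a toy version of the framework's side of that decision, assuming every task needs a fixed amount of CPU and memory; the `Offer` class, function name, and numbers are illustrative, not the Mesos API.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int


def framework_decide(offers, task_cpus, task_mem_mb, tasks_wanted):
    """Framework side: launch whole tasks on offers that fit, decline the rest."""
    launched, declined = [], []
    for offer in offers:
        # How many whole tasks fit in this offer, capped by what is still needed
        n = min(int(offer.cpus // task_cpus),
                offer.mem_mb // task_mem_mb,
                tasks_wanted - len(launched))
        if n > 0:
            launched.extend([offer.agent] * n)
        else:
            declined.append(offer.agent)
    return launched, declined


# Master side (simulated): offer each agent's free resources to the framework
offers = [Offer("agent1", 4.0, 8192), Offer("agent2", 0.5, 1024), Offer("agent3", 2.0, 2048)]
launched, declined = framework_decide(offers, task_cpus=1.0, task_mem_mb=1024, tasks_wanted=5)
print(launched, declined)
```

The design point this illustrates is the two-level split: the master only decides what to offer, while the framework (for example Spark) decides what to run where.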

For more information on Spark cluster managers, read:

Cluster Manager of Spark
