Support Questions

I want my Spark applications running in cluster mode to be HA using ZooKeeper. How do I do it? Any help? Meaning the Spark jobs should keep running continuously on a switchover.

Explorer
5 REPLIES
Re: I want my Spark applications running in cluster mode to be HA using ZooKeeper. How do I do it? Any help? Meaning the Spark jobs should keep running continuously on a switchover

Cloudera Employee

@Dhana Shekar

What's your cluster setup like? Are you using YARN? Is it set up to function in HA? http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html

If so, then you can rely on YARN to keep your application running in HA and to handle all the resource allocation for you. That's the beauty of using YARN to run your Hadoop applications.

All you need to do is tell Spark to run your applications against YARN: http://spark.apache.org/docs/latest/running-on-yarn.html
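For illustration, a minimal submission sketch (the main class, jar path, and sizing values are hypothetical placeholders, not something from this thread):

# Submit a Spark application to YARN in cluster mode.
# --master yarn lets YARN handle resource allocation;
# --deploy-mode cluster runs the driver inside the YARN ApplicationMaster
# container, so YARN can restart it on failure.
# (com.example.MyApp and the jar path are hypothetical.)
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  --num-executors 4 \
  --executor-memory 2g \
  /path/to/my-app.jar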

Re: I want my Spark applications running in cluster mode to be HA using ZooKeeper. How do I do it? Any help? Meaning the Spark jobs should keep running continuously on a switchover

Explorer

Thanks Matthieu for the response. I am running Spark applications in YARN deploy mode, on a cluster whose NameNode is already configured for HA with ZooKeeper. What I am trying to understand is what happens to Spark jobs when the NameNode goes down or to standby. Do I need to tell ZooKeeper to take care of the Spark applications on a switchover (NameNode active, secondary standby)? Whenever I search, the results only cover HA for standalone mode, but I am running in YARN deploy mode. Is my understanding right, or am I missing something? Do I need to configure any Spark parameters for ZooKeeper?

Re: I want my Spark applications running in cluster mode to be HA using ZooKeeper. How do I do it? Any help? Meaning the Spark jobs should keep running continuously on a switchover

Explorer

You mean to say that all applications submitted on the active NameNode (machine IP1, configured for HA via ZooKeeper with IP2) will, on a switchover to the standby (machine IP2), automatically migrate/restart the Spark jobs/processes/applications on IP2, with no Spark-specific configuration?

Re: I want my Spark applications running in cluster mode to be HA using ZooKeeper. How do I do it? Any help? Meaning the Spark jobs should keep running continuously on a switchover

Cloudera Employee

Hi @Dhana Shekar

Having your NameNode running in HA is not enough; your Resource Manager (which handles YARN management) also needs to be configured in HA (cf. the first link in my answer above).

Having your NameNode in HA allows you to keep access to HDFS if the active NameNode fails. However, it doesn't handle resource allocation for applications; that's what the Resource Manager (YARN) is for.
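As a quick sanity check, you can ask each ResourceManager for its state. A sketch: rm1/rm2 are the conventional rm-ids from the ResourceManagerHA doc linked above, and the yarn-site.xml path assumes a typical Cloudera layout; both may differ on your cluster.

# Confirm ResourceManager HA is enabled in yarn-site.xml
grep -A1 "yarn.resourcemanager.ha.enabled" /etc/hadoop/conf/yarn-site.xml

# Ask each ResourceManager whether it is active or standby
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2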

Let's go through a few failure scenarios:

1. The Resource Manager fails, but the containers (application master + executors + driver) linked to that particular application are unaffected. => Your application continues to run as if nothing happened. You won't be able to submit new apps until the Resource Manager is back up or the standby has been brought to active (in the case of HA).

2. One of the executor containers fails. => The application master spawns a new container to take over. The Spark tasks it was handling might fail, but they will be replayed.

3. The Application Master container fails. => The application fails but will be re-spawned by the Resource Manager, up to a configurable number of attempts (see the sketch below).
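How aggressively these failures are retried is tunable from the Spark side. A sketch (the specific values and the main class/jar are illustrative placeholders, not recommendations):

# spark.yarn.maxAppAttempts caps how many times YARN re-spawns the
# ApplicationMaster (scenario 3); it must not exceed YARN's global
# yarn.resourcemanager.am.max-attempts. spark.task.maxFailures caps
# how many times a failed task is replayed (scenario 2).
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.maxAppAttempts=4 \
  --conf spark.task.maxFailures=8 \
  --class com.example.MyApp \
  /path/to/my-app.jar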
