Support Questions
Find answers, ask questions, and share your expertise

Spark in yarn-cluster mode on Zeppelin

Explorer

By default with an Ambari installation, Zeppelin's Spark interpreter is set to yarn-client mode, which means the driver runs on the same host as the Zeppelin server. This incurs high memory pressure on the Zeppelin server host, especially when the Spark interpreter is run in isolated mode.

I'm trying to switch to yarn-cluster mode, which lets YARN decide where the Spark driver should run depending on the available resources in the cluster. This mode has been supported by Zeppelin since version 0.8.0, but I'm facing the following issue: https://issues.apache.org/jira/browse/ZEPPELIN-3633. Basically, the node where YARN decided to run the Spark driver doesn't have Zeppelin installed, so the driver is unable to start.

There is a fix on Zeppelin's GitHub, https://github.com/apache/zeppelin/pull/3181, but I can't find the files that I need to change. Any chance that this can be fixed easily, or should I just install Zeppelin on every node?

1 ACCEPTED SOLUTION

Mentor

@Jeremy Jean-Jean

There is no sense in installing Zeppelin on all the nodes. Do you have the YARN client installed on the data nodes? Then submit using

spark-submit --class <classname> --master yarn --deploy-mode cluster <jars> <args>


HTH


2 REPLIES 2


Explorer

Thank you for your fast answer!

Indeed, it works after tweaking Zeppelin's Spark interpreter parameters and changing:

master: yarn-cluster

to

master: yarn
spark.submit.deployMode: cluster
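
For reference, the same change can be expressed as spark-submit flags: the old combined `yarn-cluster` master value (deprecated in Spark 2.x) is split into a master and a separate deploy mode, which is what the two interpreter properties above map to. The class and jar names below are placeholders, not from the original thread.

```shell
# Old, deprecated form:
#   spark-submit --master yarn-cluster --class <classname> <jars> <args>

# Modern equivalent, matching the interpreter settings above
# (master: yarn, spark.submit.deployMode: cluster):
spark-submit --master yarn --deploy-mode cluster --class <classname> <jars> <args>
```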

