Spark in yarn-cluster mode on Zeppelin

Contributor

By default, with an Ambari installation, Zeppelin configures the Spark interpreter in yarn-client mode, which means the driver runs on the same host as the Zeppelin Server. This puts high memory pressure on the Zeppelin Server host, especially when the Spark interpreter is run in isolated mode.

I'm trying to switch to yarn-cluster mode, which lets YARN decide where the Spark driver should run depending on the resources available in the cluster. Zeppelin has supported this mode since version 0.8.0, but I'm hitting the following issue: https://issues.apache.org/jira/browse/ZEPPELIN-3633. Basically, the node where YARN decides to run the Spark driver doesn't have Zeppelin installed, so the driver is unable to start.

There is a fix on Zeppelin's GitHub (https://github.com/apache/zeppelin/pull/3181), but I can't find the files I need to change. Is there any chance this can be fixed easily, or should I just install Zeppelin on every node?

Accepted solution

Master Mentor

@Jeremy Jean-Jean

There is no need to install Zeppelin on all the nodes. Do you have the YARN client installed on the data nodes? If so, submit using:

spark-submit --class <classname> --master yarn --deploy-mode cluster <application-jar> <args>
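As an illustration, the command above can be wrapped in a small shell function. This is a minimal sketch, assuming bash; the function name `submit_cluster` and the `DRY_RUN` switch are hypothetical, and the class name and JAR path you pass in are your own. With `DRY_RUN=1` it only prints the command, which is handy for checking the flags before submitting to YARN.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around spark-submit for yarn-cluster submissions.
# Usage: submit_cluster <classname> <application-jar> [app args...]
submit_cluster() {
  local class="$1" app_jar="$2"
  shift 2
  # Spark 2.x form: master "yarn" plus an explicit deploy mode.
  local cmd=(spark-submit --class "$class" --master yarn --deploy-mode cluster "$app_jar" "$@")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Only print the assembled command, do not submit anything.
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 submit_cluster org.apache.spark.examples.SparkPi /path/to/spark-examples.jar 10` prints the full command (the JAR path is a placeholder); drop `DRY_RUN=1` to actually submit from a host that has the Spark and YARN clients installed.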


HTH

Contributor

Thank you for your fast answer!

Indeed, it works after tweaking Zeppelin's Spark interpreter parameters and changing:

master: yarn-cluster

to

master: yarn
spark.submit.deployMode: cluster
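This change follows Spark 2.x, where the `yarn-cluster` and `yarn-client` master values are deprecated in favor of master `yarn` plus a separate deploy mode. A minimal bash sketch of that mapping, using the property names from this thread (the function name `to_spark2_master` is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a legacy Spark "master" value to the
# Spark 2.x interpreter properties (master + spark.submit.deployMode).
# Any other value (e.g. local[*] or yarn) is printed unchanged.
to_spark2_master() {
  case "$1" in
    yarn-cluster) printf 'master: yarn\nspark.submit.deployMode: cluster\n' ;;
    yarn-client)  printf 'master: yarn\nspark.submit.deployMode: client\n' ;;
    *)            printf 'master: %s\n' "$1" ;;
  esac
}
```

The printed lines are the name/value pairs to enter in the Spark interpreter settings; the helper itself does not modify Zeppelin's configuration.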
