Support Questions

jeremyjjea · 01-22-2019

By default, an Ambari installation sets Zeppelin to use yarn-client mode for the Spark interpreter, which means the driver runs on the same host as the Zeppelin server. This puts high memory pressure on the Zeppelin server host, especially when the Spark interpreter is run in isolated mode.

I'm trying to switch to yarn-cluster mode, which would let YARN decide where the Spark driver should run depending on the resources available in the cluster. Zeppelin has supported this mode since version 0.8.0, but I'm facing the following issue: https://issues.apache.org/jira/browse/ZEPPELIN-3633. Basically, the node where YARN decides to run the Spark driver doesn't have Zeppelin installed, so the driver is unable to start.

There is a fix on Zeppelin's GitHub, https://github.com/apache/zeppelin/pull/3181, but I can't find the files that I need to change. Is there any chance this can be fixed easily, or should I just install Zeppelin on every node?

Shelton · 01-22-2019

@Jeremy Jean-Jean

There is no sense in installing Zeppelin on all the nodes. Do you have the YARN client installed on the data nodes? If so, submit using:

spark-submit --class <classname> --master yarn --deploy-mode cluster <jars> <args>
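
For anyone scripting this, here is a minimal Python sketch that assembles the same command line. The helper name `build_spark_submit` and the example class, jar, and argument values are hypothetical placeholders; only the `--class`, `--master yarn`, and `--deploy-mode cluster` flags come from the command above.

```python
import shlex


def build_spark_submit(class_name, app_jar, app_args=()):
    """Return the argv list for a YARN cluster-mode submission.

    Hypothetical helper: it only assembles the command shown above
    and does not run anything.
    """
    return [
        "spark-submit",
        "--class", class_name,
        "--master", "yarn",
        "--deploy-mode", "cluster",
        app_jar,
        *app_args,
    ]


# Placeholder values, not taken from the thread.
cmd = build_spark_submit("com.example.MyApp", "my-app.jar", ["input.txt"])
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch it, assuming `spark-submit` is on the PATH of the machine running the script.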

HTH

jeremyjjea · 01-22-2019

Thank you for your fast answer!

Indeed, it works after tweaking Zeppelin's Spark interpreter parameters and changing:

master: yarn-cluster

to

master: yarn
spark.submit.deployMode: cluster
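
For reference, the same rewrite can be sketched as a small Python function over a dictionary of interpreter properties. This is only an illustration: the function name `normalize_interpreter_properties` and the `LEGACY_MASTERS` table are hypothetical, and it assumes the legacy `yarn-client` value splits the same way as `yarn-cluster`. The property names `master` and `spark.submit.deployMode` are the ones shown above.

```python
# Legacy single-value masters and their (master, deploy mode) equivalents.
# The yarn-client entry is an assumption made by analogy with yarn-cluster.
LEGACY_MASTERS = {
    "yarn-client": ("yarn", "client"),
    "yarn-cluster": ("yarn", "cluster"),
}


def normalize_interpreter_properties(props):
    """Return a copy of props with a legacy YARN master split in two.

    Properties that do not use a legacy master are returned unchanged.
    """
    result = dict(props)
    legacy = LEGACY_MASTERS.get(result.get("master"))
    if legacy is not None:
        result["master"], result["spark.submit.deployMode"] = legacy
    return result


before = {"master": "yarn-cluster"}
after = normalize_interpreter_properties(before)
print(after)
```

The function returns a new dictionary rather than mutating its input, so the original settings stay available for comparison.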

Cloudera Community

Spark in yarn-cluster mode on Zeppelin