Created 01-22-2019 10:24 AM
By default, with an Ambari installation, Zeppelin's Spark interpreter is set to yarn-client mode, which means the driver runs on the same host as the Zeppelin server. This puts high memory pressure on the Zeppelin server host, especially when the Spark interpreter runs in isolated mode.
I'm trying to switch to yarn-cluster mode, which would let YARN decide where the Spark driver should run depending on the resources available in the cluster. This mode has been supported by Zeppelin since version 0.8.0, but I'm facing the following issue: https://issues.apache.org/jira/browse/ZEPPELIN-3633. Basically, the node where YARN decides to run the Spark driver doesn't have Zeppelin installed, so it is unable to start.
There is a fix on Zeppelin's GitHub, https://github.com/apache/zeppelin/pull/3181, but I can't find the files that I need to change. Is there any chance this can be fixed easily, or should I just install Zeppelin on every node?
Created 01-22-2019 10:55 AM
There is no sense in installing Zeppelin on all the nodes. Do you have the YARN client installed on the data nodes? If so, submit using:
spark-submit --class <classname> --master yarn --deploy-mode cluster <jars> <args>
HTH
Created on 01-22-2019 12:23 PM - edited 08-17-2019 02:54 PM
Thank you for your fast answer!
Indeed, it works after tweaking Zeppelin's Spark interpreter parameters and changing:
master: yarn-cluster
to:
master: yarn
spark.submit.deployMode: cluster
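For anyone scripting the same change across several interpreter settings, the mapping from the old combined master string to the two separate properties can be sketched as below. This is only an illustration: the helper name split_legacy_master is hypothetical and not part of Zeppelin or Spark, and it assumes the property keys are exactly the ones shown in this thread (master and spark.submit.deployMode).

```python
def split_legacy_master(master: str) -> dict:
    """Map a combined master string to separate interpreter properties.

    Hypothetical helper for illustration; the keys mirror the
    interpreter properties mentioned in this thread.
    """
    # Combined forms that bundle the deploy mode into the master value.
    legacy_modes = {"yarn-cluster": "cluster", "yarn-client": "client"}
    if master in legacy_modes:
        return {
            "master": "yarn",
            "spark.submit.deployMode": legacy_modes[master],
        }
    # Any other value (e.g. "yarn", "local[*]") is passed through unchanged.
    return {"master": master}


if __name__ == "__main__":
    for key, value in split_legacy_master("yarn-cluster").items():
        print(f"{key}: {value}")
```

The resulting values still have to be entered in the Spark interpreter settings, followed by an interpreter restart.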