How to choose which version of Spark is used in HDP 2.5?

Solved


New Contributor

There are two versions of Spark in HDP 2.5: Spark 1.6 and Spark 2.0. I don't know how to specify which version of Spark should be used. Can anyone advise me how to do that? Through the Ambari admin console?

Also, I would like to submit jobs to Spark 2.0 from my application instead of using spark-submit. What should I specify as the master URL when building the new SparkSession?

Thanks.

Donald

1 ACCEPTED SOLUTION

Accepted Solutions

Re: How to choose which version of Spark is used in HDP 2.5?

Rising Star

Hi @yong yang,

  • By default, if more than one version of Spark is installed on a node, your job runs with the default version for your HDP package.

    The default version for HDP 2.5.0 is Spark 1.6.2.

  • If more than one version of Spark is installed on a node, you can select which version of Spark runs your job.

    To do this, set the SPARK_MAJOR_VERSION environment variable to the desired version before you launch the job.

Here is an example for a user who submits jobs using spark-submit under /usr/bin:

  1. Navigate to a host where Spark 2.0 is installed.
  2. Change to the Spark2 client directory:

    cd /usr/hdp/current/spark2-client/

  3. Set the SPARK_MAJOR_VERSION environment variable to 2:

    export SPARK_MAJOR_VERSION=2

  4. Run the Spark Pi example:

    ./bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master yarn --deploy-mode cluster \
      --num-executors 1 \
      --driver-memory 512m \
      --executor-memory 512m \
      --executor-cores 1 \
      examples/jars/spark-examples*.jar 10

6 Replies
Re: How to choose which version of Spark is used in HDP 2.5?

New Contributor

Hi guys,

I still don't get the point of setting the variable when you already provide the full path to the Spark2 client. Could you please explain the reason for doing so?

On HDP 2.6.2, which I use, it is enough to specify the path to the appropriate Spark client, and the version is then chosen automatically.
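On the question above: on HDP, the `spark-submit` found under /usr/bin is not the Spark binary itself but a wrapper, and SPARK_MAJOR_VERSION tells the wrapper which versioned client directory should handle the job. If you call the versioned client directly by its full path, the variable is indeed redundant. A hypothetical sketch of that dispatch logic (the real HDP wrapper script differs in detail):

```shell
# Hypothetical sketch of the dispatch an HDP /usr/bin wrapper performs;
# the real wrapper differs in detail, but the idea is the same:
# SPARK_MAJOR_VERSION selects the versioned client directory.
pick_spark_home() {
    if [ "${SPARK_MAJOR_VERSION:-1}" = "2" ]; then
        echo /usr/hdp/current/spark2-client
    else
        echo /usr/hdp/current/spark-client
    fi
}

unset SPARK_MAJOR_VERSION
echo "default:                    $(pick_spark_home)"
export SPARK_MAJOR_VERSION=2
echo "with SPARK_MAJOR_VERSION=2: $(pick_spark_home)"
```

Calling /usr/hdp/current/spark2-client/bin/spark-submit by its full path bypasses the wrapper entirely, which is why specifying the path also works.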

Re: How to choose which version of Spark is used in HDP 2.5?

New Contributor

Many thanks for your reply. Is it possible to change the default Spark version from 1.6.2 to 2.0 for the whole Hadoop cluster from Ambari, by setting SPARK_MAJOR_VERSION?

Re: How to choose which version of Spark is used in HDP 2.5?

Rising Star

Hi @yong yang.

The SPARK_MAJOR_VERSION environment variable can be set by any user who logs on to a client machine to run Spark; its scope is local to that user's session. A later version may add an option to do this from Ambari. Please accept the answer so it can be useful to others too. Thanks.
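The session-local scope described above is ordinary shell behavior and is easy to see in a plain shell: a variable exported in a subshell does not leak back to the parent, so each user (or even each terminal) can pick a different Spark version independently:

```shell
# SPARK_MAJOR_VERSION only affects the session (or subshell) that sets it.
unset SPARK_MAJOR_VERSION
(
    export SPARK_MAJOR_VERSION=2
    echo "inside subshell: ${SPARK_MAJOR_VERSION}"
)
echo "parent shell: ${SPARK_MAJOR_VERSION:-unset}"
```

This is why setting the variable in one terminal does not change the default for other users; a cluster-wide default would have to come from the HDP packaging itself rather than from a per-user environment variable.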

Re: How to choose which version of Spark is used in HDP 2.5?

New Contributor

Many thanks for all your replies. Now I know how to specify the version of Spark to be used.
