Created 01-05-2017 02:20 PM
There are two versions of Spark in HDP 2.5: Spark 1.6 and Spark 2.0. I don't know how to specify which version of Spark gets used. Can anyone advise me how to do that? Through the Ambari admin console?
Also, I would like to submit jobs to Spark 2.0 from my application rather than through spark-submit. What should I specify as the master URL in the new SparkSession?
Thanks.
Donald
Created 01-05-2017 02:25 PM
Please refer to the link below:
http://hortonworks.com/hadoop-tutorial/a-lap-around-apache-spark/
Created 01-05-2017 02:26 PM
Hi @yong yang,
The default version for HDP 2.5.0 is Spark 1.6.2. To run a job against Spark 2.0 instead, set the SPARK_MAJOR_VERSION environment variable to the desired version before you launch the job. Here is an example for a user who submits jobs using spark-submit under /usr/bin:
Change to the Spark 2 client directory:
cd /usr/hdp/current/spark2-client/
Set the SPARK_MAJOR_VERSION environment variable to 2:
export SPARK_MAJOR_VERSION=2
Run the job:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 1 --driver-memory 512m --executor-memory 512m --executor-cores 1 examples/jars/spark-examples*.jar 10
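A quick way to double-check which version the /usr/bin wrapper picks up (my own suggestion, assuming the wrapper scripts honor SPARK_MAJOR_VERSION as described above):
# Without the variable, the wrapper defaults to Spark 1.6.2
spark-submit --version
# With the variable set, the same wrapper launches Spark 2.0
export SPARK_MAJOR_VERSION=2
spark-submit --version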
Created 03-08-2018 12:57 PM
Hi guys,
I still don't get the point of setting the variable when you already provide the full path to the spark2 client. Could you please give me a reason for doing so?
On the HDP 2.6.2 cluster I use, it is enough to point to the appropriate Spark client, and the version is chosen automatically.
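For illustration, this is what that path-based approach looks like (a sketch assuming the standard HDP client directories; the SparkPi job is just an example):
# Call the Spark 2 client by its full path; no SPARK_MAJOR_VERSION needed
/usr/hdp/current/spark2-client/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn /usr/hdp/current/spark2-client/examples/jars/spark-examples*.jar 10
# The Spark 1.6 client can still be reached the same way
/usr/hdp/current/spark-client/bin/spark-submit --version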
Created 01-05-2017 02:39 PM
Many thanks for your reply. Is it possible to change the default Spark version from 1.6.2 to 2.0 for the whole Hadoop cluster from Ambari by setting SPARK_MAJOR_VERSION?
Created 01-05-2017 03:20 PM
Hi @yong yang.
The SPARK_MAJOR_VERSION environment variable can be set by any user who logs on to a client machine to run Spark; the scope of the environment variable is local to the user session. Maybe a later version will offer an option to do this from Ambari. Please do accept the answer so it is useful to others too. Thanks.
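To illustrate the per-user scope, one possible approach (my own sketch, not from the original reply) is to export the variable in the user's shell profile on the client machine:
# Per-user default: add the export to ~/.bashrc on the client machine
echo 'export SPARK_MAJOR_VERSION=2' >> ~/.bashrc
source ~/.bashrc
# New shells for this user will now pick Spark 2 when running spark-submit from /usr/bin
spark-submit --version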
Created 01-06-2017 08:13 AM
Many thanks for all your replies. Now I know how to specify the version of Spark to be used.