How to choose which version of Spark is used in HDP 2.5?

Explorer

There are two versions of Spark in HDP 2.5: Spark 1.6 and Spark 2.0. I don't know how to specify which version of Spark should be used. Can anyone advise me how to do that? Is it done from the Ambari admin console?

Also, I would like to submit jobs to Spark 2.0 from my application instead of using spark-submit. What should I specify for the master URL in the new SparkSession?

Thanks.

Donald

1 ACCEPTED SOLUTION

Expert Contributor

Hi @yong yang

  • By default, if more than one version of Spark is installed on a node, your job runs with the default version for your HDP package.

    The default version for HDP 2.5.0 is Spark 1.6.2.

  • If more than one version of Spark is installed on a node, you can select which version of Spark runs your job.

    To do this, set the SPARK_MAJOR_VERSION environment variable to the desired version before you launch the job.

Here is an example for a user who submits jobs using spark-submit under /usr/bin:

  1. Navigate to a host where Spark 2.0 is installed.
  2. Change to the Spark2 client directory:

    cd /usr/hdp/current/spark2-client/

  3. Set the SPARK_MAJOR_VERSION environment variable to 2:

    export SPARK_MAJOR_VERSION=2

  4. Run the Spark Pi example:

    ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 1 --driver-memory 512m --executor-memory 512m --executor-cores 1 examples/jars/spark-examples*.jar 10
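
If you want to double-check which version the launcher picks up, a quick sanity check (a minimal sketch; the same variable should also steer spark-shell and pyspark started from /usr/bin, though exact wrapper behaviour may vary between HDP builds) is to print the version from the /usr/bin wrapper before and after exporting the variable:

    # With SPARK_MAJOR_VERSION unset, the wrapper falls back to the default (Spark 1.6.2 on HDP 2.5.0)
    unset SPARK_MAJOR_VERSION
    /usr/bin/spark-submit --version

    # After exporting SPARK_MAJOR_VERSION=2, the same wrapper dispatches to the Spark2 client
    export SPARK_MAJOR_VERSION=2
    /usr/bin/spark-submit --version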


REPLIES

New Contributor

Hi guys,

I still don't get the point of setting the variable when you already provide the full path to the Spark2 client. Could you please give me a reason for doing so?

On HDP 2.6.2, which I use, it is enough to specify the path to the appropriate Spark client and the version is then chosen automatically.
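
To illustrate the distinction being asked about (a sketch, assuming /usr/bin/spark-submit is the version-agnostic HDP wrapper while each client directory ships its own launcher):

    # Invoking a versioned client directly always runs that client's Spark version
    /usr/hdp/current/spark2-client/bin/spark-submit --version

    # Invoking the generic wrapper under /usr/bin relies on SPARK_MAJOR_VERSION to choose a version
    export SPARK_MAJOR_VERSION=2
    /usr/bin/spark-submit --version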

Explorer

Many thanks for your reply. Is it possible to change the default Spark version from 1.6.2 to 2.0 for the whole Hadoop cluster from Ambari by setting SPARK_MAJOR_VERSION?

Expert Contributor

Hi @yong yang.

The SPARK_MAJOR_VERSION environment variable can be set by any user who logs on to a client machine to run Spark. The scope of the environment variable is local to the user's session. Maybe in a later version we will get an option to do this from Ambari. Please accept the answer so that it may be useful to others too. Thanks.
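
If a user wants the setting to persist across their own sessions rather than exporting it each time, one common shell-level approach (an assumption on my part, not an Ambari-managed, cluster-wide default) is to add it to that user's profile:

    # Hypothetical per-user persistence: new shell sessions for this user will default to Spark2
    echo 'export SPARK_MAJOR_VERSION=2' >> ~/.bashrc
    source ~/.bashrc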

Explorer

Many thanks for all your replies. Now I know how to specify the version of Spark to be used.