How to choose which version of Spark is used in HDP 2.5?

Solved


New Contributor

There are two versions of Spark in HDP 2.5: Spark 1.6 and Spark 2.0. I don't know how to specify which version of Spark should be used. Can anyone advise me how to do that? Through the Ambari admin console?

Also, I would like to submit jobs to Spark 2.0 from my application instead of using spark-submit. What should I specify as the master URL when building the new SparkSession?

Thanks.

Donald

1 ACCEPTED SOLUTION

Accepted Solutions

Re: How to choose which version of Spark is used in HDP 2.5?

Rising Star

Hi @yong yang,

  • By default, if more than one version of Spark is installed on a node, your job runs with the default version for your HDP package.

    The default version for HDP 2.5.0 is Spark 1.6.2.

  • If more than one version of Spark is installed on a node, you can select which version of Spark runs your job.

    To do this, set the SPARK_MAJOR_VERSION environment variable to the desired version before you launch the job.

Here is an example for a user who submits jobs using spark-submit under /usr/bin:

  1. Navigate to a host where Spark 2.0 is installed.
  2. Change to the Spark2 client directory:

    cd /usr/hdp/current/spark2-client/

  3. Set the SPARK_MAJOR_VERSION environment variable to 2:

    export SPARK_MAJOR_VERSION=2

  4. Run the Spark Pi example:

    ./bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master yarn --deploy-mode cluster \
      --num-executors 1 \
      --driver-memory 512m \
      --executor-memory 512m \
      --executor-cores 1 \
      examples/jars/spark-examples*.jar 10

6 Replies
Re: How to choose which version of Spark is used in HDP 2.5?

New Contributor

Hi guys,

I still don't get the point of setting the variable when you already provide the full path to the Spark2 client. Could you please explain the reason for doing so?

On HDP 2.6.2, which I use, it is enough to specify the path to the appropriate Spark client, and the version is then chosen automatically.
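On the question above: on HDP, the `spark-submit` found under /usr/bin is not the Spark binary itself but a wrapper, and SPARK_MAJOR_VERSION tells the wrapper which versioned client directory should handle the job. If you call the versioned client directly by its full path, the variable is indeed redundant. A hypothetical sketch of that dispatch logic (the real HDP wrapper script differs in detail):

```shell
# Hypothetical sketch of the dispatch an HDP /usr/bin wrapper performs;
# the real wrapper differs in detail, but the idea is the same:
# SPARK_MAJOR_VERSION selects the versioned client directory.
pick_spark_home() {
    if [ "${SPARK_MAJOR_VERSION:-1}" = "2" ]; then
        echo /usr/hdp/current/spark2-client
    else
        echo /usr/hdp/current/spark-client
    fi
}

unset SPARK_MAJOR_VERSION
echo "default:                    $(pick_spark_home)"
export SPARK_MAJOR_VERSION=2
echo "with SPARK_MAJOR_VERSION=2: $(pick_spark_home)"
```

Calling /usr/hdp/current/spark2-client/bin/spark-submit by its full path bypasses the wrapper entirely, which is why specifying the path also works.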

Re: How to choose which version of Spark is used in HDP 2.5?

New Contributor

Many thanks for your reply. Is it possible to change the default Spark version from 1.6.2 to 2.0 for the whole Hadoop cluster from Ambari, by setting SPARK_MAJOR_VERSION?

Re: How to choose which version of Spark is used in HDP 2.5?

Rising Star

Hi @yong yang.

The SPARK_MAJOR_VERSION environment variable can be set by any user who logs on to a client machine to run Spark; its scope is local to that user's session. A later version may add an option to do this from Ambari. Please accept the answer so it can be useful to others too. Thanks.
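The session-local scope described above is ordinary shell behavior and is easy to see in a plain shell: a variable exported in a subshell does not leak back to the parent, so each user (or even each terminal) can pick a different Spark version independently:

```shell
# SPARK_MAJOR_VERSION only affects the session (or subshell) that sets it.
unset SPARK_MAJOR_VERSION
(
    export SPARK_MAJOR_VERSION=2
    echo "inside subshell: ${SPARK_MAJOR_VERSION}"
)
echo "parent shell: ${SPARK_MAJOR_VERSION:-unset}"
```

This is why setting the variable in one terminal does not change the default for other users; a cluster-wide default would have to come from the HDP packaging itself rather than from a per-user environment variable.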

Re: How to choose which version of Spark is used in HDP 2.5?

New Contributor

Many thanks for all your replies. Now I know how to specify the version of Spark to be used.
