
pyspark using Spark 2.3

Explorer

We're on CM 5.14.2 with CDH 5.13.3.

Both Spark 1.6 and Spark 2.3.0 are installed (some apps still use Spark 1.6, so we can't remove it yet).

When I start pyspark with the Spark 2 config file, it still runs with Spark 1.6.

e.g.

pyspark --properties-file /etc/spark2/conf/spark-defaults.conf

After the ASCII Spark logo it shows: version 1.6.0

In verbose mode, the paths shown do point to Spark 2:

spark.yarn.jars,local:/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/jars/*

 

Why is pyspark still referring to Spark 1.6?

How can I force it to use Spark 2.3.0?
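
For reference, a quick way to check which Spark install the pyspark launcher on the PATH actually resolves to (a diagnostic sketch; the parcel layout is the one from this thread and will differ on other clusters):

which pyspark
readlink -f "$(which pyspark)"
ls -d /opt/cloudera/parcels/SPARK*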

 

2 ACCEPTED SOLUTIONS

Expert Contributor

Hey,

 

Can you please try setting the SPARK_HOME env variable to the location indicated by the readlink command, and check whether pyspark then launches and shows Spark 2 as the version?

 

For example:

export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2

 

With SPARK_HOME pointing at the Spark 2 lib folder instead, pyspark will then launch and show Spark 2.3.0.cloudera3 as the Spark version.
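
To double-check before launching, the version banner can be printed directly (a quick sketch; spark-submit under the Spark 2 home prints its version and exits):

echo $SPARK_HOME
$SPARK_HOME/bin/spark-submit --version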

 

Please let me know if this helps.

 

Regards,

Ankit.


Explorer

Thanks, that pointed me in the right direction.

For completeness: setting SPARK_HOME alone was not sufficient; py4j was still missing. Setting PYTHONPATH fixed that:

export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH

 

Now pyspark shows: version 2.3.0.cloudera3
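
As a quick non-interactive check that the right build is picked up (a sketch; it assumes the two exports above are set in the current shell and uses the sc context the pyspark shell creates automatically):

# should print 2.3.0.cloudera3
pyspark <<'EOF'
print(sc.version)
EOF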

