pyspark using Spark 2.3

Explorer

We're on Cloudera Manager 5.14.2 with CDH 5.13.3.

Both Spark 1.6 and Spark 2.3 (the 2.3.0.cloudera3 parcel) are installed; some apps are still on Spark 1.6, so we can't remove it yet.

When I start pyspark with the properties file for Spark 2, it still launches Spark 1.6, e.g.:

pyspark --properties-file /etc/spark2/conf/spark-defaults.conf

After the ASCII Spark logo it shows: version 1.6.0

In verbose mode, though, the paths point to the Spark 2 parcel:

spark.yarn.jars,local:/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/jars/*

 

Why is pyspark still referring to Spark 1.6?

How can I force it to use Spark 2.3?
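
A quick way to see which installation the bare pyspark launcher actually resolves to is to follow its symlink chain; a minimal diagnostic sketch using standard shell tools (the parcel path below is the one from this thread):

# Which pyspark launcher is first on the PATH?
which pyspark

# Follow the symlink chain to the parcel backing it; if this resolves to
# the CDH (Spark 1.6) parcel rather than the SPARK2 parcel, that would
# explain the 1.6 banner
readlink -f "$(which pyspark)"

# Is SPARK_HOME already set, and to which parcel?
echo "$SPARK_HOME"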

 

2 ACCEPTED SOLUTIONS

Contributor

Hey,

 

Can you please try setting the SPARK_HOME env variable to the location indicated by the readlink command, so that pyspark launches and shows Spark 2 as the version?

 

For Example:

export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2

 

By setting SPARK_HOME to the Spark 2 lib folder instead, pyspark will then launch and show 2.3.0.cloudera3 as the Spark version.
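
To make the setting stick across shell sessions, the export can go in the shell profile as well; a minimal sketch, assuming a bash login shell (adjust the profile file to your setup):

# Persist SPARK_HOME for future sessions (parcel path taken from above)
echo 'export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2' >> ~/.bashrc
source ~/.bashrc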

 

Please let me know if this helps.

 

Regards,

Ankit.


Explorer

Thanks, that put me in the right direction.

For completeness: just setting SPARK_HOME was not sufficient, as py4j was still missing; setting PYTHONPATH as well fixed that:

export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
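
Note that the py4j zip filename is tied to the specific Spark build, so rather than hard-coding it, the lib folder can be listed to confirm the exact name:

# Confirm the exact py4j zip shipped with this parcel before putting it
# on PYTHONPATH (the filename varies between Spark releases)
ls "$SPARK_HOME/python/lib/"
# expected to contain something like: py4j-0.10.7-src.zip  pyspark.zip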

 

Now pyspark shows: version 2.3.0.cloudera3
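
As a final check that the Python bindings (and not just the launcher) come from Spark 2, the module can report its own version; a small sketch, assuming the two exports above are in effect:

# pyspark.__version__ exists in Spark 2.x; with PYTHONPATH pointing at the
# Spark 2 parcel it should report the Spark 2 build
python -c "import pyspark; print(pyspark.__version__)"
# expected output along the lines of: 2.3.0.cloudera3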


