
pyspark using Spark 2.3

Solved

Explorer

We're on CM 5.14.2 with CDH 5.13.3

Both Spark 1.6 and Spark 2.3.3 are installed (some apps are still using Spark 1.6, so we can't remove it yet)

When I start pyspark with the properties file for Spark 2, it still runs with Spark 1.6,

e.g.

pyspark --properties-file /etc/spark2/conf/spark-defaults.conf

after the ASCII Spark logo it still shows: version 1.6.0

In verbose mode, though, the paths point to Spark 2:

spark.yarn.jars,local:/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/jars/*

 

Why is pyspark still referring to Spark 1.6?

How can I force it to use Spark 2.3.3?
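A quick way to see which launcher the shell is actually picking up (a diagnostic sketch; the `pyspark2` and `spark2-submit` launcher names assume the Cloudera Spark 2 gateway from the parcel above is installed):

```shell
# Which launcher does the shell resolve, and where does it really live?
which pyspark
readlink -f "$(which pyspark)"

# The Spark 2 parcel ships separately named launchers (pyspark2,
# spark2-submit); invoking those directly avoids the ambiguity:
which pyspark2
spark2-submit --version
```

If `readlink` resolves to the Spark 1.6 parcel, the properties file alone will not change which Spark is launched.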

 

2 ACCEPTED SOLUTIONS

Re: pyspark using Spark 2.3

Contributor

Hey,

 

Can you please try setting the SPARK_HOME env variable to the location that the readlink command resolves the Spark 2 parcel to, then launch pyspark and check whether it now shows Spark 2 as the version?

 

For Example:

export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2

 

With SPARK_HOME pointing to the Spark 2 lib folder, pyspark will then launch and show Spark 2.3.0.cloudera3 as the Spark version.
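Once SPARK_HOME is exported, a quick sanity check (a sketch; the parcel path is the one from this thread) is to ask the bundled launcher for its version directly, bypassing whatever `/usr/bin/pyspark` symlinks to:

```shell
# Point SPARK_HOME at the Spark 2 parcel and query its own spark-submit;
# this reports the version of that installation regardless of PATH order.
export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2
"$SPARK_HOME/bin/spark-submit" --version
```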

 

Please let me know if this helps.

 

Regards,

Ankit.


Re: pyspark using Spark 2.3

Explorer

Thanks, that put me in the right direction.

For completeness: setting SPARK_HOME alone was not sufficient, as pyspark could then not find py4j.

Setting PYTHONPATH as well fixed that:

export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH

 

Now pyspark shows: version 2.3.0.cloudera3
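Since the py4j zip is named after its version (py4j-0.10.7 here) and that changes between Spark releases, the PYTHONPATH line above can be made version-agnostic with a glob (a sketch, assuming the standard python/lib layout under SPARK_HOME):

```shell
# Derive PYTHONPATH from SPARK_HOME without hard-coding the py4j
# version; the zip is named py4j-<version>-src.zip under python/lib.
export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2
PY4J_ZIP=$(ls "$SPARK_HOME"/python/lib/py4j-*-src.zip 2>/dev/null | head -n 1)
export PYTHONPATH="$SPARK_HOME/python:$PY4J_ZIP:$PYTHONPATH"
```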

