We're on CDM 5.14.2 with CDH 5.13.3.
Both Spark 1.6 and Spark 2.3.3 are installed (some apps still use Spark 1.6, so we can't remove it yet).
When I start pyspark with the config file for Spark 2, it still runs Spark 1.6:
pyspark --properties-file /etc/spark2/conf/spark-defaults.conf
After the ASCII Spark logo it shows: version 1.6.0
In verbose mode, the paths shown point to Spark 2.
Why is pyspark still referring to Spark 1.6?
How can I force it to use Spark 2.3.3?
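One way to see which install actually runs: on a parcel-based cluster, the pyspark launcher on the PATH is usually a symlink chain that eventually resolves into a parcel directory, and `readlink -f` follows that chain to the end. The snippet below is a sketch of the technique on a throwaway symlink chain (the real chain would start at something like `/usr/bin/pyspark` and end under `/opt/cloudera/parcels`; the names here are illustrative):

```shell
# Demonstrate following a symlink chain with readlink -f, using
# throwaway files so the commands are runnable anywhere.
tmp=$(mktemp -d)
touch "$tmp/pyspark-parcel"                  # stand-in for the real launcher in the parcel
ln -s "$tmp/pyspark-parcel" "$tmp/pyspark-alt"
ln -s "$tmp/pyspark-alt" "$tmp/pyspark"      # stand-in for the pyspark on the PATH
readlink -f "$tmp/pyspark"                   # prints the final .../pyspark-parcel target
rm -rf "$tmp"
```

On the cluster itself, `readlink -f "$(command -v pyspark)"` would show whether the chain ends in the Spark 1.6 or the Spark 2 parcel.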
Can you please try setting the SPARK_HOME env variable to the location indicated by the readlink command, then check whether pyspark launches and shows Spark 2 as the version?
With SPARK_HOME pointing at the Spark 2 lib folder instead, pyspark2 will launch and report 2.3.0.cloudera3 as the Spark version.
Please let me know if this helps.
Thanks, that put me in the right direction.
For completeness: setting SPARK_HOME alone was not sufficient, since py4j was still missing; setting PYTHONPATH as well fixed that:
export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
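A variation on the same exports, sketched with a glob so the py4j version (0.10.7 here, but it changes between Spark releases) isn't hard-coded. The parcel path is the one from this cluster and will differ on yours:

```shell
# Same idea as above, but pick up whichever py4j zip ships with the parcel.
# Parcel path as on this cluster; substitute your own SPARK2 parcel dir.
export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2
for zip in "$SPARK_HOME"/python/lib/py4j-*-src.zip; do
  export PYTHONPATH="$SPARK_HOME/python:$zip:$PYTHONPATH"
done
```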
Now pyspark shows: version 2.3.0.cloudera3