We're on CM 5.14.2 with CDH 5.13.3.
Both Spark 1.6 and Spark 2.3.3 are installed (some apps are still using Spark 1.6, so we can't remove it yet).
When I start pyspark with the config file for Spark 2, it still runs with Spark 1.6,
e.g.
pyspark --properties-file /etc/spark2/conf/spark-defaults.conf
After the ASCII Spark logo it shows: version 1.6.0
In verbose mode, however, the paths point to Spark 2:
spark.yarn.jars,local:/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/jars/*
Why is pyspark still referring to Spark 1.6?
How can I force it to use Spark 2.3.3?
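(For reference, a quick way to check which install the bare pyspark command actually resolves to; this is a sketch assuming the standard CDH alternatives/symlink layout:)
which pyspark                     # e.g. /usr/bin/pyspark
readlink -f "$(which pyspark)"    # follows the alternatives symlink to the real parcel path
# If this resolves into the CDH (Spark 1.6) parcel rather than the SPARK2 parcel,
# that explains the 1.6.0 banner.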
Created 10-17-2019 04:33 AM
Hey,
Can you please try setting the SPARK_HOME env variable to the Spark 2 location indicated by the readlink command, then launch pyspark and check that it shows Spark 2 as the version? The --properties-file flag only passes configuration values to Spark; which installation actually launches is decided by SPARK_HOME and the pyspark script on your PATH.
For Example:
export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2
With SPARK_HOME set to the Spark 2 lib folder instead, pyspark will then launch and show 2.3.0.cloudera3 as the Spark version.
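A minimal sketch (the versioned parcel path above is just an example; the unversioned /opt/cloudera/parcels/SPARK2 symlink tracks whichever SPARK2 parcel is active, so it survives parcel upgrades):
# Find where the Spark 2 client scripts really live:
readlink -f /usr/bin/pyspark2
# e.g. /opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/bin/pyspark2

# Point SPARK_HOME at the matching lib/spark2 folder, then relaunch:
export SPARK_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2
pyspark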
Please let me know if this helps.
Regards,
Ankit.
Created 10-17-2019 05:47 AM
Thanks, that pointed me in the right direction.
For completeness: just setting SPARK_HOME was not sufficient, because py4j was then missing from the Python path.
Setting PYTHONPATH fixed that:
export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
Now pyspark shows: version 2.3.0.cloudera3
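A possible refinement (untested sketch, assuming exactly one py4j-*-src.zip under $SPARK_HOME/python/lib): glob the py4j zip instead of hardcoding its version, so the setting survives parcel upgrades.
export SPARK_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2
# Let the shell expand the py4j version rather than pinning py4j-0.10.7:
export PYTHONPATH="$SPARK_HOME/python:$(echo "$SPARK_HOME"/python/lib/py4j-*-src.zip):$PYTHONPATH"
pyspark   # banner should now read: version 2.3.0.cloudera3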