Hi, I've installed Anaconda on my Hortonworks sandbox VM so that I can use Python 3. When I run a script with spark-submit, I get this error:
[root@sandbox ~]# spark-submit script.py
SPARK_MAJOR_VERSION is set to 2, using Spark2
File "/usr/bin/hdp-select", line 205
print "ERROR: Invalid package - " + name
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("ERROR: Invalid package - " + name)?
ls: cannot access /usr/hdp//hadoop/lib: No such file or directory
Exception in thread "main" java.lang.IllegalStateException: hdp.version is not set while running Spark under HDP, please set through HDP_VERSION in spark-env.sh or add a java-opts file in conf with -Dhdp.version=xxx
I've set these environment variables:
export SPARK_MAJOR_VERSION=2
export PYSPARK_PYTHON=/root/miniconda3/bin/python
From some other posts, I suspect this error is caused by my trying to use Python 3 with spark-submit. If that is the problem, is there a way to make this work?
[root@sandbox ~]# which python
/root/miniconda3/bin/python
[root@sandbox ~]# python -V
Python 3.6.5 :: Anaconda, Inc.
[root@sandbox ~]#
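In case it helps with diagnosing this: the SyntaxError above is Python 3 rejecting a Python 2 style print statement in /usr/bin/hdp-select. My guess (an assumption, I haven't confirmed it) is that the script's shebang line now resolves to the miniconda Python 3, because miniconda3/bin comes first on my PATH. Here is a minimal sketch for checking which interpreter a script's shebang would resolve to. The helper name shebang_interpreter is made up for illustration, and it only handles simple shebangs, not ones that pass options to env.

```python
import os
import shutil


def shebang_interpreter(script_path):
    """Return the interpreter a script's shebang line points to, or None."""
    with open(script_path, "rb") as f:
        first_line = f.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    # A shebang like "#!/usr/bin/env python" is looked up on PATH, so an
    # install that prepends its bin directory to PATH changes the result.
    if os.path.basename(parts[0]) == "env" and len(parts) > 1:
        return shutil.which(parts[1])
    # Otherwise the shebang names the interpreter path directly.
    return parts[0]


if __name__ == "__main__":
    path = "/usr/bin/hdp-select"  # the script named in the traceback above
    if os.path.exists(path):
        print(path, "->", shebang_interpreter(path))
    else:
        print(path, "not found")
```

If this prints the miniconda path for hdp-select, that would support the idea that the PATH change, rather than PYSPARK_PYTHON itself, is what breaks spark-submit.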
Thanks!