Hi, I've installed Anaconda on my Hortonworks sandbox VM so that I can use Python 3. When I run a script with spark-submit, I get this error:
[root@sandbox ~]# spark-submit script.py
SPARK_MAJOR_VERSION is set to 2, using Spark2
  File "/usr/bin/hdp-select", line 205
    print "ERROR: Invalid package - " + name
                                    ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("ERROR: Invalid package - " + name)?
ls: cannot access /usr/hdp//hadoop/lib: No such file or directory
Exception in thread "main" java.lang.IllegalStateException: hdp.version is not set while running Spark under HDP, please set through HDP_VERSION in spark-env.sh or add a java-opts file in conf with -Dhdp.version=xxx
I've set these environment variables:
export SPARK_MAJOR_VERSION=2
export PYSPARK_PYTHON=/root/miniconda3/bin/python
From some other posts, I suspect this error is caused by my trying to use Python 3 with spark-submit. If that is the problem, is there a trick to make it work?
[root@sandbox ~]# which python
/root/miniconda3/bin/python
[root@sandbox ~]# python -V
Python 3.6.5 :: Anaconda, Inc.
[root@sandbox ~]#
You should be using Python 2.7.x instead of Python 3. Please check which Python version is certified for your HDP release.
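hdp-select is a Python script that ships with the HDP installation, and it uses the Python 2 print statement. In Python 3, print is a function and must be called with parentheses, so the script fails when it is run by a Python 3 interpreter. That's why you are getting: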
SyntaxError: Missing parentheses in call to 'print'.
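The difference can be reproduced without touching hdp-select itself. The snippet below is only a minimal sketch: it takes the line quoted in the traceback and the form Python 3 suggests, and asks the running interpreter whether each one compiles. The helper name compiles is made up for this illustration.

```python
# Line 205 of hdp-select as quoted in the traceback (Python 2 print statement)
PY2_STYLE = 'print "ERROR: Invalid package - " + name'
# The form suggested by the Python 3 error message (print as a function)
PY3_STYLE = 'print("ERROR: Invalid package - " + name)'


def compiles(source):
    """Return True if the current interpreter can compile `source`."""
    try:
        # compile() only parses the code; it never runs it,
        # so the undefined variable `name` does not matter here.
        compile(source, "<hdp-select line 205>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print("print statement compiles: %s" % compiles(PY2_STYLE))
    print("print function compiles:  %s" % compiles(PY3_STYLE))
```

Run with Python 3, the first check fails and the second succeeds. Python 2.7 accepts both forms, which is why hdp-select works when it is started by the system Python.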
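Can you please try setting HDP_VERSION in spark-env.sh and then try again? With hdp-select failing, the HDP version is most likely not being detected automatically (note the empty path component in /usr/hdp//hadoop/lib), which would explain the "hdp.version is not set" exception.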
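If you are not sure which value to use for HDP_VERSION, one option is to look at the directories under /usr/hdp. The sketch below assumes the usual layout in which each installed release has its own directory named after its version, next to entries such as "current". That layout is an assumption here, suggested by the /usr/hdp//hadoop/lib path in the error. list_hdp_versions is a hypothetical helper written for this post, not part of HDP.

```python
import os


def list_hdp_versions(root="/usr/hdp"):
    """Return sub-directories of `root` whose names look like version strings.

    Assumption: version directories start with a digit, while other
    entries (for example "current") do not.
    """
    if not os.path.isdir(root):
        return []
    return sorted(
        name
        for name in os.listdir(root)
        if name[:1].isdigit() and os.path.isdir(os.path.join(root, name))
    )


if __name__ == "__main__":
    # Print candidate lines that could be added to spark-env.sh.
    for version in list_hdp_versions():
        print("export HDP_VERSION=" + version)
```

If more than one version is listed, pick the one your cluster is actually running. The script does not modify spark-env.sh; it only prints candidate lines.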