Spark-submit error with Python3 on Hortonworks sandbox VM.

Hi, I've installed Anaconda on my Hortonworks sandbox VM so that I can use Python 3. When I run a script with spark-submit, I get this error:

[root@sandbox ~]# spark-submit script.py
SPARK_MAJOR_VERSION is set to 2, using Spark2
  File "/usr/bin/hdp-select", line 205
    print "ERROR: Invalid package - " + name
                                    ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("ERROR: Invalid package - " + name)?
ls: cannot access /usr/hdp//hadoop/lib: No such file or directory
Exception in thread "main" java.lang.IllegalStateException: hdp.version is not set while running Spark under HDP, please set through HDP_VERSION in spark-env.sh or add a java-opts file in conf with -Dhdp.version=xxx

I've set these environment variables:

export SPARK_MAJOR_VERSION=2
export PYSPARK_PYTHON=/root/miniconda3/bin/python

From some other posts, I suspect this error has something to do with using Python 3 with spark-submit. If that is the problem, is there a trick to make it work?

[root@sandbox ~]# which python
/root/miniconda3/bin/python
[root@sandbox ~]# python -V
Python 3.6.5 :: Anaconda, Inc.
[root@sandbox ~]# 

Thanks!

@Alex Witte

You should use Python 2.7.x instead of Python 3. The Python versions certified with HDP are listed in the support matrix:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_support-matrices/content/ch_matrices-amb...
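
A minimal sketch of what switching back to Python 2.7 might look like in the shell profile. It assumes the Miniconda installer added its bin directory to PATH in ~/.bashrc and that the system Python 2.7 lives at /usr/bin/python2.7; both are assumptions, so check your own files first.

```shell
# ~/.bashrc: sketch only; the paths below are assumptions, check your own system.

# 1. Keep /root/miniconda3/bin from shadowing /usr/bin, so "python" resolves to the
#    system Python 2.7 that HDP tools such as hdp-select expect. Comment out the
#    PATH line the Miniconda installer added, which typically looks like this:
# export PATH="/root/miniconda3/bin:$PATH"

# 2. Use Spark2, and run PySpark with the system Python 2.7 as well.
export SPARK_MAJOR_VERSION=2
export PYSPARK_PYTHON=/usr/bin/python2.7
```

Open a new shell afterwards (re-sourcing ~/.bashrc would not remove an entry that is already on PATH); python -V should then report 2.7.x.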

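hdp-select is a Python 2 script that ships with the HDP installation. It uses the Python 2 "print" statement, whereas in Python 3 "print" is a function that requires parentheses. Since "python" on your PATH now resolves to /root/miniconda3/bin/python, hdp-select is apparently being run by Python 3, which is why you get:

SyntaxError: Missing parentheses in call to 'print'.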
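
For illustration, a small shell sketch that reproduces the failure outside of Spark: it runs the print statement form from line 205 of hdp-select, and the equivalent print function call, under Python 3. It assumes a python3 executable on PATH (miniconda3 provides one); the function name check_print_forms and the package name "foo" are hypothetical.

```shell
#!/usr/bin/env bash
# Reproduce the hdp-select failure in isolation (assumes "python3" is on PATH).
check_print_forms() {
  # Python 2 print statement, the form used on line 205 of hdp-select.
  # Python 3 rejects it at compile time with a SyntaxError (exit status 1).
  if python3 -c 'name = "foo"; print "ERROR: Invalid package - " + name' 2>/dev/null; then
    echo "statement form: accepted"
  else
    echo "statement form: SyntaxError"
  fi
  # Python 3 print function: the same message with parentheses runs fine.
  python3 -c 'name = "foo"; print("ERROR: Invalid package - " + name)'
}

check_print_forms
```

Under the system Python 2.7 the statement form would simply print the message, which is why HDP's own tools expect Python 2 to be the default "python".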

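The hdp-select failure also seems to cascade into the other two errors: the HDP version it should report comes back empty, hence the /usr/hdp//hadoop/lib path and the "hdp.version is not set" exception. As a workaround, please try setting HDP_VERSION in spark-env.sh and then run spark-submit again.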
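
A sketch of that change. The exception itself names two options: HDP_VERSION in spark-env.sh, or a java-opts file in the conf directory containing -Dhdp.version=xxx. The conf path below and the <hdp-version> placeholder are assumptions; on a standard HDP layout, the version string is the name of the directory under /usr/hdp other than "current".

```shell
# Find the HDP version first: on a standard HDP layout, /usr/hdp holds a directory
# named after the installed version, next to the "current" directory.
#   ls /usr/hdp

# spark-env.sh, typically /usr/hdp/current/spark2-client/conf/spark-env.sh (path assumed).
# Replace <hdp-version> with the version directory name listed above.
export HDP_VERSION="<hdp-version>"

# Alternative given in the error message: a file named java-opts in the same conf
# directory, containing the single line
#   -Dhdp.version=<hdp-version>
```

This only supplies the missing version; hdp-select itself will still fail while Python 3 is the default "python".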