Spark-submit error with Python3 on Hortonworks sandbox VM.


Hi, I've installed Anaconda on my Hortonworks sandbox VM to try to use Python 3. When I run spark-submit, I get this error:

[root@sandbox ~]# spark-submit
SPARK_MAJOR_VERSION is set to 2, using Spark2
  File "/usr/bin/hdp-select", line 205
    print "ERROR: Invalid package - " + name
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("ERROR: Invalid package - " + name)?
ls: cannot access /usr/hdp//hadoop/lib: No such file or directory
Exception in thread "main" java.lang.IllegalStateException: hdp.version is not set while running Spark under HDP, please set through HDP_VERSION in or add a java-opts file in conf with -Dhdp.version=xxx

I've set these environment variables:

export PYSPARK_PYTHON=/root/miniconda3/bin/python

From some other posts, I suspect this error has something to do with my trying to use Python 3 with spark-submit. If that is the problem, is there a trick to make this work?

[root@sandbox ~]# which python
[root@sandbox ~]# python -V
Python 3.6.5 :: Anaconda, Inc.
[root@sandbox ~]# 



@Alex Witte

You should use Python 2.7.x instead of Python 3. Please check the Python version certified for HDP:


hdp-select is a script that ships with the HDP installation. It is written for Python 2 and uses the "print" statement, which is a syntax error in Python 3, where print is a function and needs parentheses. That's why you are getting:

SyntaxError: Missing parentheses in call to 'print'.
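
To see the difference for yourself, here is a minimal sketch (not part of hdp-select) that checks whether a line like the one from the traceback parses under whichever interpreter runs it. The helper name parses_under_current_python and the two source strings are just illustrations I made up for this example.

```python
# Python 2 style line, modeled on the one that fails in the traceback above.
PY2_SOURCE = 'print "ERROR: Invalid package - " + name\n'
# Python 3 style equivalent, where print is called as a function.
PY3_SOURCE = 'print("ERROR: Invalid package - " + name)\n'


def parses_under_current_python(source):
    """Return True if `source` compiles under the running interpreter.

    compile() only parses the code; it never executes it, so the
    undefined variable `name` does not matter here.
    """
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print("print statement parses:", parses_under_current_python(PY2_SOURCE))
    print("print function parses: ", parses_under_current_python(PY3_SOURCE))
```

Run with the Anaconda Python 3.6 interpreter, the first check should come back False and the second True. Under Python 2.7 both forms should parse, which is why hdp-select works there.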


Can you please try setting "HDP_VERSION" and then try again?