I am having trouble using Python 3.7 with the spark-submit command.
I have both Python 2.7 and Python 3.7 installed, and I created a virtualenv so that Python 3.7 is used as the interpreter. When I test my code I simply run "spark-submit mycode.py", but I get the following error:

SPARK_MAJOR_VERSION is set to 2, using
File "/usr/bin/hdp-select", line 226
print "ERROR: Invalid package - " + name
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("ERROR: Invalid package - " + name)?
ls: cannot access /usr/hdp//hadoop/lib: No such file or directory
Exception in thread "main" java.lang.IllegalStateException: hdp.version is not set while running Spark under HDP, please set through HDP_VERSION in spark-env.sh or add a java-opts file in conf with -Dhdp.version=xxx
org.apache.spark.launcher.Main.main(Main.java:118)

I have already tried setting the HDP version with --conf options when calling spark-submit, but it did not work:

spark-submit --conf "spark.driver.extraJavaOptions -Dhdp.version=22.214.171.124-8" --conf "spark.yarn.am.extraJavaOptions -Dhdp.version=126.96.36.199-8" --conf "spark.pyspark.python=/usr/local/bin/python3.7" --conf "spark.pyspark.driver.python=/usr/local/bin/python3.7" test2.py

If I run the test code outside the virtualenv (with Python 2), it works properly. I hope someone can help me figure out the problem.

Thanks,
Cristina
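A note on the SyntaxError in the output: the quoted line from /usr/bin/hdp-select is a Python 2 style print statement, and the message shown is what Python 3 reports for that syntax. This suggests the script is being run by the virtualenv's Python 3.7 rather than Python 2.7. The sketch below is only an illustration of that behaviour; the helper name `parses_under_current_python` is hypothetical, and the source line is reconstructed from the error message, not read from the actual hdp-select file.

```python
import sys

# Line 226 of hdp-select as quoted in the error output (Python 2 syntax).
PY2_LINE = 'print "ERROR: Invalid package - " + name'
# The form suggested by the Python 3 error message.
PY3_LINE = 'print("ERROR: Invalid package - " + name)'


def parses_under_current_python(source):
    """Return True if `source` compiles with the running interpreter.

    Only the syntax is checked; nothing is executed, so the undefined
    variable `name` does not matter here.
    """
    try:
        compile(source, "<hdp-select line>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print("Interpreter:", sys.executable, sys.version_info[:2])
    print("Python 2 style line parses:", parses_under_current_python(PY2_LINE))
    print("Python 3 style line parses:", parses_under_current_python(PY3_LINE))
```

Running this inside and outside the virtualenv shows which interpreter is picked up in each case and whether it accepts the Python 2 style line.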