
Spark Environment Settings - Apache Toree



Hi Guys,

I have installed the Apache Toree kernel in Jupyter Notebook and want to use Spark with Scala from it. My profile settings are listed further down. When I start the kernel, I get the following error:

Starting Spark Kernel with SPARK_HOME=/usr/hdp/current/spark2-client
  File "/usr/bin/hdp-select", line 242
    print "ERROR: Invalid package - " + name
    ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(t "ERROR: Invalid package - " + name)?
ls: cannot access /usr/hdp//hadoop/lib: No such file or directory
Exception in thread "main" java.lang.IllegalStateException: hdp.version is not set while running Spark under HDP, please set through HDP_VERSION in spark-env.sh or add a java-opts file in conf with -Dhdp.version=xxx
        at org.apache.spark.launcher.Main.main(Main.java:118)

Environment variables set in my profile file:

setenv PYTHON /usr/local/anaconda3/bin
setenv JAVA /usr/local/java/jdk_8u5_x64/bin/java
setenv SPARK_HOME /usr/hdp/current/spark2-client
setenv PATH ${PATH}:${SPARK_HOME}/bin
setenv PYSPARK_PYTHON usr/local/anaconda3/bin/python
setenv PATH ${PATH}:${PYTHON}
setenv SPARK_MAJOR_VERSION 2
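
As a quick sanity check on the settings above, here is a minimal Python sketch that flags two things visible in this post: the PYSPARK_PYTHON value has no leading slash, and HDP_VERSION (the variable the exception asks for) is not among the variables set. The `settings` dictionary and the `find_problems` helper are illustrative names, not part of Toree, Spark, or HDP, and the values are simply copied from the profile. Whether either item is the actual cause of the failure is an assumption that would need to be confirmed on the cluster.

```python
import posixpath

# Values copied from the profile above. HDP_VERSION is absent because
# the profile does not set it.
settings = {
    "PYTHON": "/usr/local/anaconda3/bin",
    "JAVA": "/usr/local/java/jdk_8u5_x64/bin/java",
    "SPARK_HOME": "/usr/hdp/current/spark2-client",
    "PYSPARK_PYTHON": "usr/local/anaconda3/bin/python",
    "SPARK_MAJOR_VERSION": "2",
}


def find_problems(env):
    """Return warnings for a dict of environment settings (hypothetical helper)."""
    problems = []
    # These variables are expected to hold absolute POSIX paths.
    for name in ("PYTHON", "JAVA", "SPARK_HOME", "PYSPARK_PYTHON"):
        value = env.get(name)
        if value is None:
            problems.append(f"{name} is not set")
        elif not posixpath.isabs(value):
            problems.append(f"{name} is not an absolute path: {value}")
    # The IllegalStateException in the log names HDP_VERSION explicitly.
    if "HDP_VERSION" not in env:
        problems.append("HDP_VERSION is not set")
    return problems


for warning in find_problems(settings):
    print(warning)
```

To check the live environment of the shell that launches Jupyter instead of the copied values, `find_problems(dict(os.environ))` could be used after an `import os`.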

When I start spark-shell or pyspark manually, everything works and the Spark context is created. But when I create a new Toree Scala notebook, it fails to initialize the Spark session with the error shown above. Is this a Python 2 vs. Python 3 version error? I would like to understand how all of this is related.
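
On the Python 2 vs. Python 3 question, the SyntaxError from the log can be reproduced in isolation. The sketch below compiles the exact statement quoted from line 242 of /usr/bin/hdp-select under whichever interpreter runs it. The helper name `parses_on_this_interpreter` is illustrative. It only shows that the statement is Python 2 syntax that Python 3 rejects. That the anaconda3 entry on PATH is what causes hdp-select to run under Python 3 in the Toree case is an assumption, not something the log confirms.

```python
import sys

# The statement hdp-select fails on, as quoted in the traceback above.
source = 'print "ERROR: Invalid package - " + name\n'


def parses_on_this_interpreter(code):
    """Return True if `code` compiles under the running Python version."""
    try:
        # compile() only parses the code; nothing is executed.
        compile(code, "<hdp-select line 242>", "exec")
        return True
    except SyntaxError:
        return False


# Under Python 3 the print statement is rejected, so the second value is False.
print(sys.version_info.major, parses_on_this_interpreter(source))
```

Running `python --version` in the same shell that starts Jupyter would show which interpreter a script invoked as plain `python` picks up there.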

Appreciate the help.