In a CDP environment that contains both Spark2 and Spark3, a Jupyter notebook uses the default path provided in its kernel definitions, which in this case refers to Spark2. We tried adding a new kernel in Jupyter with the JSON file below, where SPARK_HOME points at a Spark3 copy under the CDH parcel directory, but it did not work:

cat /data1/python3.6.10/share/jupyter/kernels/pyspark3/kernel.json
{
 "argv": [
  "/data1/python3.6.10/bin/python3.6",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "PySpark3",
 "language": "python",
 "env": {
  "JAVA_HOME": "/usr/java/latest",
  "PYSPARK_PYTHON": "/data1/python3.6.10/bin/python3.6",
  "SPARK_HOME": "/opt/cloudera/parcels/CDH/lib/spark3",
  "HADOOP_CONF_DIR": "/opt/cloudera/parcels/CDH/lib/spark3/conf/yarn-conf",
  "SPARK_CONF_DIR": "/opt/cloudera/parcels/CDH/lib/spark3/conf",
  "PYTHONPATH": "/opt/cloudera/parcels/CDH/lib/spark3/python/lib/py4j-0.10.9.2-src.zip:/opt/cloudera/parcels/CDH/lib/spark3/python/",
  "PATH": "$SPARK_HOME/bin:$JAVA_HOME/bin:$PATH",
  "PYTHON_STARTUP": "/opt/cloudera/parcels/CDH/lib/spark3/python/pyspark/shell.py",
  "CLASSPATH": "/opt/cloudera/parcels/CDH/lib/spark3/conf/yarn-conf",
  "PYSPARK_SUBMIT_ARGS": "--py-files '/etc/hive/conf/hive-site.xml' --master yarn --name 'Jupyter Notebook' --conf spark.jars.ivy=/tmp/.ivy --queue user_prod --jars /tmp/ojdbc8.jar pyspark-shell"
 }
}

The customer was able to run Spark3 jobs in Jupyter by adding the following Python snippet before script execution:

import os
import sys

# Point the notebook at the Spark3 parcel instead of the default Spark2 build.
os.environ["SPARK_HOME"] = "/opt/cloudera/parcels/SPARK3/lib/spark3"
os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/data1/python3.6.10/bin/python3"

# Put the Spark3 py4j and pyspark libraries ahead of anything else on sys.path.
sys.path.insert(0, os.environ["PYLIB"] + "/py4j-0.10.9.2-src.zip")
sys.path.insert(0, os.environ["PYLIB"] + "/pyspark.zip")
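
Once that cell has run, a quick way to confirm the switch took effect is to start a session and print its version. This is a minimal sketch; the application name and the yarn master are assumptions carried over from the kernel.json above:

from pyspark.sql import SparkSession

# Illustrative settings; match them to your cluster (queue, jars, and so on).
spark = (
    SparkSession.builder
        .appName("Jupyter Notebook")  # assumed name, any string works
        .master("yarn")               # assumes YARN, as in the kernel.json above
        .getOrCreate()
)
print(spark.version)  # expect a 3.x version when SPARK_HOME points at Spark3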

These changes are useful where the .bashrc file cannot be modified: the Jupyter notebook keeps using Spark2 by default and switches to Spark3 whenever the code above is run first.

 
