Created on 10-14-2022 06:18 AM
In a CDP environment that contains both Spark2 and Spark3, Jupyter Notebook uses the default path provided in its kernel definitions and therefore picks up Spark2. We tried adding a new kernel to Jupyter with the kernel.json file below, where SPARK_HOME points at a Spark3 copy placed under the CDH parcel directory, but it did not work:
cat /data1/python3.6.10/share/jupyter/kernels/pyspark3/kernel.json
{
  "argv": [
    "/data1/python3.6.10/bin/python3.6",
    "-m",
    "ipykernel_launcher",
    "-f",
    "{connection_file}"
  ],
  "display_name": "PySpark3",
  "language": "python",
  "env": {
    "JAVA_HOME": "/usr/java/latest",
    "PYSPARK_PYTHON": "/data1/python3.6.10/bin/python3.6",
    "SPARK_HOME": "/opt/cloudera/parcels/CDH/lib/spark3",
    "HADOOP_CONF_DIR": "/opt/cloudera/parcels/CDH/lib/spark3/conf/yarn-conf",
    "SPARK_CONF_DIR": "/opt/cloudera/parcels/CDH/lib/spark3/conf",
    "PYTHONPATH": "/opt/cloudera/parcels/CDH/lib/spark3/python/lib/py4j-0.10.9.2-src.zip:/opt/cloudera/parcels/CDH/lib/spark3/python/",
    "PATH": "$SPARK_HOME/bin:$JAVA_HOME/bin:$PATH",
    "PYTHON_STARTUP": "/opt/cloudera/parcels/CDH/lib/spark3/python/pyspark/shell.py",
    "CLASSPATH": "/opt/cloudera/parcels/CDH/lib/spark3/conf/yarn-conf",
    "PYSPARK_SUBMIT_ARGS": " --py-files '/etc/hive/conf/hive-site.xml' --master yarn --name 'Jupyter Notebook' --conf spark.jars.ivy=/tmp/.ivy --queue user_prod pyspark-shell --jars /tmp/ojdbc8.jar"
  }
}
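To confirm what Jupyter actually registered for this kernel, the kernel spec can be inspected from Python. This is a minimal sketch using jupyter_client; the kernel name pyspark3 is taken from the directory above:

# Inspect the environment recorded for the pyspark3 kernel spec
from jupyter_client.kernelspec import KernelSpecManager

ksm = KernelSpecManager()
spec = ksm.get_kernel_spec("pyspark3")    # name matches the kernels/pyspark3 directory
print(spec.resource_dir)                  # where Jupyter found the kernel.json
print(spec.env.get("SPARK_HOME"))         # should show the Spark3 path if the spec was picked up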
The customer was ultimately able to run Spark3 jobs in Jupyter by adding the following Python code at the top of the notebook, before the rest of the script executes:
import os
import sys

# Point the session at the Spark3 parcel instead of the default Spark2
os.environ["SPARK_HOME"] = "/opt/cloudera/parcels/SPARK3/lib/spark3"
os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/data1/python3.6.10/bin/python3"

# Make the Spark3 Py4J and PySpark libraries importable
sys.path.insert(0, os.environ["PYLIB"] + "/py4j-0.10.9.2-src.zip")
sys.path.insert(0, os.environ["PYLIB"] + "/pyspark.zip")
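For example, once these environment variables are set in a cell, a Spark3 session can be started in the next cell. The application name and the version check below are illustrative and not part of the original post:

# Start a Spark3 session on YARN using the libraries added to sys.path above
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("jupyter-spark3-check")   # hypothetical application name
    .master("yarn")
    .getOrCreate()
)
print(spark.version)                   # should report a 3.x version when SPARK_HOME points at the Spark3 parcel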
This approach is useful where the .bashrc file cannot be modified: the Jupyter notebook keeps using Spark2 by default and switches to Spark3 only in notebooks where the code above is inserted.