<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Cannot get pyspark to work (Creating Spark Context) with FileNotFoundError: [Errno 2] No such file or directory: '/usr/hdp/current/spark-client/./bin/spark-submit' in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Cannot-get-pyspark-to-work-Creating-Spark-Context-with/m-p/351214#M236195</link>
<description>&lt;P&gt;Hi&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am using the Cloudera Hortonworks sandbox Docker image, and have followed this tutorial to run Jupyter notebooks: &lt;A href="https://community.cloudera.com/t5/Support-Questions/Installing-Jupyter-on-sandbox/td-p/201683" target="_self"&gt;https://community.cloudera.com/t5/Support-Questions/Installing-Jupyter-on-sandbox/td-p/201683&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This works. The notebook is started using the Python kernel. The error occurs when attempting to create the SparkContext:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;FileNotFoundError: [Errno 2] No such file or directory: '/usr/hdp/current/spark-client/./bin/spark-submit'&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;FileNotFoundError                         Traceback (most recent call last)
&amp;lt;ipython-input-4-fbb9eeb69493&amp;gt; in &amp;lt;module&amp;gt;
----&amp;gt; 1 spark = SparkSession.builder.master("local").appName("myApp").getOrCreate()

/usr/local/lib/python3.6/site-packages/pyspark/sql/session.py in getOrCreate(self)
    226                             sparkConf.set(key, value)
    227                         # This SparkContext may be an existing one.
--&amp;gt; 228                         sc = SparkContext.getOrCreate(sparkConf)
    229                     # Do not update `SparkConf` for existing `SparkContext`, as it's shared
    230                     # by all sessions.

/usr/local/lib/python3.6/site-packages/pyspark/context.py in getOrCreate(cls, conf)
    390         with SparkContext._lock:
    391             if SparkContext._active_spark_context is None:
--&amp;gt; 392                 SparkContext(conf=conf or SparkConf())
    393             return SparkContext._active_spark_context
    394 

/usr/local/lib/python3.6/site-packages/pyspark/context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    142                 " is not allowed as it is a security risk.")
    143 
--&amp;gt; 144         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    145         try:
    146             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

/usr/local/lib/python3.6/site-packages/pyspark/context.py in _ensure_initialized(cls, instance, gateway, conf)
    337         with SparkContext._lock:
    338             if not SparkContext._gateway:
--&amp;gt; 339                 SparkContext._gateway = gateway or launch_gateway(conf)
    340                 SparkContext._jvm = SparkContext._gateway.jvm
    341 

/usr/local/lib/python3.6/site-packages/pyspark/java_gateway.py in launch_gateway(conf, popen_kwargs)
     96                     signal.signal(signal.SIGINT, signal.SIG_IGN)
     97                 popen_kwargs['preexec_fn'] = preexec_func
---&amp;gt; 98                 proc = Popen(command, **popen_kwargs)
     99             else:
    100                 # preexec_fn not supported on Windows

/usr/lib64/python3.6/subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors)
    727                                 c2pread, c2pwrite,
    728                                 errread, errwrite,
--&amp;gt; 729                                 restore_signals, start_new_session)
    730         except:
    731             # Cleanup if the child failed starting.

/usr/lib64/python3.6/subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, start_new_session)
   1362                         if errno_num == errno.ENOENT:
   1363                             err_msg += ': ' + repr(err_filename)
-&amp;gt; 1364                     raise child_exception_type(errno_num, err_msg, err_filename)
   1365                 raise child_exception_type(err_msg)
   1366 

FileNotFoundError: [Errno 2] No such file or directory: '/usr/hdp/current/spark-client/./bin/spark-submit': '/usr/hdp/current/spark-client/./bin/spark-submit'&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I suspect the problem is connected to the environment variables, but as a novice I am not sure.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Global environment (from printenv):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;HOSTNAME=sandbox-hdp.hortonworks.com
TERM=xterm
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
SHLVL=1
HOME=/root
container=docker
_=/usr/bin/printenv&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;start_jupyter.sh&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;export SPARK_HOME=/usr/hdp/current/spark-client
export HADOOP_HOME=/usr/hdp/current/hadoop-client
export HADOOP_CONF_DIR=/usr/hdp/current/hadoop-client/conf
export PYTHONPATH="/usr/hdp/current/spark-client/python:/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip"
export PYTHONSTARTUP=/usr/hdp/current/spark-client/python/pyspark/shell.py
export PYSPARK_SUBMIT_ARGS="--master yarn-client pyspark-shell"&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can anyone point me in the right direction so that I can create the SparkContext?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Many thanks&lt;/P&gt;</description>
    <pubDate>Wed, 31 Aug 2022 12:44:38 GMT</pubDate>
    <dc:creator>Boron</dc:creator>
    <dc:date>2022-08-31T12:44:38Z</dc:date>
    <item>
      <title>Cannot get pyspark to work (Creating Spark Context) with FileNotFoundError: [Errno 2] No such file or directory: '/usr/hdp/current/spark-client/./bin/spark-submit'</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Cannot-get-pyspark-to-work-Creating-Spark-Context-with/m-p/351214#M236195</link>
<description>&lt;P&gt;Hi&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am using the Cloudera Hortonworks sandbox Docker image, and have followed this tutorial to run Jupyter notebooks: &lt;A href="https://community.cloudera.com/t5/Support-Questions/Installing-Jupyter-on-sandbox/td-p/201683" target="_self"&gt;https://community.cloudera.com/t5/Support-Questions/Installing-Jupyter-on-sandbox/td-p/201683&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This works. The notebook is started using the Python kernel. The error occurs when attempting to create the SparkContext:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;FileNotFoundError: [Errno 2] No such file or directory: '/usr/hdp/current/spark-client/./bin/spark-submit'&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;FileNotFoundError                         Traceback (most recent call last)
&amp;lt;ipython-input-4-fbb9eeb69493&amp;gt; in &amp;lt;module&amp;gt;
----&amp;gt; 1 spark = SparkSession.builder.master("local").appName("myApp").getOrCreate()

/usr/local/lib/python3.6/site-packages/pyspark/sql/session.py in getOrCreate(self)
    226                             sparkConf.set(key, value)
    227                         # This SparkContext may be an existing one.
--&amp;gt; 228                         sc = SparkContext.getOrCreate(sparkConf)
    229                     # Do not update `SparkConf` for existing `SparkContext`, as it's shared
    230                     # by all sessions.

/usr/local/lib/python3.6/site-packages/pyspark/context.py in getOrCreate(cls, conf)
    390         with SparkContext._lock:
    391             if SparkContext._active_spark_context is None:
--&amp;gt; 392                 SparkContext(conf=conf or SparkConf())
    393             return SparkContext._active_spark_context
    394 

/usr/local/lib/python3.6/site-packages/pyspark/context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    142                 " is not allowed as it is a security risk.")
    143 
--&amp;gt; 144         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    145         try:
    146             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

/usr/local/lib/python3.6/site-packages/pyspark/context.py in _ensure_initialized(cls, instance, gateway, conf)
    337         with SparkContext._lock:
    338             if not SparkContext._gateway:
--&amp;gt; 339                 SparkContext._gateway = gateway or launch_gateway(conf)
    340                 SparkContext._jvm = SparkContext._gateway.jvm
    341 

/usr/local/lib/python3.6/site-packages/pyspark/java_gateway.py in launch_gateway(conf, popen_kwargs)
     96                     signal.signal(signal.SIGINT, signal.SIG_IGN)
     97                 popen_kwargs['preexec_fn'] = preexec_func
---&amp;gt; 98                 proc = Popen(command, **popen_kwargs)
     99             else:
    100                 # preexec_fn not supported on Windows

/usr/lib64/python3.6/subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors)
    727                                 c2pread, c2pwrite,
    728                                 errread, errwrite,
--&amp;gt; 729                                 restore_signals, start_new_session)
    730         except:
    731             # Cleanup if the child failed starting.

/usr/lib64/python3.6/subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, start_new_session)
   1362                         if errno_num == errno.ENOENT:
   1363                             err_msg += ': ' + repr(err_filename)
-&amp;gt; 1364                     raise child_exception_type(errno_num, err_msg, err_filename)
   1365                 raise child_exception_type(err_msg)
   1366 

FileNotFoundError: [Errno 2] No such file or directory: '/usr/hdp/current/spark-client/./bin/spark-submit': '/usr/hdp/current/spark-client/./bin/spark-submit'&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I suspect the problem is connected to the environment variables, but as a novice I am not sure.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Global environment (from printenv):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;HOSTNAME=sandbox-hdp.hortonworks.com
TERM=xterm
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
SHLVL=1
HOME=/root
container=docker
_=/usr/bin/printenv&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;start_jupyter.sh&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;export SPARK_HOME=/usr/hdp/current/spark-client
export HADOOP_HOME=/usr/hdp/current/hadoop-client
export HADOOP_CONF_DIR=/usr/hdp/current/hadoop-client/conf
export PYTHONPATH="/usr/hdp/current/spark-client/python:/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip"
export PYTHONSTARTUP=/usr/hdp/current/spark-client/python/pyspark/shell.py
export PYSPARK_SUBMIT_ARGS="--master yarn-client pyspark-shell"&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can anyone point me in the right direction so that I can create the SparkContext?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Many thanks&lt;/P&gt;</description>
      <pubDate>Wed, 31 Aug 2022 12:44:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Cannot-get-pyspark-to-work-Creating-Spark-Context-with/m-p/351214#M236195</guid>
      <dc:creator>Boron</dc:creator>
      <dc:date>2022-08-31T12:44:38Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot get pyspark to work (Creating Spark Context) with FileNotFoundError: [Errno 2] No such file or directory: '/usr/hdp/current/spark-client/./bin/spark-submit'</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Cannot-get-pyspark-to-work-Creating-Spark-Context-with/m-p/352976#M236621</link>
<description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/100158"&gt;@Boron&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you please set the SPARK_HOME environment variable as shown below before creating the Spark session?&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;import os
os.environ['SPARK_HOME'] = '/usr/hdp/current/spark-client'&lt;/LI-CODE&gt;&lt;P&gt;&lt;STRONG&gt;Reference:&lt;/STRONG&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;A href="https://stackoverflow.com/questions/55569985/pyspark-could-not-find-valid-spark-home" target="_blank"&gt;https://stackoverflow.com/questions/55569985/pyspark-could-not-find-valid-spark-home&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;&lt;A href="https://stackoverflow.com/questions/40087188/cant-find-spark-submit-when-typing-spark-shell" target="_blank"&gt;https://stackoverflow.com/questions/40087188/cant-find-spark-submit-when-typing-spark-shell&lt;/A&gt;&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Thu, 22 Sep 2022 05:22:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Cannot-get-pyspark-to-work-Creating-Spark-Context-with/m-p/352976#M236621</guid>
      <dc:creator>RangaReddy</dc:creator>
      <dc:date>2022-09-22T05:22:11Z</dc:date>
    </item>
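The reply above can be sketched end to end in the notebook itself. This is a minimal sketch, assuming the paths quoted in the thread; the SparkSession call is left commented out because it only succeeds against a working Spark install:

```python
# Sketch of the suggestion above: set SPARK_HOME before PySpark tries to
# launch spark-submit. The path is the one used in this thread; adjust it
# to wherever your Spark client actually lives.
import os

os.environ['SPARK_HOME'] = '/usr/hdp/current/spark-client'

# Only once SPARK_HOME points at a real Spark install should the session
# be created (commented out here, since it needs a working cluster):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.master("local").appName("myApp").getOrCreate()
```

Setting the variable from Python works because PySpark reads SPARK_HOME at gateway-launch time, so it must be set before the first SparkSession/SparkContext is created in the kernel.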
    <item>
      <title>Re: Cannot get pyspark to work (Creating Spark Context) with FileNotFoundError: [Errno 2] No such file or directory: '/usr/hdp/current/spark-client/./bin/spark-submit'</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Cannot-get-pyspark-to-work-Creating-Spark-Context-with/m-p/352999#M236624</link>
<description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/100158"&gt;@Boron&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;I believe you are using HDP 3.x. Note that Spark 1.x is not available in HDP 3; you need to use Spark 2.x. Set SPARK_HOME to the Spark 2 client:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;export SPARK_HOME=/usr/hdp/current/spark2-client&lt;/LI-CODE&gt;</description>
      <pubDate>Thu, 22 Sep 2022 06:38:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Cannot-get-pyspark-to-work-Creating-Spark-Context-with/m-p/352999#M236624</guid>
      <dc:creator>Deepan_N</dc:creator>
      <dc:date>2022-09-22T06:38:06Z</dc:date>
    </item>
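The two replies name different client directories for SPARK_HOME. A small hypothetical helper can pick whichever one actually exists on the sandbox, preferring the Spark 2 client on HDP 3.x; `pick_spark_home` and its injected `isdir` predicate are illustrative, not part of any Spark API:

```python
import os

# Hypothetical helper: prefer the Spark 2 client (HDP 3.x) and fall back
# to the Spark 1 path from the original post. The isdir predicate is
# injected so the selection logic can be exercised off-cluster.
def pick_spark_home(isdir=os.path.isdir):
    candidates = [
        '/usr/hdp/current/spark2-client',  # HDP 3.x (Spark 2.x)
        '/usr/hdp/current/spark-client',   # older HDP (Spark 1.x)
    ]
    for path in candidates:
        if isdir(path):
            return path
    # Neither directory found: default to the Spark 2 path, the expected
    # location on an HDP 3 sandbox.
    return candidates[0]

# In the notebook, before creating the session:
# os.environ['SPARK_HOME'] = pick_spark_home()
```

If the chosen directory is the Spark 2 client, the py4j zip on PYTHONPATH in start_jupyter.sh would also need to match that install rather than the py4j-0.9 build from Spark 1.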
  </channel>
</rss>

