Support Questions


Cannot get pyspark to work (Creating Spark Context) with FileNotFoundError: [Errno 2] No such file or directory: '/usr/hdp/current/spark-client/./bin/spark-submit'

New Contributor



I am using the Cloudera Hortonworks Sandbox Docker image and have followed this tutorial to run Jupyter notebooks:


This works: the notebook starts with the Python kernel. The error is encountered when attempting to create the Spark context:


FileNotFoundError: [Errno 2] No such file or directory: '/usr/hdp/current/spark-client/./bin/spark-submit'



FileNotFoundError                         Traceback (most recent call last)
<ipython-input-4-fbb9eeb69493> in <module>
----> 1 spark = SparkSession.builder.master("local").appName("myApp").getOrCreate()

/usr/local/lib/python3.6/site-packages/pyspark/sql/ in getOrCreate(self)
    226                             sparkConf.set(key, value)
    227                         # This SparkContext may be an existing one.
--> 228                         sc = SparkContext.getOrCreate(sparkConf)
    229                     # Do not update `SparkConf` for existing `SparkContext`, as it's shared
    230                     # by all sessions.

/usr/local/lib/python3.6/site-packages/pyspark/ in getOrCreate(cls, conf)
    390         with SparkContext._lock:
    391             if SparkContext._active_spark_context is None:
--> 392                 SparkContext(conf=conf or SparkConf())
    393             return SparkContext._active_spark_context

/usr/local/lib/python3.6/site-packages/pyspark/ in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    142                 " is not allowed as it is a security risk.")
--> 144         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    145         try:
    146             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

/usr/local/lib/python3.6/site-packages/pyspark/ in _ensure_initialized(cls, instance, gateway, conf)
    337         with SparkContext._lock:
    338             if not SparkContext._gateway:
--> 339                 SparkContext._gateway = gateway or launch_gateway(conf)
    340                 SparkContext._jvm = SparkContext._gateway.jvm

/usr/local/lib/python3.6/site-packages/pyspark/ in launch_gateway(conf, popen_kwargs)
     96                     signal.signal(signal.SIGINT, signal.SIG_IGN)
     97                 popen_kwargs['preexec_fn'] = preexec_func
---> 98                 proc = Popen(command, **popen_kwargs)
     99             else:
    100                 # preexec_fn not supported on Windows

/usr/lib64/python3.6/ in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors)
    727                                 c2pread, c2pwrite,
    728                                 errread, errwrite,
--> 729                                 restore_signals, start_new_session)
    730         except:
    731             # Cleanup if the child failed starting.

/usr/lib64/python3.6/ in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, start_new_session)
   1362                         if errno_num == errno.ENOENT:
   1363                             err_msg += ': ' + repr(err_filename)
-> 1364                     raise child_exception_type(errno_num, err_msg, err_filename)
   1365                 raise child_exception_type(err_msg)

FileNotFoundError: [Errno 2] No such file or directory: '/usr/hdp/current/spark-client/./bin/spark-submit': '/usr/hdp/current/spark-client/./bin/spark-submit'



I think the problem might be connected to the environment variables, but as a novice I am not sure. They are currently set as follows:





export SPARK_HOME=/usr/hdp/current/spark-client
export HADOOP_HOME=/usr/hdp/current/hadoop-client
export HADOOP_CONF_DIR=/usr/hdp/current/hadoop-client/conf
export PYTHONPATH="/usr/hdp/current/spark-client/python:/usr/hdp/current/spark-client/python/lib/"
export PYTHONSTARTUP=/usr/hdp/current/spark-client/python/pyspark/
export PYSPARK_SUBMIT_ARGS="--master yarn-client pyspark-shell"
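A quick sanity check from inside the notebook (a sketch, not part of the original setup; the fallback path simply mirrors the SPARK_HOME exported above) is to confirm that the binary named in the error message actually exists under SPARK_HOME:

```python
import os

# SPARK_HOME as exported above (assumed sandbox layout; adjust to your install)
spark_home = os.environ.get("SPARK_HOME", "/usr/hdp/current/spark-client")

# This is the executable PySpark's launcher tries to spawn -- the same path
# shown in the FileNotFoundError above.
spark_submit = os.path.join(spark_home, "bin", "spark-submit")

# False means SPARK_HOME points at a directory with no Spark install in it.
print(spark_submit, os.path.exists(spark_submit))
```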





Is anyone able to point me in the right direction so that I can create a SparkContext?


Many thanks








Expert Contributor

Hi @Boron 


Could you please set the SPARK_HOME environment variable as shown below before creating the Spark session?

import os
# Must be set before the SparkContext is created, since PySpark
# launches $SPARK_HOME/bin/spark-submit at that point.
os.environ['SPARK_HOME'] = '/usr/hdp/current/spark-client'



Cloudera Employee

Hello @Boron 
I believe you are using HDP 3.x. Note that Spark 1.x is not available in HDP 3; we need to use Spark 2.x. Set SPARK_HOME to the Spark 2 client:

export SPARK_HOME=/usr/hdp/current/spark2-client
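To confirm which Spark clients are actually installed before re-pointing SPARK_HOME, something like this should work from the notebook (a sketch; the glob pattern and paths assume the standard HDP layout):

```python
import glob
import os

# On HDP 3.x this should list spark2-client but not spark-client
# (Spark 1.x was removed), which explains the missing bin/spark-submit.
print(glob.glob('/usr/hdp/current/spark*-client'))

# Point PySpark at the Spark 2 install before creating the SparkSession.
os.environ['SPARK_HOME'] = '/usr/hdp/current/spark2-client'
```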