
Error running Pyspark Interpreter after Installing Miniconda


New Contributor

[Screenshot: 40800-pandas-error.png]

Hi, I am having trouble running the PySpark interpreter. I set the zeppelin.pyspark.python property to /usr/lib/miniconda2/bin/python. I also can't import pandas.

Below is the error shown in the Zeppelin UI:

[Screenshot: 40799-zeppelin-ui.png]


Re: Error running Pyspark Interpreter after Installing Miniconda

@Ashikin,

Do you have pandas installed? If not, try installing it and running again:

conda install pandas

matplotlib and numpy are also useful libraries, if you want to install them.
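One caveat: a bare conda install puts packages into whichever environment is first on the PATH. Since Zeppelin is pointed at /usr/lib/miniconda2/bin/python in the question above, a sketch that targets that environment explicitly (path taken from this thread; adjust for your install):

```shell
# Install into the Miniconda env that zeppelin.pyspark.python points at
/usr/lib/miniconda2/bin/conda install -y pandas numpy matplotlib

# Sanity check: the same interpreter should now import pandas
/usr/lib/miniconda2/bin/python -c "import pandas; print(pandas.__version__)"
```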

Thanks,

Aditya

Re: Error running Pyspark Interpreter after Installing Miniconda

New Contributor

Hi Aditya, I do have pandas installed. I can import pandas in the PySpark CLI, but not in Zeppelin.

These errors appear when I run the pyspark command:

error: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
    self.socket.connect((self.address, self.port))
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
    self.socket.connect((self.address, self.port))
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
Traceback (most recent call last):
  File "/usr/hdp/2.5.3.0-37/spark/python/pyspark/shell.py", line 43, in <module>
    sc = SparkContext(pyFiles=add_files)
  File "/usr/hdp/2.5.3.0-37/spark/python/pyspark/context.py", line 115, in __init__
    conf, jsc, profiler_cls)
  File "/usr/hdp/2.5.3.0-37/spark/python/pyspark/context.py", line 172, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/usr/hdp/2.5.3.0-37/spark/python/pyspark/context.py", line 235, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1062, in __call__
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 631, in send_command
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 624, in send_command
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 579, in _get_connection
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 585, in _create_connection
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 697, in start
py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server
>>> import pandas as pd
>>>


Re: Error running Pyspark Interpreter after Installing Miniconda

@Ashikin,

Try setting the PYSPARK_DRIVER_PYTHON environment variable so that Spark uses the Anaconda/Miniconda Python.

From the logs, it looks like Spark is using the bundled Python instead.
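A minimal sketch of the variables involved, assuming the /usr/lib/miniconda2 path from earlier in this thread; these lines would go in spark-env.sh or the shell profile of the user running Zeppelin:

```shell
# Python used by the Spark driver (the pyspark shell / Zeppelin paragraph)
export PYSPARK_DRIVER_PYTHON=/usr/lib/miniconda2/bin/python
# Python used by the executors; keep it consistent with the driver
export PYSPARK_PYTHON=/usr/lib/miniconda2/bin/python
```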

Thanks,

Aditya

Re: Error running Pyspark Interpreter after Installing Miniconda

New Contributor

Should I declare the variable like this?

export PYSPARK_DRIVER_PYTHON=miniconda2

Re: Error running Pyspark Interpreter after Installing Miniconda

@Ashikin,

Try setting the following instead of PYSPARK_DRIVER_PYTHON:

export PYSPARK_PYTHON=<anaconda python path>

e.g. export PYSPARK_PYTHON=/home/ambari/anaconda3/bin/python
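To confirm which interpreter was actually picked up, you can print sys.executable from a Zeppelin paragraph or the pyspark shell (a generic check, not specific to any one setup):

```python
import sys

# Print the Python binary this process is running on.
# If it shows the system Python (e.g. /usr/bin/python) rather than
# the Miniconda path, the environment variable was not picked up.
print(sys.executable)
print(sys.version_info[:3])
```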

Re: Error running Pyspark Interpreter after Installing Miniconda

@Ashikin,

Try setting the PYSPARK_DRIVER_PYTHON environment variable so that Spark uses the Anaconda/Miniconda Python.

From the logs, it looks like Spark is using the bundled Python. Check this link for more info:

https://spark.apache.org/docs/1.6.2/programming-guide.html#linking-with-spark

Thanks,

Aditya


Re: Error running Pyspark Interpreter after Installing Miniconda

New Contributor

Aditya, I got this error in the Zeppelin UI:

[Screenshot: 39792-pyspark-connection-refused.png]
