Support Questions

Find answers, ask questions, and share your expertise

Error running Pyspark Interpreter after Installing Miniconda

Explorer

(attached screenshot: 40800-pandas-error.png)

Hi, I am having trouble running the PySpark interpreter. I set the zeppelin.pyspark.python property to /usr/lib/miniconda2/bin/python. I also cannot import pandas.

Below is the error shown in the Zeppelin UI:

(attached screenshot: 40799-zeppelin-ui.png)



@Ashikin,

Do you have pandas installed? If not, install it and try again:

conda install pandas

Other useful libraries to install are matplotlib and numpy.

Thanks,

Aditya
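To see whether pandas went into the Python that Zeppelin actually runs, a quick check like the following can be pasted into a %pyspark paragraph (a sketch, not part of the original thread; the Miniconda path is the one from the question):

```python
import importlib
import sys

def check_interpreter():
    """Return (python_binary, pandas_available) for the current interpreter."""
    try:
        importlib.import_module("pandas")
        has_pandas = True
    except ImportError:
        has_pandas = False
    return sys.executable, has_pandas

# In a Zeppelin %pyspark paragraph, the binary printed here should be the
# Miniconda python configured in zeppelin.pyspark.python (e.g.
# /usr/lib/miniconda2/bin/python); if it is a different binary, then
# `conda install pandas` installed into a Python Zeppelin is not using.
print(check_interpreter())
```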

Explorer

Hi Aditya, I already have pandas installed. I can import pandas in the PySpark CLI, though, but not in Zeppelin.

These errors appear when I run the pyspark command:

error: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
    self.socket.connect((self.address, self.port))
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
    self.socket.connect((self.address, self.port))
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
Traceback (most recent call last):
  File "/usr/hdp/2.5.3.0-37/spark/python/pyspark/shell.py", line 43, in <module>
    sc = SparkContext(pyFiles=add_files)
  File "/usr/hdp/2.5.3.0-37/spark/python/pyspark/context.py", line 115, in __init__
    conf, jsc, profiler_cls)
  File "/usr/hdp/2.5.3.0-37/spark/python/pyspark/context.py", line 172, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/usr/hdp/2.5.3.0-37/spark/python/pyspark/context.py", line 235, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1062, in __call__
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 631, in send_command
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 624, in send_command
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 579, in _get_connection
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 585, in _create_connection
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 697, in start
py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server
>>> import pandas as pd
>>>
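For context on the log above: the repeated [Errno 111] Connection refused means py4j's Python client could not reach the Spark JVM gateway over TCP, typically because the JVM process never started or exited during launch. A minimal sketch of the same failure mode, independent of Spark (the port number is arbitrary and assumed unused):

```python
import errno
import socket

def probe(host, port):
    """Try a TCP connect; return None on success, else the errno."""
    s = socket.socket()
    s.settimeout(2)
    try:
        s.connect((host, port))
        return None
    except OSError as e:
        return e.errno
    finally:
        s.close()

# Connecting to a port nothing is listening on typically yields
# ECONNREFUSED (errno 111 on Linux), the same error py4j logs when
# the Spark JVM gateway process is not running.
print(probe("127.0.0.1", 1))
```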



@Ashikin,

Try setting the PYSPARK_DRIVER_PYTHON environment variable so that Spark uses the Anaconda/Miniconda Python.

From the logs, it looks like Spark is picking up the bundled Python instead.

Thanks,

Aditya

Explorer

Should I declare the variable like this?

export PYSPARK_DRIVER_PYTHON=miniconda2


@Ashikin,

Try setting PYSPARK_PYTHON instead of PYSPARK_DRIVER_PYTHON:

export PYSPARK_PYTHON=<anaconda python path>

e.g. export PYSPARK_PYTHON=/home/ambari/anaconda3/bin/python
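As a quick sanity check (a sketch, not Spark's own code): Spark consults PYSPARK_PYTHON for the Python it launches, and PYSPARK_DRIVER_PYTHON for the driver side, falling back to plain python on the PATH when they are unset. The helper below just reports what a given environment would hand to Spark; the example path is the one from the reply above and is site-specific:

```python
import os

def spark_python_settings(env=None):
    """Report the env vars Spark consults when choosing a Python binary."""
    env = os.environ if env is None else env
    return {name: env.get(name)
            for name in ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON")}

# Example with the value suggested above (path is site-specific):
print(spark_python_settings({"PYSPARK_PYTHON": "/home/ambari/anaconda3/bin/python"}))
```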


@Ashikin,

Try setting the PYSPARK_DRIVER_PYTHON environment variable so that Spark uses the Anaconda/Miniconda Python.

From the logs, it looks like Spark is picking up the bundled Python instead. Check the link for more info:

https://spark.apache.org/docs/1.6.2/programming-guide.html#linking-with-spark

Thanks,

Aditya


Explorer

Aditya, I now get this error in the Zeppelin UI:

(attached screenshot: 39792-pyspark-connection-refused.png)
