Error running Pyspark Interpreter after Installing Miniconda

Contributor

40800-pandas-error.png

Hi, I am having trouble running the PySpark interpreter in Zeppelin. I set the zeppelin.pyspark.python property to /usr/lib/miniconda2/bin/python. I also can't use pandas.

Below is the error in the Zeppelin UI:

40799-zeppelin-ui.png

Super Guru

@Ashikin,

Do you have pandas installed? If not, try installing it by running:

conda install pandas

Other useful libraries you may want to install are matplotlib and numpy.
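
To confirm which interpreter a session is actually using and whether it can see pandas, a small check like the one below may help. It is only a sketch: the function name describe_environment is made up for illustration, and it is written to run on both Python 2.7 and Python 3, since this thread uses miniconda2.

```python
import importlib
import sys


def describe_environment(module_names):
    """Return the running interpreter path and, per module, its version or None."""
    report = {"executable": sys.executable, "modules": {}}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The module cannot be imported from this interpreter.
            report["modules"][name] = None
        else:
            # Fall back to "unknown" for modules that expose no __version__.
            report["modules"][name] = getattr(module, "__version__", "unknown")
    return report
```

Calling describe_environment(["pandas", "numpy", "matplotlib"]) once in the pyspark shell and once in a Zeppelin %pyspark paragraph should show whether the two are running the same Python. If the "executable" values differ, Zeppelin is not picking up the Miniconda interpreter.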

Thanks,

Aditya

Contributor

Hi Aditya, I have installed pandas. I can import pandas in the PySpark CLI, but not in Zeppelin.

These errors appear when I run the pyspark command:

error: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
    self.socket.connect((self.address, self.port))
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
    self.socket.connect((self.address, self.port))
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
Traceback (most recent call last):
  File "/usr/hdp/2.5.3.0-37/spark/python/pyspark/shell.py", line 43, in <module>
    sc = SparkContext(pyFiles=add_files)
  File "/usr/hdp/2.5.3.0-37/spark/python/pyspark/context.py", line 115, in __init__
    conf, jsc, profiler_cls)
  File "/usr/hdp/2.5.3.0-37/spark/python/pyspark/context.py", line 172, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/usr/hdp/2.5.3.0-37/spark/python/pyspark/context.py", line 235, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1062, in __call__
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 631, in send_command
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 624, in send_command
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 579, in _get_connection
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 585, in _create_connection
  File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 697, in start
py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server
>>> import pandas as pd
>>>
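
For context, [Errno 111] Connection refused on Linux means that nothing was listening on the address and port the Python side tried to reach, which in this traceback is the py4j Java gateway. The sketch below shows one way to test whether a given host and port accept TCP connections. The function name port_accepts_connections is made up for illustration, and the log above does not show the gateway port, so host and port are left for the caller to supply.

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if something is listening on host:port, False otherwise."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Covers "[Errno 111] Connection refused" as well as timeouts.
        return False
    conn.close()
    return True
```

A False result for the gateway address would be consistent with the Java process having exited or never started, in which case the JVM-side logs are the next place to look.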


Super Guru

@Ashikin,

Try setting the PYSPARK_DRIVER_PYTHON environment variable so that Spark uses Anaconda/Miniconda.

From the logs, it looks like Spark is running the bundled pyspark with the system Python (/usr/lib64/python2.7) rather than Miniconda.

Thanks,

Aditya

Contributor

Should I declare the variable like this?

export PYSPARK_DRIVER_PYTHON=miniconda2

Super Guru

@Ashikin,

Try setting the following instead of PYSPARK_DRIVER_PYTHON:

export PYSPARK_PYTHON=<anaconda python path>

For example: export PYSPARK_PYTHON=/home/ambari/anaconda3/bin/python
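
Note that the value should point at the python binary itself (for the setup in this thread, presumably /usr/lib/miniconda2/bin/python), not at a directory name such as miniconda2. A quick sanity check on a candidate value could look like the sketch below. The helper name is_usable_interpreter is made up for illustration, and requiring an absolute path is an assumption that is stricter than Spark itself, which as far as I know also accepts a command name found on PATH.

```python
import os
import sys


def is_usable_interpreter(path):
    """Return True if path is an absolute path to an existing executable file."""
    return (
        os.path.isabs(path)
        and os.path.isfile(path)
        and os.access(path, os.X_OK)
    )


# Check the value Spark would see, falling back to the current interpreter
# when PYSPARK_PYTHON is not set in this environment.
candidate = os.environ.get("PYSPARK_PYTHON", sys.executable)
print(candidate, "->", is_usable_interpreter(candidate))
```

The same check can be applied to the zeppelin.pyspark.python property value, since Zeppelin also needs a path it can execute.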

Super Guru

@Ashikin,

Try setting the PYSPARK_DRIVER_PYTHON environment variable so that Spark uses Anaconda/Miniconda.

From the logs, it looks like Spark is running the bundled pyspark with the system Python (/usr/lib64/python2.7) rather than Miniconda. Check this link for more information:

https://spark.apache.org/docs/1.6.2/programming-guide.html#linking-with-spark

Thanks,

Aditya

Contributor

Aditya, I got this error in the Zeppelin UI:

39792-pyspark-connection-refused.png