Created on 10-12-2017 02:00 AM - edited 08-17-2019 07:45 PM
Hi, I am having trouble running the PySpark interpreter. I set the zeppelin.pyspark.python property to /usr/lib/miniconda2/bin/python. I also can't import pandas.
Below is the error shown in the Zeppelin UI:
Created 10-12-2017 05:18 AM
Do you have pandas installed? Try installing it and running:
conda install pandas
Other useful libraries to install are matplotlib and numpy.
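Once installed, a quick check from the interpreter Zeppelin actually runs can confirm the package resolves there. A minimal sketch; the miniconda path referenced in the comments is the one from the question and is an assumption about your setup:

```python
import sys

# Show which Python binary is executing this paragraph; in Zeppelin it
# should match zeppelin.pyspark.python (e.g. /usr/lib/miniconda2/bin/python).
print(sys.executable)

# If this import fails, pandas was installed for a different interpreter.
try:
    import pandas
    print("pandas", pandas.__version__)
except ImportError:
    print("pandas not available in this interpreter")
```

If the path printed in Zeppelin differs from the one where you ran `conda install`, that mismatch explains the import failure.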
Thanks,
Aditya
Created 10-13-2017 03:40 AM
Hi Aditya, I have pandas installed. I can import pandas in the PySpark CLI, though, but not in Zeppelin.
These errors appear when I run the pyspark command:
error: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
self.socket.connect((self.address, self.port))
File "/usr/lib64/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
self.socket.connect((self.address, self.port))
File "/usr/lib64/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
Traceback (most recent call last):
File "/usr/hdp/2.5.3.0-37/spark/python/pyspark/shell.py", line 43, in <module>
sc = SparkContext(pyFiles=add_files)
File "/usr/hdp/2.5.3.0-37/spark/python/pyspark/context.py", line 115, in __init__
conf, jsc, profiler_cls)
File "/usr/hdp/2.5.3.0-37/spark/python/pyspark/context.py", line 172, in _do_init
self._jsc = jsc or self._initialize_context(self._conf._jconf)
File "/usr/hdp/2.5.3.0-37/spark/python/pyspark/context.py", line 235, in _initialize_context
return self._jvm.JavaSparkContext(jconf)
File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1062, in __call__
File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 631, in send_command
File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 624, in send_command
File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 579, in _get_connection
File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 585, in _create_connection
File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 697, in start
py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server
>>> import pandas as pd
>>>
Created 10-13-2017 04:37 AM
Try setting the PYSPARK_DRIVER_PYTHON environment variable so that Spark uses the Anaconda/Miniconda interpreter.
From the logs it looks like Spark is using the bundled pyspark.
Thanks,
Aditya
Created 10-13-2017 05:39 AM
Should I declare the variable like this?
export PYSPARK_DRIVER_PYTHON=miniconda2
Created 10-13-2017 06:14 AM
Try setting the below instead of PYSPARK_DRIVER_PYTHON:
export PYSPARK_PYTHON=<anaconda python path>
e.g. export PYSPARK_PYTHON=/home/ambari/anaconda3/bin/python
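A sketch of how that export is usually wired up for Zeppelin, using the miniconda2 path from the original question (an assumption; adjust to your install). It typically goes in conf/zeppelin-env.sh so the Spark interpreter picks it up after a restart:

```shell
# Use the same conda interpreter for the driver and the executors;
# the path below is taken from the question above, not verified here.
export PYSPARK_PYTHON=/usr/lib/miniconda2/bin/python
export PYSPARK_DRIVER_PYTHON=/usr/lib/miniconda2/bin/python
```

After editing zeppelin-env.sh, restart the Zeppelin interpreter so the new environment variables take effect.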
Created 10-13-2017 04:39 AM
Try setting the PYSPARK_DRIVER_PYTHON environment variable so that Spark uses the Anaconda/Miniconda interpreter.
From the logs it looks like Spark is using the bundled pyspark. Check the link for more info:
https://spark.apache.org/docs/1.6.2/programming-guide.html#linking-with-spark
Thanks,
Aditya
Created on 10-13-2017 06:59 AM - edited 08-17-2019 07:45 PM
Aditya, I got this error in the Zeppelin UI: