Created 07-05-2016 09:58 AM
Hello,
I am working with the Cloudera VM 5.4.2.
I started PySpark with the command
PYSPARK_DRIVER_PYTHON=ipython pyspark
Then I tried to import pandas
import pandas as pd
and I got the following error:
Using Python version 2.6.6 (r266:84292, Feb 22 2013 00:00:18)
SparkContext available as sc, HiveContext available as sqlContext.
In [1]: import pandas as pd
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-af55e7023913> in <module>()
----> 1 import pandas as pd
/usr/lib/python2.6/site-packages/pandas-0.18.0-py2.6-linux-x86_64.egg/pandas/__init__.py in <module>()
20
21 # numpy compat
---> 22 from pandas.compat.numpy_compat import *
23
24 try:
/usr/lib/python2.6/site-packages/pandas-0.18.0-py2.6-linux-x86_64.egg/pandas/compat/__init__.py in <module>()
296 return wrapper
297
--> 298 from collections import OrderedDict, Counter
299
300 if PY3:
ImportError: cannot import name OrderedDict
In [2]:
Why can't I import Pandas?
Thanks in advance
Carlota Vina
Created 07-05-2016 10:00 AM
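The simplest explanation would be that pandas isn't installed, since it's not part of Python. Your traceback, though, shows pandas 0.18.0 being found and then failing under Python 2.6, which has no collections.OrderedDict (it was added in Python 2.7). Consider using the Anaconda parcel to lay down a Python distribution for use with Pyspark that contains many commonly-used packages like pandas.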
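Since the failing line in the traceback is `from collections import OrderedDict, Counter`, a quick check of what the running interpreter offers can help narrow things down. The following is a minimal sketch, not a definitive diagnosis; `describe_interpreter` is a hypothetical helper name, and it is meant to be pasted into the same IPython/PySpark session where the import fails.

```python
import sys


def describe_interpreter():
    """Return the interpreter path, its version, and whether
    collections.OrderedDict can be imported."""
    try:
        # OrderedDict was added to the standard library in Python 2.7.
        from collections import OrderedDict  # noqa: F401
        has_ordered_dict = True
    except ImportError:
        has_ordered_dict = False
    return sys.executable, tuple(sys.version_info[:3]), has_ordered_dict


exe, version, has_ordered_dict = describe_interpreter()
print("interpreter: %s" % exe)
print("version: %d.%d.%d" % version)
print("collections.OrderedDict available: %s" % has_ordered_dict)
```

If the last line reports False, the interpreter predates Python 2.7, and a pandas build that imports OrderedDict cannot load on it.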
Created 07-05-2016 10:05 AM
Thanks for the reply.
I installed Anaconda3:
sudo yum install -y spark-core spark-master spark-worker spark-history-server spark-python
wget http://repo.continuum.io/archive/Anaconda3-4.0.0-Linux-x86_64.sh
bash Anaconda3-4.0.0-Linux-x86_64.sh
But I still can't import pandas.
Thanks in advance
Carlota Vina
Created 07-05-2016 11:09 AM
Hello,
After installing Anaconda3 I have pandas 0.18.0, and Python is 3.5.
But when I start PySpark, the Python version is 2.6.6:
PYSPARK_DRIVER_PYTHON=ipython pyspark
Python 2.6.6 (r266:84292, Feb 22 2013, 00:00:18)
Type "copyright", "credits" or "license" for more information.
IPython 1.2.1 -- An enhanced Interactive Python.
Could this be the cause of the error?
Thanks in advance
Carlota Vina
Created 07-05-2016 01:45 PM
Installing Anaconda doesn't make Pyspark use it. You would have to tell Pyspark to do so. I was referring to the Anaconda parcel for CDH, which does the setup, not the generic Anaconda distribution.
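As a rough illustration of "telling Pyspark" which interpreter to use: PySpark reads the environment variables PYSPARK_PYTHON (the interpreter used to run Python code) and PYSPARK_DRIVER_PYTHON (the driver-side front end, such as ipython). The sketch below only builds such an environment; the helper name `pyspark_env` and the path `/opt/anaconda/bin` are assumptions made for the example, not the locations on the VM, and whether the Spark release on the VM works with the chosen Python version needs to be checked separately.

```python
import os


def pyspark_env(worker_python, driver_python=None, base_env=None):
    """Return a copy of the environment with PySpark's interpreter
    variables set (hypothetical helper, for illustration only)."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_PYTHON"] = worker_python
    if driver_python is not None:
        env["PYSPARK_DRIVER_PYTHON"] = driver_python
    return env


# Assumed install location; adjust to wherever Anaconda actually lives.
ANACONDA_BIN = "/opt/anaconda/bin"

env = pyspark_env(
    worker_python=os.path.join(ANACONDA_BIN, "python"),
    driver_python=os.path.join(ANACONDA_BIN, "ipython"),
)
print(env["PYSPARK_PYTHON"])
print(env["PYSPARK_DRIVER_PYTHON"])

# To start the shell with these settings (requires pyspark on the PATH):
# import subprocess
# subprocess.call(["pyspark"], env=env)
```

The Anaconda parcel mentioned above takes care of this setup itself, so the sketch is mainly relevant for a hand-installed distribution.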
Created 07-05-2016 08:59 PM
Hello,
I have a .py file and I want to execute it instruction by instruction. Could you explain how to do this?
Thanks in advance
Carlota Vina
Created 07-06-2016 08:13 AM
I would advise using ipython's debugger, ipdb. It lets you run a script one statement at a time.
* http://quant-econ.net/py/ipython.html#debugging
* https://docs.python.org/3/library/pdb.html
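To make the step-by-step idea concrete, here is a minimal sketch using the standard-library `pdb` module from the second link (used here in place of ipdb, so nothing extra needs to be installed). The function `running_total` and its `debug` flag are made up for the example.

```python
import pdb


def running_total(values, debug=False):
    """Sum the values, optionally pausing in the debugger on each iteration."""
    total = 0
    for value in values:
        if debug:
            # Execution stops here. At the (Pdb) prompt, type:
            #   n        run the next line
            #   s        step into a function call
            #   p total  print a variable
            #   c        continue to the next breakpoint
            pdb.set_trace()
        total += value
    return total


# Without the flag the function runs normally.
print(running_total([1, 2, 3]))

# With debug=True it pauses on every iteration:
# running_total([1, 2, 3], debug=True)
```

An existing script can also be stepped through without editing it by running `python -m pdb your_script.py` and pressing n at each prompt.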
Finally, regarding the earlier posts: when you use Anaconda's ipython, remember to set the environment variable PYSPARK_DRIVER_PYTHON to the location of ipython (e.g. /usr/bin/ipython) so PySpark knows where to find it, and set PYSPARK_PYTHON to the matching python interpreter.
Good luck.