
pyspark / pyarrow problem


Explorer

We're using Cloudera with the Anaconda parcel on a BDA production cluster.

When I tried to execute PySpark code that imports the pyarrow package, I got the error below.

Traceback (most recent call last):
 File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/python/lib/pyspark.zip/pyspark/sql/utils.py", line 140, in require_minimum_pyarrow_version
 File "/opt/cloudera/parcels/Anaconda-3.6.5_2/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in <module>
   from pyarrow.lib import cpu_count, set_cpu_count
ImportError: libboost_system.so.1.66.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
 File "lbf_livArea_scr_2.py", line 51, in <module>
   @pandas_udf(schema, PandasUDFType.GROUPED_MAP)
 File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/python/lib/pyspark.zip/pyspark/sql/udf.py", line 45, in _create_udf
 File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/python/lib/pyspark.zip/pyspark/sql/utils.py", line 143, in require_minimum_pyarrow_version
ImportError: PyArrow >= 0.8.0 must be installed; however, it was not found.

The output of conda list is also shown below.

[ihsany@gbbdap02 ~]$ dzdo /opt/cloudera/parcels/Anaconda/bin/conda list |grep arrow
arrow-cpp                 0.9.0            py36h1ae9da6_7    <unknown>
pyarrow                   0.9.0                    py36_1    <unknown>
[ihsany@gbbdap02 ~]$ dzdo /opt/cloudera/parcels/Anaconda/bin/conda list |grep boost
libboost                  1.65.1               habcd387_4    <unknown>
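Note that pyspark catches the underlying ImportError and re-raises a generic "it was not found" message, even though conda list shows pyarrow 0.9.0 is installed. Importing pyarrow directly with the cluster's Python surfaces the real cause (here, the missing libboost shared library). A minimal probe sketch (the function name is illustrative, not part of any API):

```python
def probe_pyarrow():
    """Try to import pyarrow; return (version, None) on success,
    or (None, error_message) when the import fails."""
    try:
        import pyarrow
        return pyarrow.__version__, None
    except ImportError as exc:
        # On the cluster above, the message names the real culprit:
        # "libboost_system.so.1.66.0: cannot open shared object file"
        return None, str(exc)

version, error = probe_pyarrow()
if error:
    print("pyarrow import failed:", error)
else:
    print("pyarrow", version, "imported OK")
```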

1 ACCEPTED SOLUTION

Accepted Solutions

Re: pyspark / pyarrow problem

Cloudera Employee

From the details you shared, we can see that pyspark is picking up an older version of libboost (libboost_system.so.1.65.1) than the one pyarrow expects (libboost_system.so.1.66.0):

dzdo /opt/cloudera/parcels/Anaconda/bin/conda list |grep boost
libboost                  1.65.1               habcd387_4

It looks like the new version of PyArrow was not installed properly. Please clean out the older packages and then install pyarrow again using the command below:

conda install -c conda-forge pyarrow

Best Regards,
Senthil Kumar
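The clean-and-reinstall sequence can be sketched as follows. This is a hedged sketch: the package names (pyarrow, arrow-cpp) come from the conda list output in the question, the parcel paths and dzdo prefix match the commands shown there, and the final verification step assumes the Anaconda parcel ships a python binary alongside conda.

```shell
# Remove the mismatched Arrow packages first, then reinstall from conda-forge.
dzdo /opt/cloudera/parcels/Anaconda/bin/conda remove -y pyarrow arrow-cpp
dzdo /opt/cloudera/parcels/Anaconda/bin/conda install -y -c conda-forge pyarrow

# Verify the import now succeeds (i.e. the libboost dependency is satisfied):
dzdo /opt/cloudera/parcels/Anaconda/bin/python -c "import pyarrow; print(pyarrow.__version__)"
```

Installing from conda-forge pulls a consistent set of Arrow and Boost libraries, which avoids the shared-library version mismatch seen in the traceback.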

2 REPLIES 2


Re: pyspark / pyarrow problem

Explorer
Thanks, gentlemen. Let me re-install; I hope it works.