Explorer
Posts: 20
Registered: ‎12-07-2018
Accepted Solution

pyspark / pyarrow problem


We're using Cloudera with the Anaconda parcel on our BDA production cluster.

I tried to run PySpark code that imports the pyarrow package, and I got the error below.

Traceback (most recent call last):
 File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/python/lib/pyspark.zip/pyspark/sql/utils.py", line 140, in require_minimum_pyarrow_version
 File "/opt/cloudera/parcels/Anaconda-3.6.5_2/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in <module>
   from pyarrow.lib import cpu_count, set_cpu_count
ImportError: libboost_system.so.1.66.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
 File "lbf_livArea_scr_2.py", line 51, in <module>
   @pandas_udf(schema, PandasUDFType.GROUPED_MAP)
 File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/python/lib/pyspark.zip/pyspark/sql/udf.py", line 45, in _create_udf
 File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/python/lib/pyspark.zip/pyspark/sql/utils.py", line 143, in require_minimum_pyarrow_version
ImportError: PyArrow >= 0.8.0 must be installed; however, it was not found.
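The second ImportError is PySpark's generic "PyArrow >= 0.8.0 must be installed" message. The first one (the missing libboost_system.so.1.66.0) is the real cause. The following is a minimal standalone sketch of a check that surfaces that underlying error instead of hiding it. It is similar in spirit to PySpark's require_minimum_pyarrow_version but is not its actual code. The helper names check_pyarrow and parse_version are hypothetical, and the 0.8.0 minimum is taken from the message above.

```python
import importlib


def parse_version(text):
    """Turn '0.9.0' into (0, 9, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_pyarrow(min_version="0.8.0"):
    """Return (ok, message) instead of raising, keeping the root cause."""
    try:
        pyarrow = importlib.import_module("pyarrow")
    except ImportError as exc:
        # Report the loader error itself, e.g. a missing libboost .so file
        return False, f"pyarrow failed to import: {exc}"
    found = pyarrow.__version__
    if parse_version(found) < parse_version(min_version):
        return False, f"pyarrow {found} is older than {min_version}"
    return True, f"pyarrow {found} OK"


if __name__ == "__main__":
    ok, message = check_pyarrow()
    print(message)
```

Running this with the parcel's Python interpreter should print the libboost error directly, rather than the version message.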

Here is the output of conda list:

[ihsany@gbbdap02 ~]$ dzdo /opt/cloudera/parcels/Anaconda/bin/conda list |grep arrow
arrow-cpp                 0.9.0            py36h1ae9da6_7    <unknown>
pyarrow                   0.9.0                    py36_1    <unknown>
[ihsany@gbbdap02 ~]$ dzdo /opt/cloudera/parcels/Anaconda/bin/conda list |grep boost
libboost                  1.65.1               habcd387_4    <unknown>
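This listing already shows the mismatch. arrow-cpp 0.9.0 wants libboost_system.so.1.66.0, but only libboost 1.65.1 is installed. The sketch below compares the two automatically. It assumes you paste in the raw conda list lines (no shell prompt) plus the ImportError text. The helpers parse_conda_list and required_boost_version are hypothetical, and the simple string-equality test only approximates how the dynamic loader matches sonames.

```python
import re


def parse_conda_list(text):
    """Map package name -> version from raw `conda list` output lines."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        # conda list prints: name, version, build, channel; '#' lines are headers
        if len(fields) >= 2 and not line.startswith("#"):
            packages[fields[0]] = fields[1]
    return packages


def required_boost_version(error_message):
    """Extract '1.66.0' from '...libboost_system.so.1.66.0: cannot open...'."""
    match = re.search(r"libboost_\w+\.so\.(\d+(?:\.\d+)*)", error_message)
    return match.group(1) if match else None


listing = """arrow-cpp                 0.9.0            py36h1ae9da6_7    <unknown>
pyarrow                   0.9.0                    py36_1    <unknown>
libboost                  1.65.1               habcd387_4    <unknown>"""
error = ("ImportError: libboost_system.so.1.66.0: "
         "cannot open shared object file: No such file or directory")

installed = parse_conda_list(listing).get("libboost")
required = required_boost_version(error)
print(f"installed libboost {installed}, pyarrow needs {required}")
print("mismatch" if installed != required else "versions match")
```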

Cloudera Employee
Posts: 12
Registered: ‎11-21-2018

Re: pyspark / pyarrow problem

From the details you shared, pyspark is picking up an older libboost (libboost_system.so.1.65.1) than the one pyarrow expects (libboost_system.so.1.66.0):

dzdo /opt/cloudera/parcels/Anaconda/bin/conda list |grep boost
libboost 1.65.1 habcd387_4

It looks like the new version of PyArrow was not installed properly. Please clean up the older packages and then install pyarrow again with the command below:

conda install -c conda-forge pyarrow

Best Regards,
Senthil Kumar
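After reinstalling, one quick sanity check is whether the exact file the loader asked for now exists. The sketch below lists the libboost_system files in a library directory. It assumes the parcel's shared libraries live in /opt/cloudera/parcels/Anaconda/lib, the sibling of the bin directory used above. That path and the helper names are assumptions. The real loader also consults RPATH and LD_LIBRARY_PATH, so a hit here is a hint, not proof.

```python
import glob
import os

# Assumption: conda's bin is /opt/cloudera/parcels/Anaconda/bin, so its
# shared libraries are expected in the sibling lib directory.
DEFAULT_LIB_DIR = "/opt/cloudera/parcels/Anaconda/lib"


def boost_system_libs(lib_dir=DEFAULT_LIB_DIR):
    """Return the libboost_system shared-object file names found in lib_dir."""
    pattern = os.path.join(lib_dir, "libboost_system.so*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


def has_soname(lib_dir=DEFAULT_LIB_DIR, soname="libboost_system.so.1.66.0"):
    """True if the exact library file named in the ImportError is present."""
    return soname in boost_system_libs(lib_dir)


if __name__ == "__main__":
    libs = boost_system_libs()
    print(libs or "no libboost_system found")
    print("1.66.0 present:", has_soname())
```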

Explorer
Posts: 20
Registered: ‎12-07-2018

Re: pyspark / pyarrow problem

Thanks, gentleman! Let me reinstall it. I hope it works.