Created on 04-08-2019 07:11 AM - edited 09-16-2022 07:17 AM
We're using Cloudera with the Anaconda parcel on our BDA production cluster.
When I tried to run PySpark code that imports the pyarrow package, I hit the error below.
Traceback (most recent call last):
File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/python/lib/pyspark.zip/pyspark/sql/utils.py", line 140, in require_minimum_pyarrow_version
File "/opt/cloudera/parcels/Anaconda-3.6.5_2/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in <module>
from pyarrow.lib import cpu_count, set_cpu_count
ImportError: libboost_system.so.1.66.0: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "lbf_livArea_scr_2.py", line 51, in <module>
@pandas_udf(schema, PandasUDFType.GROUPED_MAP)
File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/python/lib/pyspark.zip/pyspark/sql/udf.py", line 45, in _create_udf
File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/python/lib/pyspark.zip/pyspark/sql/utils.py", line 143, in require_minimum_pyarrow_version
ImportError: PyArrow >= 0.8.0 must be installed; however, it was not found.
The relevant output of conda list is below.
[ihsany@gbbdap02 ~]$ dzdo /opt/cloudera/parcels/Anaconda/bin/conda list |grep arrow
arrow-cpp 0.9.0 py36h1ae9da6_7 <unknown>
pyarrow 0.9.0 py36_1 <unknown>
[ihsany@gbbdap02 ~]$ dzdo /opt/cloudera/parcels/Anaconda/bin/conda list |grep boost
libboost 1.65.1 habcd387_4 <unknown>
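For reference, a minimal grouped-map pandas UDF like the sketch below is enough to trigger this check: Spark calls require_minimum_pyarrow_version when the decorator is applied, which is where the ImportError surfaces. The schema and column names here are hypothetical, since the actual lbf_livArea_scr_2.py is not shown.

from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Hypothetical schema; the real script's schema is not shown in the post.
schema = StructType([
    StructField("group", StringType()),
    StructField("value", DoubleType()),
])

# Spark validates the installed pyarrow version when this decorator runs,
# so the import failure above is raised here, before any data is processed.
@pandas_udf(schema, PandasUDFType.GROUPED_MAP)
def demean(pdf):
    # pdf is a pandas DataFrame holding all rows of one group
    pdf["value"] = pdf["value"] - pdf["value"].mean()
    return pdf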
Created 04-08-2019 08:28 PM
From the details you shared, we can see that pyspark is picking up an older version of libboost (libboost_system.so.1.65.1) than the one pyarrow expects (libboost_system.so.1.66.0):

dzdo /opt/cloudera/parcels/Anaconda/bin/conda list | grep boost
libboost 1.65.1 habcd387_4

It looks like the new version of PyArrow was not installed properly. Please clean out the older packages and then install pyarrow again using the command below:

conda install -c conda-forge pyarrow

Best Regards,
Senthil Kumar
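After reinstalling, a quick sanity check (a sketch) before rerunning the Spark job is to exercise the exact import that failed, using the parcel's Python interpreter:

# Run with the Anaconda parcel's python, e.g.
# /opt/cloudera/parcels/Anaconda/bin/python
import pyarrow
print(pyarrow.__version__)         # should report 0.9.0 or newer

# This is the exact import that raised the libboost ImportError above;
# if it succeeds, the native libraries are resolving correctly.
from pyarrow.lib import cpu_count
print(cpu_count())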
Created 04-09-2019 04:20 AM