Support Questions

AnandG · ‎08-13-2020

I am trying to use pandas UDFs in my code. Internally, they use Apache Arrow for data conversion. I am getting the error below from the pyarrow module, even though I import it explicitly in my application code.

  File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p3996.4056429/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 361, in main
    func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
  File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p3996.4056429/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 236, in read_udfs
    arg_offsets, udf = read_single_udf(pickleSer, infile, eval_type, runner_conf)
  File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p3996.4056429/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 175, in read_single_udf
    return arg_offsets, wrap_scalar_pandas_udf(func, return_type)
  File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p3996.4056429/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 84, in wrap_scalar_pandas_udf
    arrow_return_type = to_arrow_type(return_type)
  File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p3996.4056429/lib/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 1585, in to_arrow_type
    import pyarrow as pa
ModuleNotFoundError: No module named 'pyarrow'

I also tried enabling Arrow manually, but still no luck:

spark.conf.set("spark.sql.execution.arrow.enabled", "true")

GangWar · ‎08-13-2020

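@AnandG Is the pyarrow module installed? The traceback comes from pyspark/worker.py, which runs on the executors, so pyarrow must be installed for the Python interpreter used on every worker node, not only on the driver. Try installing it with:

# pip install pyarrow
(Use pip3 for Python 3; you may also need to upgrade pip.)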
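
One way to narrow this down is to check whether pyarrow is importable by the interpreter on the driver and by the interpreters that run the Spark tasks. The following is only a minimal sketch under some assumptions: `module_status` and `executor_module_status` are hypothetical helper names, `spark` is assumed to be an already created SparkSession, and `num_tasks` is an arbitrary value that does not guarantee every worker node gets sampled.

```python
import importlib.util
import socket
import sys


def module_status(name="pyarrow"):
    """Report whether `name` is importable by the interpreter running this code."""
    # find_spec returns None for a top-level module that cannot be found
    spec = importlib.util.find_spec(name)
    return {
        "host": socket.gethostname(),
        "python": sys.executable,
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


def executor_module_status(spark, name="pyarrow", num_tasks=8):
    """Run module_status inside Spark tasks (hypothetical helper).

    Assumes `spark` is an existing SparkSession. The tasks may not land on
    every worker node, so a clean result is a hint, not a proof.
    """
    rdd = spark.sparkContext.parallelize(range(num_tasks), num_tasks)
    results = rdd.map(lambda _: module_status(name)).collect()
    # Keep one result per (host, interpreter) pair
    unique = {(r["host"], r["python"]): r for r in results}
    return list(unique.values())


if __name__ == "__main__":
    # Driver-side check; call executor_module_status(spark) from a PySpark app
    print(module_status())
```

If the driver reports `found: True` while some executor entries report `found: False`, the module is probably installed only for the driver's interpreter. Note that setting spark.sql.execution.arrow.enabled only toggles Arrow usage; it cannot make a missing module importable.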

Cheers!

Cloudera Community

Pandas UDFs in Pyspark ; ModuleNotFoundError: No module named 'pyarrow'