Hi All,
When I try the sample code provided in the Cloudera documentation here, it fails with an AttributeError.
Error:
AttributeError: type object 'PandasUDFType' has no attribute 'GROUPED_AGG'
AttributeError                            Traceback (most recent call last)
in engine
----> 1 import ibis

/home/cdsw/.local/lib/python3.6/site-packages/ibis/__init__.py in <module>()
     58 with suppress(ImportError):
     59     # pip install ibis-framework[spark]
---> 60     import ibis.spark.api as spark  # noqa: F401
     61
     62 with suppress(ImportError):

/home/cdsw/.local/lib/python3.6/site-packages/ibis/spark/api.py in <module>()
      2 from ibis.spark.client import SparkClient
      3 from ibis.spark.compiler import dialect  # noqa: F401
----> 4 from ibis.spark.udf import udf  # noqa: F401
      5
      6

/home/cdsw/.local/lib/python3.6/site-packages/ibis/spark/udf.py in <module>()
    123
    124
--> 125 class SparkPandasAggregateUDF(SparkPandasUDF):
    126     base_class = SparkUDAFNode
    127     pandas_udf_type = f.PandasUDFType.GROUPED_AGG

/home/cdsw/.local/lib/python3.6/site-packages/ibis/spark/udf.py in SparkPandasAggregateUDF()
    125 class SparkPandasAggregateUDF(SparkPandasUDF):
    126     base_class = SparkUDAFNode
--> 127     pandas_udf_type = f.PandasUDFType.GROUPED_AGG
    128
    129

AttributeError: type object 'PandasUDFType' has no attribute 'GROUPED_AGG'
Any help to resolve this issue?
Thanks,
CRP
Created 03-23-2021 09:20 AM
This issue has been open for almost 1 year.
Is it possible to connect to Cloudera Impala with Cloudera tools?
How is this done?
Has anybody managed to connect to Impala from CDSW and run SQL?
Created 03-24-2021 05:21 AM
@cr @PowerofAI You might have to make sure that the Impala packages are installed and then import the UDF, something like this:
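import pandas as pd
from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import pandas_udf, PandasUDFType

# Reuse the running Spark session, or start one if none exists yet.
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
    ("id", "v"))

# Grouped-aggregate pandas UDF: receives a pandas Series, returns one double.
@pandas_udf("double", PandasUDFType.GROUPED_AGG)
def pandas_mean(v):
    return v.mean()

# Mean over the whole column, per id, and as a window function per id.
df.select(pandas_mean(df['v'])).show()
df.groupby("id").agg(pandas_mean(df['v'])).show()
df.select(pandas_mean(df['v']).over(Window.partitionBy('id'))).show()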
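If the same AttributeError still shows up with the snippet above, it may be worth checking which PySpark the session actually picks up. As far as I know, PandasUDFType.GROUPED_AGG was only added in Spark 2.4, so an older PySpark would fail in exactly this way. A minimal sketch of such a check (the helper name check_grouped_agg_support is made up for illustration, it is not part of PySpark or ibis):

```python
import importlib


def check_grouped_agg_support():
    """Report whether the importable PySpark exposes PandasUDFType.GROUPED_AGG."""
    try:
        pyspark = importlib.import_module("pyspark")
        functions = importlib.import_module("pyspark.sql.functions")
    except ImportError:
        # PySpark is not importable from this Python environment at all.
        return {"pyspark_installed": False, "version": None, "has_grouped_agg": False}

    # PandasUDFType may be missing entirely on very old releases, hence getattr.
    udf_type = getattr(functions, "PandasUDFType", None)
    return {
        "pyspark_installed": True,
        "version": getattr(pyspark, "__version__", None),
        "has_grouped_agg": hasattr(udf_type, "GROUPED_AGG"),
    }


print(check_grouped_agg_support())
```

If has_grouped_agg comes back False while PySpark is installed, the ibis Spark backend cannot be imported against that PySpark, and the likely options are a newer Spark/PySpark or an ibis install without the Spark backend.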
Also, we have to make sure that both of these have been installed:
pip install ibis-framework
pip install impyla
If either package is missing, that might be causing the issue.
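For plain SQL against Impala from a CDSW session, impyla can also be used on its own, without ibis. A rough sketch, assuming impyla is installed and an Impala daemon is reachable on the default HiveServer2 port 21050; the function name run_impala_query and the host name in the usage comment are placeholders, not values from this thread:

```python
def run_impala_query(sql, host, port=21050, **connect_kwargs):
    """Run one SQL statement against Impala and return all rows."""
    # Imported lazily so this file still loads where impyla is not installed.
    from impala.dbapi import connect

    # Extra keyword arguments are passed through to impyla's connect(), e.g.
    # auth_mechanism="GSSAPI" on a Kerberized cluster (an assumption about
    # your setup; check how your cluster authenticates).
    conn = connect(host=host, port=port, **connect_kwargs)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical usage; replace the host with one of your Impala daemons:
# rows = run_impala_query("SHOW DATABASES", host="impala-daemon.example.com")
```

The connection details (host, authentication, SSL) depend on the cluster, so treat the arguments here as a starting point only.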