Member since 12-26-2018
Posts: 3
Kudos Received: 0
Solutions: 0
03-24-2021 05:21 AM
@cr @PowerofAI You might have to make sure that the Impala packages are installed and then import the UDF, maybe something like this:

import pandas as pd
from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql import Window

# `spark` is assumed to be an existing SparkSession
df = spark.createDataFrame(
    [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)], ("id", "v"))

@pandas_udf("double", PandasUDFType.GROUPED_AGG)
def pandas_mean(v):
    # v is a pandas Series with the values of one group; return its mean
    return v.mean()

df.select(pandas_mean(df['v'])).show()
df.groupby("id").agg(pandas_mean(df['v'])).show()
df.select(pandas_mean(df['v']).over(Window.partitionBy('id'))).show()

Also, we have to make sure that ibis-framework and the Impala client package are installed (pip install ibis-framework, and pip install impyla for the Impala client); a missing package might be causing the issue.
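To check quickly whether those packages are visible to the Python interpreter that runs the job, something along these lines may help. This is only a sketch: the helper name missing_packages is made up for illustration, and it assumes the import names are ibis (for ibis-framework) and impala (for impyla).

```python
import importlib.util


def missing_packages(import_names):
    """Return the names from import_names that cannot be found by this interpreter."""
    missing = []
    for name in import_names:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names: ibis-framework -> ibis, impyla -> impala
    wanted = ["pandas", "pyarrow", "ibis", "impala"]
    not_found = missing_packages(wanted)
    if not_found:
        print("Missing packages:", ", ".join(not_found))
    else:
        print("All packages found")
```

Run it with the same Python environment that the Spark executors use, since a package installed only on the driver would not show up as missing here.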
12-28-2018 05:48 AM
I don't think it is true that the Cloudera ODBC driver doesn't support insert. If you define the table as a transactional table, you can insert data:

CREATE TABLE insert_test(
  column1 string,
  column2 string)
clustered by (column1) into 3 buckets
stored as orcfile
TBLPROPERTIES ('transactional'='true');

insert into table efvci_lnd_edw_dev.insert_test values('1', 'One');
insert into table efvci_lnd_edw_dev.insert_test values('2', 'Two');

Thanks,
Chirag Patel