How to make an Impala variance UDAF return a Double value instead of a String

Hi,

I have been working on custom User Defined Aggregate Functions (UDAFs), using the example provided here as a reference.
That example calculates the variance and standard deviation of an Impala column.

Since we need to track several variables (sum, sum of squares, and count), we use a C++ struct and serialize it as a string, so that the data can be passed between the init, update, merge, and finalize phases.
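A minimal standalone sketch of that pattern is below. It assumes a plain `std::string` buffer in place of Impala's `StringVal`/`FunctionContext` machinery, and the names `VarianceState`, `Init`, `Update`, `Merge`, `Finalize`, and `StdDevFinalize` are hypothetical; this is not the Impala UDF API. The struct's bytes are copied into the string between phases, and only the finalize step produces a `double`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>

// Hypothetical intermediate state (not part of the Impala UDF API).
struct VarianceState {
  double sum;
  double sum_sq;
  int64_t count;
};

// Copy the struct's bytes into a string buffer between phases.
std::string Serialize(const VarianceState& s) {
  return std::string(reinterpret_cast<const char*>(&s), sizeof(s));
}

// Copy the bytes back out; the buffer must hold exactly one VarianceState.
VarianceState Deserialize(const std::string& buf) {
  assert(buf.size() == sizeof(VarianceState));
  VarianceState s;
  std::memcpy(&s, buf.data(), sizeof(s));
  return s;
}

// Init: start from an empty state.
void Init(std::string* dst) { *dst = Serialize(VarianceState{0.0, 0.0, 0}); }

// Update: fold one input value into the state.
void Update(double val, std::string* dst) {
  VarianceState s = Deserialize(*dst);
  s.sum += val;
  s.sum_sq += val * val;
  s.count += 1;
  *dst = Serialize(s);
}

// Merge: combine a partial state (e.g. from another node) into dst.
void Merge(const std::string& src, std::string* dst) {
  VarianceState a = Deserialize(src);
  VarianceState b = Deserialize(*dst);
  b.sum += a.sum;
  b.sum_sq += a.sum_sq;
  b.count += a.count;
  *dst = Serialize(b);
}

// Finalize: return the sample variance as a double.
// std::nullopt stands in for SQL NULL when fewer than two values were seen.
std::optional<double> Finalize(const std::string& src) {
  VarianceState s = Deserialize(src);
  if (s.count < 2) return std::nullopt;
  double n = static_cast<double>(s.count);
  double mean = s.sum / n;
  return (s.sum_sq - n * mean * mean) / (n - 1.0);
}

// Standard deviation: square root of the same variance, clamped at zero
// in case rounding pushes the variance slightly negative.
std::optional<double> StdDevFinalize(const std::string& src) {
  std::optional<double> var = Finalize(src);
  if (!var) return std::nullopt;
  return std::sqrt(std::max(0.0, *var));
}
```

For example, updating one partial state with 1.0 and 2.0 and another with 3.0 and 4.0, merging them, and finalizing yields the sample variance 5/3. The sum-of-squares formula is simple, but it can lose precision when the values are large relative to their spread.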

My questions:

1) Can we have a Double return type instead of String?

2) Where can we find the implementations of Impala built-in functions like min(), sum(), and max(), since these functions return doubles?

Any suggestions are welcome. Thanks !!!

I'm afraid you may have to wait until we resolve:

https://issues.cloudera.org/browse/IMPALA-1829

For the Impala built-ins, have a look at:

IMPALA_HOME/be/src/exprs/aggregate-functions.h

IMPALA_HOME/be/src/exprs/aggregate-functions.cc

Hi Alex,

Thanks for your fast response.

I have a couple more questions 😛

1) I see that KnuthVariance returns a Double, but when I make the Finalize function in my code return a Double, I get:

Analysis Exception: Could not find function func_nameUpdate(double,double,double) returns double in 'HDFS_so_filepath' Check that function name, agruments and return types are correct.

I am curious how the built-in functions support this.

Please let me know if I am missing something.

2) If Cloudera needs to fix this, please let me know in which CDH and Impala versions the fix might be released. Thanks !!!

1) The Knuth variance is an Impala built-in. Internally, Impala can handle aggregate functions with different intermediate and output types.
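The Knuth (Welford) approach keeps a running count, mean, and sum of squared deviations (M2) rather than raw sums, so the intermediate state is a struct while the final output is a plain double. The standalone C++ sketch below illustrates that split; the struct and function names are hypothetical, it is not Impala's built-in code, and using the pairwise combination formula of Chan et al. for the merge step is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical intermediate type; the output type is a plain double.
struct KnuthVarianceState {
  int64_t count = 0;
  double mean = 0.0;
  double m2 = 0.0;  // running sum of squared deviations from the mean
};

// Update: Welford/Knuth single-value step.
void KnuthUpdate(double x, KnuthVarianceState* s) {
  s->count += 1;
  double delta = x - s->mean;
  s->mean += delta / static_cast<double>(s->count);
  s->m2 += delta * (x - s->mean);  // uses the already-updated mean
}

// Merge: combine two partial states (pairwise formula of Chan et al.).
void KnuthMerge(const KnuthVarianceState& src, KnuthVarianceState* dst) {
  assert(dst != nullptr);
  if (src.count == 0) return;
  if (dst->count == 0) {
    *dst = src;
    return;
  }
  double na = static_cast<double>(dst->count);
  double nb = static_cast<double>(src.count);
  double n = na + nb;
  double delta = src.mean - dst->mean;
  dst->mean += delta * nb / n;
  dst->m2 += src.m2 + delta * delta * na * nb / n;
  dst->count += src.count;
}

// Finalize: sample variance as a double; nullopt stands in for SQL NULL.
std::optional<double> KnuthVarianceFinalize(const KnuthVarianceState& s) {
  if (s.count < 2) return std::nullopt;
  return s.m2 / static_cast<double>(s.count - 1);
}
```

Because the state tracks deviations from a running mean instead of raw sums of squares, this form avoids the cancellation that the textbook sum-of-squares formula suffers when values are large relative to their spread.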

Basically, the only reason you are not allowed to create UDAs with different intermediate and output types is that we have not enabled the feature in semantic analysis.

For us, enabling the feature is the easy part. Adding extensive testing is the hard part.

If you are curious, the check that prevents you from creating such UDAs is in:

./fe/src/main/java/com/cloudera/impala/analysis/CreateUdaStmt.java

starting at line 137.

2) Like I said, enabling the feature is not hard, but does involve a non-trivial QA effort, so I cannot promise a concrete release at this point. I'd recommend keeping an eye on that JIRA for updates to the target version.