
How to make an Impala Variance UDAF return a Double value instead of a String


New Contributor

Hi,

 

I have been working on creating custom User Defined Aggregate Functions (UDAFs), following the example provided here.
In that example, variance and standard deviation are calculated on an Impala column.

Since there are several variables to track (sum, sum of squares, and count), we use a C++ struct and serialize it as a string so that the data can be passed between the init, update, merge, and finalize phases.
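To make the approach described above concrete, here is a minimal standalone sketch of a struct that is packed into a string and carried through the four phases. It deliberately does not use the Impala UDF SDK: the names `VarianceState`, `Serialize`, `Deserialize`, `Init`, `Update`, `Merge`, and `FinalizeVariance` are hypothetical, `std::string` stands in for a STRING intermediate value, and the choice of sample variance (dividing by n - 1) is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical intermediate state: the three running values named in the post.
struct VarianceState {
  double sum = 0.0;
  double sum_squares = 0.0;
  int64_t count = 0;
};

// Pack the struct into a byte string, the way a STRING intermediate would carry it.
std::string Serialize(const VarianceState& s) {
  return std::string(reinterpret_cast<const char*>(&s), sizeof(s));
}

// Unpack the byte string back into the struct.
VarianceState Deserialize(const std::string& bytes) {
  assert(bytes.size() == sizeof(VarianceState));
  VarianceState s;
  std::memcpy(&s, bytes.data(), sizeof(s));
  return s;
}

// Init phase: start from an empty state.
std::string Init() { return Serialize(VarianceState{}); }

// Update phase: fold one input value into the state.
std::string Update(const std::string& state, double value) {
  VarianceState s = Deserialize(state);
  s.sum += value;
  s.sum_squares += value * value;
  s.count += 1;
  return Serialize(s);
}

// Merge phase: combine two partial states (for example, from different nodes).
std::string Merge(const std::string& a, const std::string& b) {
  VarianceState x = Deserialize(a);
  const VarianceState y = Deserialize(b);
  x.sum += y.sum;
  x.sum_squares += y.sum_squares;
  x.count += y.count;
  return Serialize(x);
}

// Finalize phase: sample variance (assumption), returned as a double.
double FinalizeVariance(const std::string& state) {
  const VarianceState s = Deserialize(state);
  if (s.count < 2) return 0.0;
  const double n = static_cast<double>(s.count);
  return (s.sum_squares - s.sum * s.sum / n) / (n - 1.0);
}
```

Note that the finalize step here already produces a double from a string-encoded state; the question in this thread is whether Impala lets a user-defined aggregate declare that combination.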

My questions:

1) Can we have a Double return type instead of String?

2) Where can we find the implementations of Impala built-in functions like min(), sum(), and max(), since these functions can return a double?

Any suggestions are welcome. Thanks !!!

Accepted Solutions

Re: How to make an Impala Variance UDAF return a Double value instead of a String

Master Collaborator

1) The Knuth variance is an Impala built-in. Internally, Impala can handle aggregate functions with different intermediate and output types.
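As an illustration of an aggregate whose intermediate type differs from its output type, here is a minimal sketch of Knuth's (Welford's) online variance with a merge step. This is not Impala's actual implementation: `KnuthState`, `KnuthUpdate`, `KnuthMerge`, and `KnuthFinalizeSample` are hypothetical names, and returning the sample variance is an assumption. The intermediate state is a struct, while the final result is a plain double.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical intermediate state for Knuth's online variance.
struct KnuthState {
  int64_t count = 0;
  double mean = 0.0;
  double m2 = 0.0;  // running sum of squared deviations from the mean
};

// Update: fold one value into the running mean and m2.
void KnuthUpdate(KnuthState* s, double x) {
  ++s->count;
  const double delta = x - s->mean;
  s->mean += delta / s->count;
  s->m2 += delta * (x - s->mean);
}

// Merge: combine the partial state src into dst (parallel variance formula).
void KnuthMerge(const KnuthState& src, KnuthState* dst) {
  if (src.count == 0) return;
  if (dst->count == 0) {
    *dst = src;
    return;
  }
  const int64_t n = src.count + dst->count;
  const double delta = src.mean - dst->mean;
  dst->m2 += src.m2 + delta * delta * dst->count * src.count / n;
  dst->mean += delta * src.count / n;
  dst->count = n;
}

// Finalize: the output type is double even though the state is a struct.
double KnuthFinalizeSample(const KnuthState& s) {
  assert(s.count >= 0);
  return s.count < 2 ? 0.0 : s.m2 / (s.count - 1);
}
```

Compared with tracking a raw sum and sum of squares, this formulation avoids subtracting two large, nearly equal numbers, which is why it is generally preferred for numerical stability.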

 

Basically, the only reason you are not allowed to create UDAs with different intermediate and output types is that we have not enabled the feature in semantic analysis.

For us, enabling the feature is the easy part. Adding extensive testing is the hard part.

 

If you are curious, the check that prevents you from creating such UDAs is in:

./fe/src/main/java/com/cloudera/impala/analysis/CreateUdaStmt.java

at line 137 and following.

 

2) Like I said, enabling the feature is not hard, but does involve a non-trivial QA effort, so I cannot promise a concrete release at this point. I'd recommend keeping an eye on that JIRA for updates to the target version.

 

 


Re: How to make an Impala Variance UDAF return a Double value instead of a String

Master Collaborator

I'm afraid you may have to wait until we resolve:

https://issues.cloudera.org/browse/IMPALA-1829

 

For the Impala built-ins, you can have a look at:

IMPALA_HOME/be/src/exprs/aggregate-functions.h

IMPALA_HOME/be/src/exprs/aggregate-functions.cc

 

 

Re: How to make an Impala Variance UDAF return a Double value instead of a String

New Contributor

Hi Alex,

 

Thanks for your fast response.

 

I have a couple more questions :P

 

1) I see that KnuthVariance returns a Double, but when I try the same in my code, with the Finalize function returning a Double, I get:

Analysis Exception: Could not find function func_nameUpdate(double,double,double) returns double in 'HDFS_so_filepath' Check that function name, arguments and return types are correct.

 

I am curious how the built-in functions manage to have that feature.

Please let me know if I am missing something.

 

2) If Cloudera needs to fix this, please let me know in which CDH and Impala version the fix might be released. Thanks !!!
