jmb
New Contributor
Posts: 5
Registered: ‎11-29-2016

Impala C++ UDF library initialization

Hi,

My C++ library containing my UDF has a function that must be called once when the library is loaded and once when it is unloaded. I thought I would put these calls into the PREPARE_FN and CLOSE_FN, but those are not implemented, according to the error message I receive from the CREATE AGGREGATE FUNCTION statement. So I've put them into the INIT_FN and FINALIZE_FN instead. Are these guaranteed to be called only once per process?

When will the PREPARE_FN/CLOSE_FN be implemented?

I'm using 5.8 because version 5.9 has a linking problem related to std c++11 and noexcept.

Thanks!

Cloudera Employee
Posts: 395
Registered: ‎07-29-2015

Re: Impala C++ UDF library initialization

Init() is called at least once per aggregated value (maybe more if values are computed on different nodes and then merged later). Finalize() is called once per output aggregate value. This seems like a case where Prepare() and Close() would be useful in a UDA.

I would avoid making any assumptions about when the library is loaded or unloaded, since it may be cached (e.g. if multiple queries are using the UDF).
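
Since Init() can run many times, one possible workaround is to guard the one-time call yourself rather than rely on load timing. The following is only a minimal sketch, not an Impala API: `third_party_init` and `EnsureLibraryInitialized` are hypothetical names, and the counter exists only to make the behaviour visible. It relies on C++11 function-local statics, which are initialized exactly once even when several threads reach them concurrently.

```cpp
#include <atomic>

// Hypothetical stand-in for the third-party library's init() function.
// The counter is only here so the sketch can be checked.
static std::atomic<int> g_init_calls{0};
void third_party_init() { ++g_init_calls; }

// Call this at the top of the UDA's Init function (or any other entry point).
// The function-local static is initialized once per process; C++11 guarantees
// that concurrent callers wait until that initialization has finished.
void EnsureLibraryInitialized() {
  static const bool initialized = [] {
    third_party_init();
    return true;
  }();
  (void)initialized;  // silence unused-variable warnings
}
```

This covers only the "call once" half. A matching one-time teardown is harder to place, because the time at which the library is unloaded is not something to depend on.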

Cloudera Employee
Posts: 395
Registered: ‎07-29-2015

Re: Impala C++ UDF library initialization

I created an issue to track it: https://issues.apache.org/jira/browse/IMPALA-5107

jmb
New Contributor
Posts: 5
Registered: ‎11-29-2016

Re: Impala C++ UDF library initialization

Hi Tim,

Thanks for your reply! My shared library depends on a third-party shared library. In my library, I need to ensure that I call an init() function in the third-party shared library exactly once and later call a fini() function exactly once. Is this possible with an Impala UDF? I've tried using the Linux __attribute__ ((constructor))/destructor mechanism, but then Impala hangs when trying to create the function.

 
