Created on 12-02-2017 11:18 AM - edited 09-16-2022 05:35 AM
I have the following UDF:
CREATE FUNCTION myudf(string)
RETURNS string
LOCATION '/user/cloudera/myudflib.so'
SYMBOL='Process'
PREPARE_FN='PrepareLibrariesAndDataStructures'
CLOSE_FN='CloseLibrariesAndCleanupDataStructures';
As you can see, my C++ UDF needs each Impala thread to initialize some libraries and data structures with the PrepareLibrariesAndDataStructures function BEFORE the Process function starts being called multiple times.
On the other hand, CloseLibrariesAndCleanupDataStructures must always be called once the corresponding Impala thread has no more Process calls to make, in order to free the data structures and clean up the libraries.
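To make the intended lifecycle concrete, here is a minimal, self-contained sketch of the prepare / process / close pattern. It is only an illustration under stated assumptions: `MockContext` and `UdfState` are hypothetical stand-ins so the code compiles without the Impala headers. In a real UDF the prepare and close functions would take an `impala_udf::FunctionContext*` plus a `FunctionStateScope`, and the state pointer would be stored with `SetFunctionState()` and read back with `GetFunctionState()`. The `live` counter exists only to show whether any state is left allocated.

```cpp
#include <string>

// Hypothetical stand-in for impala_udf::FunctionContext, reduced to the two
// things this sketch needs: a per-thread state slot and an error message.
struct MockContext {
  void* thread_local_state = nullptr;
  std::string error;  // empty means "no error reported"
  void SetError(const char* msg) {
    if (error.empty()) error = msg;  // keep only the first error
  }
};

// Hypothetical per-thread state owned by the UDF.
struct UdfState {
  static int live;  // number of states currently allocated (leak detector)
  std::string prefix = "processed:";
  UdfState() { ++live; }
  ~UdfState() { --live; }
};
int UdfState::live = 0;

// PREPARE_FN: allocate the state once, before any Process call.
void PrepareLibrariesAndDataStructures(MockContext* ctx) {
  ctx->thread_local_state = new UdfState();
}

// SYMBOL: called once per row; reports failures through the context.
std::string Process(MockContext* ctx, const std::string& input) {
  auto* state = static_cast<UdfState*>(ctx->thread_local_state);
  if (state == nullptr) {
    ctx->SetError("state was not prepared");
    return "";
  }
  if (input.empty()) {
    ctx->SetError("empty input");
    return "";
  }
  return state->prefix + input;
}

// CLOSE_FN: release the state; safe to call even if the slot is already null.
void CloseLibrariesAndCleanupDataStructures(MockContext* ctx) {
  delete static_cast<UdfState*>(ctx->thread_local_state);
  ctx->thread_local_state = nullptr;
}

// Simulates the call order the question hopes for: close still runs after
// Process has reported an error. Returns the number of leaked states.
int LeakedStatesAfterFailedQuery() {
  MockContext ctx;
  PrepareLibrariesAndDataStructures(&ctx);
  Process(&ctx, "");  // fails and sets an error on the context
  CloseLibrariesAndCleanupDataStructures(&ctx);
  return UdfState::live;
}
```

If the close function is skipped in `LeakedStatesAfterFailedQuery`, the counter stays at 1, which is exactly the leak this question is about.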
To avoid memory leaks, does Cloudera Impala guarantee that the CLOSE_FN will still be called when either the user cancels the query or the Process function fails with SetError()?
In other words, can we trust Cloudera Impala to always call CLOSE_FN whenever the corresponding PREPARE_FN has been called? Or must we put the data structure and library initialization and cleanup directly in the SYMBOL function (Process) to minimize the risk of memory leaks?
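For comparison, the fallback option (doing initialization and cleanup inside the row function itself) could look roughly like the sketch below. It assumes the resources can be wrapped in an RAII object; `ScopedResources` and `ProcessSelfContained` are hypothetical names, and the `live` counter is there only to show that nothing stays allocated. The obvious cost is that setup and teardown run on every call instead of once per thread.

```cpp
#include <string>

// Hypothetical RAII wrapper around the libraries and data structures the UDF
// needs. The destructor runs on every exit path out of the enclosing scope.
struct ScopedResources {
  static int live;  // number of wrappers currently alive (leak detector)
  ScopedResources() { ++live; }
  ~ScopedResources() { --live; }
  ScopedResources(const ScopedResources&) = delete;
  ScopedResources& operator=(const ScopedResources&) = delete;

  std::string Transform(const std::string& in) const {
    return "processed:" + in;
  }
};
int ScopedResources::live = 0;

// Self-contained variant of the row function: no prepare or close step.
// Returns false on failure and leaves *output untouched in that case.
bool ProcessSelfContained(const std::string& input, std::string* output) {
  ScopedResources resources;  // initialized on every call
  if (input.empty()) {
    return false;  // early exit: the destructor still releases the resources
  }
  *output = resources.Transform(input);
  return true;
}
```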
Thank you very much !