The Impala user's guide states "UDFs are parallelized using multiple threads". Does this mean that the UDF's functions must be thread safe? Or will only one thread be calling the UDF's functions at any given time?
I'm writing a UDF aggregate function that maintains state across function calls so I need to know how to write it to work with Impala.
Any insight into how UDFs behave in a multi-threaded environment would be greatly appreciated.
I'm using version 5.8 of the Impala UDF development kit to avoid the linking problem with std c++11 and noexcept (which is still present in version 5.10).
You should assume that the functions can be called from multiple threads concurrently.
The UDF interface (see the udf.h header installed by the impala-udf-dev package) provides GetFunctionState() and SetFunctionState() methods that let you attach your own state to the function context. If you use them in THREAD_LOCAL mode, you can keep separate state for each thread.
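To make that concrete, here is a rough sketch of the per-thread state pattern. It does not include the real udf.h: the small FunctionContext class below is only a stand-in I wrote so the example compiles on its own, and CallCounter, CounterPrepare, CountCalls and CounterClose are made-up names. A real UDF would return an Impala value type such as BigIntVal instead of a plain int64_t, so check the signatures against your copy of udf.h before relying on any of this.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Stand-in for the slice of impala_udf::FunctionContext used here.
// Assumption: it only mimics the method names and scope enum from udf.h.
class FunctionContext {
 public:
  enum FunctionStateScope { FRAGMENT_LOCAL, THREAD_LOCAL };

  void SetFunctionState(FunctionStateScope scope, void* ptr) {
    state_[scope] = ptr;
  }
  void* GetFunctionState(FunctionStateScope scope) const {
    return state_[scope];
  }

 private:
  void* state_[2] = {nullptr, nullptr};
};

// Hypothetical state object: one instance per thread, so no locking needed.
struct CallCounter {
  int64_t calls = 0;
};

// Prepare function: allocate the state once per thread.
void CounterPrepare(FunctionContext* ctx,
                    FunctionContext::FunctionStateScope scope) {
  if (scope == FunctionContext::THREAD_LOCAL) {
    ctx->SetFunctionState(scope, new CallCounter());
  }
}

// The function body: fetch this thread's state and update it.
int64_t CountCalls(FunctionContext* ctx) {
  auto* counter = static_cast<CallCounter*>(
      ctx->GetFunctionState(FunctionContext::THREAD_LOCAL));
  assert(counter != nullptr);  // CounterPrepare must have run first
  return ++counter->calls;
}

// Close function: free the state and clear the pointer.
void CounterClose(FunctionContext* ctx,
                  FunctionContext::FunctionStateScope scope) {
  if (scope == FunctionContext::THREAD_LOCAL) {
    delete static_cast<CallCounter*>(ctx->GetFunctionState(scope));
    ctx->SetFunctionState(scope, nullptr);
  }
}

}  // namespace sketch
```

Because each thread only ever touches the object it stored itself, the counter needs no mutex. Anything you store in FRAGMENT_LOCAL scope, or in globals, would still have to be thread safe.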