Created on 12-08-2017 04:53 AM - edited 09-16-2022 05:37 AM
Hello,
We have multiple Impala C++ UDFs that we want to deploy on our production Cloudera cluster.
We have carefully reviewed the source code to avoid memory leaks, segmentation faults, and race conditions.
However, if we missed something and a segmentation fault, memory leak, or race condition still occurs, what are the risks for the entire cluster?
If an error like that occurs, could the corresponding Impalad crash? If an Impalad crashes because of a UDF, will restarting it be enough to bring it back to good health?
What about Impalad isolation? Again, if a segmentation fault, memory leak, or race condition occurs, can other Cloudera service instances (HDFS, Hive, ...) be affected?
Could you please quickly summarize the risks associated with buggy C++ UDFs?
Thanks!
Created 12-08-2017 12:03 PM
Hi @Plop564
> If an error like that occurs, could the corresponding Impalad crash? If an Impalad crashes because of a UDF, will restarting it be enough to bring it back to good health?
Yes and yes.
> What about Impalad isolation? Again, if a segmentation fault, memory leak, or race condition occurs, can other Cloudera service instances (HDFS, Hive, ...) be affected?
It won't affect services outside of Impala. The crash is isolated to the Impala process.
> Could you please quickly summarize the risks associated with buggy C++ UDFs?
You already mentioned the possibilities of crashes, memory leaks and memory corruption. The other thing to keep in mind is that the UDF runs within the Impala process so essentially has the same permissions as the "impala" user. A malicious UDF could exploit this. This is why we recommend reviewing UDF code. It sounds like you're already following that best practice.
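To make the isolation point concrete, here is a minimal, generic POSIX sketch. It is not Impala code and uses no Impala API: `buggy_udf_stand_in` and `signal_that_killed_child` are hypothetical names, and the child process only stands in for a process hosting a buggy UDF. It shows that a segmentation fault terminates the process it happens in, while a separate process keeps running and can observe the failure.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for a buggy UDF. Raising SIGSEGV terminates the
// calling process the same way an invalid memory access would.
void buggy_udf_stand_in() {
    std::signal(SIGSEGV, SIG_DFL);  // make sure the default action (terminate) applies
    std::raise(SIGSEGV);
}

// Runs the stand-in in a child process and reports what happened to it.
// Returns the number of the signal that killed the child, 0 if the child
// exited normally, or -1 if fork/waitpid failed.
int signal_that_killed_child() {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        // Child: plays the role of the process hosting the UDF.
        buggy_udf_stand_in();
        _exit(0);  // only reached if the signal did not terminate the child
    }
    // Parent: a separate process, unaffected by the child's crash.
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling `signal_that_killed_child()` from a `main` should return `SIGSEGV` while the calling process itself continues normally. Within a single process there is no such boundary, which is why a fault inside a UDF takes the hosting process down with it.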
Created on 12-10-2017 11:14 AM - edited 12-10-2017 11:15 AM
Hi @Tim Armstrong
Thank you very much for the reply!