Impala C++ UDFs: risks of deploying in a production environment

Hello,

We have multiple Impala C++ UDFs that we want to deploy on our production Cloudera cluster.

We have carefully reviewed the source code to avoid memory leaks, segmentation faults, and race conditions.

However, if we missed something and a segmentation fault, memory leak, or race condition still occurs, what are the risks for the entire cluster?

If such an error occurs, could the corresponding Impalad crash? If an Impalad crashes because of a UDF, is restarting it enough to bring it back to good health?

What about Impalad isolation? Again, if a segmentation fault, memory leak, or race condition occurs, can instances of other Cloudera services (HDFS, Hive, ...) be affected?

Could you please briefly summarize the risks associated with a buggy C++ UDF?

Thanks!

1 ACCEPTED SOLUTION

Hi @Plop564

> If such an error occurs, could the corresponding Impalad crash? If an Impalad crashes because of a UDF, is restarting it enough to bring it back to good health?

Yes and yes.

> What about Impalad isolation? Again, if a segmentation fault, memory leak, or race condition occurs, can instances of other Cloudera services (HDFS, Hive, ...) be affected?

It won't affect services outside of Impala. The crash is isolated to the Impala process.
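To illustrate what "isolated to the process" means at the operating-system level, here is a minimal POSIX sketch. It is not Impala code. The function names `signal_that_killed_child`, `buggy_udf_stand_in`, and `healthy_stand_in` are hypothetical, and the segmentation fault is simulated with `raise(SIGSEGV)` rather than a real memory bug. The point is only that a fatal signal ends the process that received it, while a separate process keeps running and can observe the failure.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `work` in a forked child process.
// Returns the number of the signal that killed the child, 0 if the child
// exited normally, or -1 if fork() or waitpid() failed.
int signal_that_killed_child(void (*work)()) {
    pid_t pid = fork();
    if (pid < 0) return -1;          // fork failed
    if (pid == 0) {                  // child: do the work, then exit cleanly
        work();
        _exit(0);
    }
    int status = 0;                  // parent: wait for the child to finish
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

// Hypothetical stand-in for a buggy UDF: simulates a segmentation fault by
// raising SIGSEGV, whose default action terminates the calling process.
void buggy_udf_stand_in() {
    std::raise(SIGSEGV);
}

// Hypothetical stand-in for a well-behaved UDF.
void healthy_stand_in() {}
```

Calling `signal_that_killed_child(buggy_udf_stand_in)` from a host program should report `SIGSEGV` while the host itself continues. In the same way, a crashed Impalad does not take down processes belonging to other services on the node.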

> Could you please briefly summarize the risks associated with a buggy C++ UDF?

You already mentioned the possibilities of crashes, memory leaks and memory corruption. The other thing to keep in mind is that the UDF runs within the Impala process so essentially has the same permissions as the "impala" user. A malicious UDF could exploit this. This is why we recommend reviewing UDF code. It sounds like you're already following that best practice.
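As an example of the kind of check a UDF code review might look for, here is a small defensive sketch in plain C++. It does not use the Impala UDF headers. `StrArg` is a hypothetical stand-in for a nullable (pointer, length) string argument, and `safe_prefix` is an illustrative helper, not part of any Impala API. The idea is to validate NULL-ness and lengths before touching memory, since an out-of-bounds read inside a UDF would happen inside the Impala process itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a nullable (pointer, length) string argument.
// This is NOT the real Impala type; it only mimics the general shape.
struct StrArg {
    bool is_null;
    const std::uint8_t* ptr;
    int len;
};

// Copies at most `max_len` bytes of `arg` into `*out`.
// Returns false, leaving `*out` empty, for NULL or malformed input instead
// of reading through a bad pointer or using a negative length.
bool safe_prefix(const StrArg& arg, int max_len, std::string* out) {
    if (out == nullptr) return false;
    out->clear();
    if (arg.is_null || arg.ptr == nullptr) return false;  // SQL NULL or bad pointer
    if (arg.len < 0 || max_len < 0) return false;         // malformed lengths
    const int n = arg.len < max_len ? arg.len : max_len;  // never read past arg.len
    out->assign(reinterpret_cast<const char*>(arg.ptr),
                static_cast<std::size_t>(n));
    return true;
}
```

A real UDF would apply the same checks to the argument types provided by the Impala UDF development kit. The sketch cannot detect a length that overstates the actual buffer, which is one reason review and testing remain necessary.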


Hi @Tim Armstrong

Thank you very much for the reply!