
Impala C++ UDFs: risks of deploying in a production environment


Hello,

We have multiple Impala C++ UDFs that we want to deploy on our production Cloudera cluster.

We have carefully reviewed the source code to avoid memory leaks, segmentation faults, and race conditions.

However, if we missed something and a segmentation fault, memory leak, or race condition still occurs, what are the risks for the entire cluster?

If such an error occurs, could the corresponding Impalad crash? If an Impalad crashes because of a UDF, is restarting it enough to bring it back to good health?

What about Impalad isolation? Again, if a segmentation fault, memory leak, or race condition occurs, can other Cloudera service instances (HDFS, Hive, ...) be affected?

Could you please briefly summarize the risks associated with buggy C++ UDFs?

Thanks!

1 ACCEPTED SOLUTION


Hi @Plop564

 

> If such an error occurs, could the corresponding Impalad crash? If an Impalad crashes because of a UDF, is restarting it enough to bring it back to good health?

Yes and yes: a crash in a UDF can take down the Impalad running it, and restarting that Impalad is enough to recover.

 

> What about Impalad isolation? Again, if a segmentation fault, memory leak, or race condition occurs, can other Cloudera service instances (HDFS, Hive, ...) be affected?

It won't affect services outside of Impala: the crash is isolated to the Impala process.

 

> Could you please briefly summarize the risks associated with buggy C++ UDFs?

You already mentioned the possibility of crashes, memory leaks, and memory corruption. The other thing to keep in mind is that a UDF runs within the Impala process, so it essentially has the same permissions as the "impala" user. A malicious UDF could exploit this, which is why we recommend reviewing UDF code. It sounds like you're already following that best practice.
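
To illustrate the process-isolation point in general terms, here is a minimal POSIX sketch. It is not Impala code: `buggy_udf`, `healthy_udf`, and `run_in_child` are hypothetical names made up for this example. The forked child plays the role of the process hosting the faulty code, and the parent plays the role of an unrelated process on the same host. The child is killed by SIGSEGV while the parent keeps running and can observe what happened.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for a buggy UDF: it delivers SIGSEGV to the calling
// process, which is what an invalid memory access would cause.
void buggy_udf() {
  std::raise(SIGSEGV);
}

// Hypothetical stand-in for a well-behaved UDF.
void healthy_udf() {}

// Runs `fn` in a forked child process. Returns the number of the signal that
// killed the child, 0 if the child exited on its own, or -1 if fork or
// waitpid failed.
int run_in_child(void (*fn)()) {
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    // Child: restore the default SIGSEGV action, run the function, then exit.
    std::signal(SIGSEGV, SIG_DFL);
    fn();
    _exit(0);
  }
  // Parent: wait for the child, retrying if the wait is interrupted.
  int status = 0;
  while (waitpid(pid, &status, 0) < 0) {
    if (errno != EINTR) return -1;
  }
  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling `run_in_child(buggy_udf)` from a `main` function should report `SIGSEGV` for the child while the caller continues normally. Because a UDF is loaded into the Impala process itself rather than a separate child, the process that dies in the real case is the Impalad.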


2 REPLIES


Hi @Tim Armstrong

Thank you very much for the reply!