Unfortunately there isn't a way to make the UDFs permanent.
A couple of options are:
1) Avoid restarting the catalog. Just restarting the impalads shouldn't cause the UDFs to be forgotten.
2) Write a script that restarts the service using the CM API and then runs your custom UDF loading once the restart has finished.
BTW, what causes the need to restart the service? Maybe something can be done to avoid that.
We restart the Impala services whenever we make changes to the configuration. Is it necessary to restart Impala after every configuration change?
Can you please provide a command that calls that API and starts the Impala catalogd server, so that I can incorporate it into my shell script? I do not understand how it works. Alternatively, do you have a sample script that stops and starts the catalogd server and then creates the UDFs?
Help is really appreciated!
Here is a link to an example: http://cloudera.github.io/cm_api/docs/python-client/#service-lifecycle-and-commands . That is the Python client, but there is also a Java client, or you can use curl from the command line if you like (there are examples for each elsewhere on the site).
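If you would rather not install a client library, the same restart can be sketched with only the Python standard library against the REST endpoint. This is a minimal sketch, not a tested recipe: the host, cluster name, service name, and credentials are placeholders, port 7180 and API version v1 are assumptions you should check against your own Cloudera Manager, and the helper names are made up for illustration.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_restart_request(cm_host, cluster, service, user, password,
                          port=7180, api_version="v1"):
    """Build (but do not send) the POST request that restarts one service.

    All arguments are placeholders; port and api_version are assumptions.
    """
    url = "http://{}:{}/api/{}/clusters/{}/services/{}/commands/restart".format(
        cm_host, port, api_version,
        urllib.parse.quote(cluster), urllib.parse.quote(service))
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    req = urllib.request.Request(url, data=b"", method="POST")
    req.add_header("Authorization", "Basic " + token)
    req.add_header("Content-Type", "application/json")
    return req


def send_request(req):
    """Send the request and return the decoded JSON body of the response."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Something like send_request(build_restart_request("cm.example.com", "Cluster 1", "impala", "admin", "secret")) would issue the restart. The command runs asynchronously on the CM side, so your script would still need to wait for it to finish before re-creating the UDFs.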
About the need to restart the catalog for config changes, it all depends on which configs are changing. For example if the connection to the Hive metastore has changed, then the catalog would need a restart. If some YARN container configs changed, those won't affect the catalog.
Thanks for the reply and your solutions. I am looking into CM API.
The issue is that sometimes our server goes down and comes back within about a minute, which automatically restarts all the Cloudera services, and we lose the Impala UDFs. We do not know when the server will go down and take Impala down with it, so we need a way to store the Impala UDFs permanently.
So I was looking into the following:
I noticed that whenever Impala starts, the cloudera-scm-agent creates process directories under the following path:
What do these directories indicate and what do they store? Will anything go wrong if I delete them? I did delete them and nothing seemed to happen, and when I restarted again, new directories were created with different numbers.
I am planning to write a script that looks in this directory to check whether a CATALOGSERVER directory is present; if it is, the script will create our UDFs. In that case I would delete the directory once the Impala service has been restarted.
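A rough sketch of that check, written in Python rather than shell. It is only an illustration under some assumptions: process_root stands for the agent process path mentioned above, state_file is a hypothetical file the script uses to remember which directories it has already seen, and both function names are made up. Remembering the names in a state file is a swapped-in alternative to deleting the agent's directories.

```python
import os


def find_catalog_dirs(process_root):
    """Return sorted names of subdirectories whose name contains CATALOGSERVER."""
    return sorted(
        name for name in os.listdir(process_root)
        if "CATALOGSERVER" in name
        and os.path.isdir(os.path.join(process_root, name))
    )


def new_catalog_dirs(process_root, state_file):
    """Return directories not seen on the previous run, then record the current set."""
    seen = set()
    if os.path.exists(state_file):
        with open(state_file) as fh:
            seen = {line.strip() for line in fh if line.strip()}
    current = find_catalog_dirs(process_root)
    # Overwrite the state file so the next run only reports newer directories.
    with open(state_file, "w") as fh:
        fh.write("\n".join(current) + ("\n" if current else ""))
    return [name for name in current if name not in seen]
```

A cron job could call new_catalog_dirs on every run and trigger the UDF loading only when the returned list is not empty.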
Thanks for help !!!!!
Finally resolved this issue. The solution is a little tricky. As my previous comment explains, when the Impala service is restarted it creates a new CATALOGSERVER directory, which tells us that the service has been started. So we wrote a shell script, triggered by a cron job, that checks whether a new directory has been created. If so, it writes a notification file, which is simply an empty file created on HDFS with the hadoop touchz command.
Once the notification file is seen, an Oozie coordinator job that we created kicks in and adds the UDFs to all the databases, which is handled by a shell script.
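For anyone wanting to reproduce the last step, here is a minimal sketch of re-creating the UDFs in every database. It is not our actual script: the database names, the UDF definitions, the impalad host, and the function names are all hypothetical, and it assumes impala-shell is on the PATH of the machine that runs it.

```python
import subprocess


def create_function_statements(databases, udfs):
    """Build one CREATE FUNCTION statement per (database, UDF) pair."""
    stmts = []
    for db in databases:
        for udf in udfs:
            stmts.append(
                "CREATE FUNCTION IF NOT EXISTS {db}.{name}({args}) "
                "RETURNS {ret} LOCATION '{loc}' SYMBOL='{sym}'".format(
                    db=db, name=udf["name"], args=", ".join(udf["args"]),
                    ret=udf["returns"], loc=udf["location"], sym=udf["symbol"]))
    return stmts


def run_statements(stmts, impalad_host):
    """Run each statement through impala-shell, one invocation per statement."""
    for stmt in stmts:
        subprocess.check_call(["impala-shell", "-i", impalad_host, "-q", stmt])
```

IF NOT EXISTS keeps the job safe to re-run when only some of the functions were lost.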
That is a really broad definition of "Resolved" hahaha... quite a hack for something that should be supported from the get-go in Impala.