UDF missing after Impala restart

Re: UDF missing after Impala restart

Contributor

Hi,

Do you have any workaround for this issue? Please help!

Re: UDF missing after Impala restart

Cloudera Employee

Unfortunately, there isn't a way to make the UDFs permanent.

A couple of options are:

1) Avoid restarting the catalog server (catalogd). Restarting only the impalads shouldn't cause the UDFs to be forgotten.

2) Write a script that restarts the service through the Cloudera Manager (CM) API and then re-creates your custom UDFs once the restart completes.
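Option 2 is mostly about ordering: restart first, wait for the CM command to finish, and only then re-create the UDFs. Below is a minimal Python sketch of that control flow. It assumes two hypothetical callables that you would supply yourself: `restart_service` (wraps the CM API call and blocks until the command completes) and `run_sql` (executes one statement, for example through impala-shell). Neither is an existing API.

```python
from typing import Callable, Iterable, List


def restart_then_reload_udfs(
    restart_service: Callable[[], bool],
    run_sql: Callable[[str], None],
    udf_statements: Iterable[str],
) -> List[str]:
    """Restart Impala, then re-create the UDFs that the catalog forgot.

    restart_service: hypothetical callable that triggers the restart via the
        CM API, waits for the command to finish, and returns True on success.
    run_sql: hypothetical callable that executes a single SQL statement.
    Returns the statements that were executed, in order.
    """
    if not restart_service():
        # Don't register UDFs against a service that did not come back cleanly.
        raise RuntimeError("Impala restart failed; UDFs were not re-created")
    executed = []
    for stmt in udf_statements:
        run_sql(stmt)
        executed.append(stmt)
    return executed
```

Injecting the two callables keeps the sketch independent of whether the restart goes through the Python client, the Java client, or curl.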

 

BTW, what causes the need to restart the service? Maybe something can be done to avoid that.

Re: UDF missing after Impala restart

Contributor

We restart the Impala services whenever we make any configuration change. Is it necessary to restart Impala after every configuration change?

Re: UDF missing after Impala restart

Contributor

Hi,

Can you please provide a command to call that API and start the Impala catalogd server, so that I can incorporate it into my shell script? I don't understand how it works. Do you have a sample script that stops and starts the catalogd server and then creates the UDFs?

Help is really appreciated!

Re: UDF missing after Impala restart

Cloudera Employee

Here is a link to an example: http://cloudera.github.io/cm_api/docs/python-client/#service-lifecycle-and-commands . That is the Python client, but there is also a Java client, or you can use curl from the command line if you like (there are examples for each elsewhere on the site).
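The linked page covers the Python client. For the curl-style route, here is a standard-library-only Python sketch that builds the HTTP request instead. It assumes the CM REST endpoint `POST /api/v{N}/clusters/{cluster}/services/{service}/commands/restart`, HTTP basic auth, the default CM port 7180, and API version 10; the host, credentials, cluster name and service name are placeholders, so check all of them against the API documentation for your CM version. It also assumes the returned command JSON carries `active` and `success` fields, which you would poll via `GET /api/v{N}/commands/{id}`.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_restart_request(host, cluster, service, user, password,
                          port=7180, api_version=10):
    """Build (but do not send) a CM REST request that restarts one service.

    Assumed endpoint: POST /api/v{N}/clusters/{cluster}/services/{service}/commands/restart
    All arguments are placeholders for your own environment.
    """
    path = "/api/v%d/clusters/%s/services/%s/commands/restart" % (
        api_version,
        urllib.parse.quote(cluster, safe=""),  # cluster names may contain spaces
        urllib.parse.quote(service, safe=""),
    )
    url = "http://%s:%d%s" % (host, port, path)
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    return urllib.request.Request(
        url,
        data=b"",  # empty body, so urllib sends Content-Length: 0
        method="POST",
        headers={"Authorization": "Basic " + token},
    )


def command_finished(command_json):
    """Return (finished, succeeded) for a CM command's JSON text.

    Assumes the 'active' and 'success' fields; 'success' may be absent
    while the command is still running.
    """
    cmd = json.loads(command_json)
    return (not cmd["active"], bool(cmd.get("success")))
```

Sending the request would be `urllib.request.urlopen(req)`; the response body is the command JSON, whose `id` you would poll until `command_finished` reports it is done, and only then re-create the UDFs.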

 

About the need to restart the catalog for config changes: it all depends on which configs are changing. For example, if the connection to the Hive metastore has changed, the catalog would need a restart. If some YARN container configs changed, those won't affect the catalog.

Re: UDF missing after Impala restart

Contributor

Hi,

Thanks for the reply and your solutions. I am looking into the CM API.

The issue is that sometimes our server goes down and comes back up within about a minute, which automatically restarts all the Cloudera services, and we lose the Impala UDFs. We cannot predict when the server will go down (taking Impala down with it), so we need to store the Impala UDFs permanently.

So I was looking into the following:

I noticed that whenever Impala starts, the cloudera-scm-agent creates process directories under the following path:

/var/run/cloudera-scm-agent/process

70-impala-STATESTORE
71-impala-IMPALAD
72-impala-CATALOGSERVER

What does this indicate, and what do these directories store? Will anything go wrong if I delete them? I tried deleting them and nothing happened, and when I restarted again, new directories were created with different numbers.

I am planning to write a script that looks into this directory to check whether a CATALOGSERVER directory is present; if it is, the script creates our UDFs. In that case, I would delete this directory once the Impala service has restarted.
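Instead of deleting the directory after each restart, the script could compare directory numbers, since each new role instance gets a new, higher-numbered directory. A minimal Python sketch of the detection step, assuming the `<N>-impala-CATALOGSERVER` naming shown above and the default process root; the function name is hypothetical:

```python
import os
import re

# Assumed naming of CM agent process directories, e.g. "72-impala-CATALOGSERVER".
CATALOG_DIR = re.compile(r"^(\d+)-impala-CATALOGSERVER$")


def latest_catalog_dir(process_root="/var/run/cloudera-scm-agent/process"):
    """Return (number, name) of the highest-numbered catalog server process
    directory under process_root, or None if there is none on this host."""
    best = None
    for name in os.listdir(process_root):
        m = CATALOG_DIR.match(name)
        if m and os.path.isdir(os.path.join(process_root, name)):
            num = int(m.group(1))
            if best is None or num > best[0]:
                best = (num, name)
    return best
```

Comparing the returned number with the one seen on the previous run tells the script whether a new catalog server instance has started, without touching the agent's directories.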

 

Thanks for the help!

Re: UDF missing after Impala restart

Master Guru

A process configuration directory such as "72-impala-CATALOGSERVER" is created by the CM agent to store the role's configuration, stderr/stdout, and other files during that role instance's lifetime.

You can delete such a directory if a newer one exists (with a higher number). For example, if there is a 194-impala-CATALOGSERVER, then the older directory of 72-impala-CATALOGSERVER can be deleted.

I'd recommend, however, that you work on copies of these, as the older directories may carry information required for troubleshooting or root-causing past events/incidents, and deleting them removes that ability.
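Following that advice, a cleanup script can first identify which directories are superseded and leave the deletion (ideally after archiving) as a separate, deliberate step. A minimal Python sketch, assuming every entry is named `<number>-<service>-<ROLE>` and that the highest number per `<service>-<ROLE>` is the live instance; the function name is hypothetical, and it only lists, it never deletes:

```python
import os
import re
from collections import defaultdict

# Assumed naming: "<number>-<service>-<ROLE>", e.g. "72-impala-CATALOGSERVER".
PROCESS_DIR = re.compile(r"^(\d+)-(.+)$")


def stale_process_dirs(process_root="/var/run/cloudera-scm-agent/process"):
    """Return the names of process directories superseded by a
    higher-numbered directory with the same "<service>-<ROLE>" suffix.

    Nothing is deleted here; copy the returned entries somewhere first if
    they may be needed for troubleshooting, then remove them yourself.
    """
    by_role = defaultdict(list)
    for name in os.listdir(process_root):
        m = PROCESS_DIR.match(name)
        if m and os.path.isdir(os.path.join(process_root, name)):
            by_role[m.group(2)].append((int(m.group(1)), name))
    stale = []
    for entries in by_role.values():
        entries.sort()  # ascending by number; the last entry is the newest
        stale.extend(name for _, name in entries[:-1])
    return sorted(stale)
```

The returned names could be archived (for example with `shutil.copytree` or `tar`) before anything is removed, which preserves the troubleshooting information mentioned above.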

Re: UDF missing after Impala restart

Contributor

Finally resolved this issue. The solution is a little tricky. As noted in my previous comment, whenever the Impala service is restarted, a new CATALOGSERVER directory is created, which tells us that the service has started. So we wrote a shell script, triggered by a cron job, that checks whether a new directory has been created; if so, it sends a notification file, which is simply a hadoop touchz command that creates an empty file on HDFS.
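The cron-driven check described above can be sketched as follows. This is not the poster's script, which isn't shown. It assumes a hypothetical local state file that remembers the last catalog server directory number seen, a hypothetical HDFS flag path for the Oozie coordinator to watch, and `hdfs dfs -touchz` as the way to create the empty file. The latest directory number is passed in, for example from a scan of `/var/run/cloudera-scm-agent/process`.

```python
import subprocess
from pathlib import Path


def notify_if_restarted(latest_number, state_file, flag_path, run=subprocess.run):
    """Cron-style check: if the catalog server's process-directory number has
    grown since the last run, create an empty flag file on HDFS.

    latest_number: highest N in "<N>-impala-CATALOGSERVER", or None if absent.
    state_file: hypothetical local file holding the last number seen.
    flag_path: hypothetical HDFS path that the Oozie coordinator waits for.
    Returns True if a notification was sent.
    """
    state = Path(state_file)
    previous = int(state.read_text()) if state.exists() else None
    if latest_number is None:
        return False  # no catalog server directory on this host
    if previous is not None and latest_number <= previous:
        return False  # no new catalog server instance since the last run
    # Same effect as the "hadoop touchz" step: an empty file on HDFS.
    run(["hdfs", "dfs", "-touchz", flag_path], check=True)
    state.write_text(str(latest_number))
    return True
```

Passing `run` as a parameter keeps the HDFS call swappable for testing. On the very first run (no state file yet) the function notifies, which only triggers an extra UDF reload.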

We then created an Oozie coordinator job that kicks in once the notification file appears and re-adds the UDFs to all the databases, again through a shell script.
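The UDF step could generate one `CREATE FUNCTION` per database and UDF and hand the result to impala-shell. A minimal Python sketch, assuming Java UDFs registered with the signature form (`db.name(arg types) RETURNS type LOCATION '...' SYMBOL='...'`); the database list, jar path, function and class names, and the dictionary keys are hypothetical placeholders:

```python
def udf_script(databases, udfs):
    """Build an impala-shell script that (re)creates each UDF in every database.

    databases: database names (for example parsed from SHOW DATABASES output).
    udfs: dicts with hypothetical keys name, args, returns, jar, symbol.
    IF NOT EXISTS makes re-running the script after each restart harmless.
    """
    lines = []
    for db in databases:
        for u in udfs:
            lines.append(
                "CREATE FUNCTION IF NOT EXISTS {db}.{name}({args}) RETURNS {ret} "
                "LOCATION '{jar}' SYMBOL='{sym}';".format(
                    db=db,
                    name=u["name"],
                    args=", ".join(u["args"]),
                    ret=u["returns"],
                    jar=u["jar"],
                    sym=u["symbol"],
                )
            )
    return "\n".join(lines) + "\n"
```

Written to a file, the script could then be run with `impala-shell -i <impalad-host> -f udfs.sql`.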

Re: UDF missing after Impala restart

Explorer

That is a really broad definition of "Resolved", hahaha... Quite a hack for something that should have been supported from the get-go in Impala.

Re: UDF missing after Impala restart

Master Collaborator

We shipped proper support for persistent UDFs about a year ago in Impala 2.5/CDH 5.7: https://issues.cloudera.org/browse/IMPALA-1748

 

I agree the previous state of affairs was pretty frustrating.
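With the persistent-function support from IMPALA-1748 mentioned above, the restart workaround should largely become unnecessary. A sketch, assuming the Impala 2.5+ form for Java UDFs in which argument and return types are omitted; as I understand it, Impala then derives the signatures from the class and stores functions created this way in the metastore, but verify this against the `CREATE FUNCTION` documentation for your release. The UDF names, jar path, classes and dictionary keys are hypothetical:

```python
def persistent_udf_statements(db, udfs):
    """One signature-less CREATE FUNCTION per Java UDF name.

    Assumes the Impala 2.5+ Java UDF form (no argument/return types), so
    several per-signature entries for the same name collapse into one.
    udfs: dicts with hypothetical keys name, jar, symbol.
    """
    seen = set()
    stmts = []
    for u in udfs:
        if u["name"] in seen:
            continue  # already emitted; the class covers all its overloads
        seen.add(u["name"])
        stmts.append(
            "CREATE FUNCTION IF NOT EXISTS {}.{} LOCATION '{}' SYMBOL='{}';".format(
                db, u["name"], u["jar"], u["symbol"]
            )
        )
    return stmts
```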