Updating Hive UDFs without restarting the cluster

Hi, I want to update Hive UDFs without having to restart Hive. According to https://www.cloudera.com/documentation/enterprise/5-14-x/topics/cm_mc_hive_udf.html#concept_zb2_rxr_... this requires setting hive.reloadable.aux.jars.path. I have set it to /user/hive/libs/udf (which resides on HDFS). However, the example value in that documentation is

file:///usr/lib/hive/lib/foo.jar

which confuses me. Does this property only work for files on the local file system? Do I understand correctly that I have to run the RELOAD command in Beeline manually? And if the property does work for HDFS, does Hive then load the classes in the jar automatically, so that CREATE FUNCTION foo no longer needs a USING JAR 'my/path/to/jar-1.jar' clause?
Desired behaviour:
1. Copy the jar to HDFS:
   /user/hive/lib/udf/foo-1.jar
2. Register the function in Hive:
   DROP FUNCTION IF EXISTS foo;
   CREATE FUNCTION foo AS 'my.class.path.in.jar.FooUDF' USING JAR '/user/hive/lib/udf/foo-1.jar';
3. Copy a new version of the jar to HDFS:
   /user/hive/lib/udf/foo-2.jar
4. Update the function in Hive:
   DROP FUNCTION IF EXISTS foo;
   CREATE FUNCTION foo AS 'my.class.path.in.jar.FooUDF' USING JAR '/user/hive/lib/udf/foo-2.jar';
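
Steps 2 and 4 above are mechanical, so they can be generated per jar version. This is a minimal Python sketch, assuming the statements are afterwards fed to Beeline (for example saved to a file and run with beeline -f); the helper names jar_path and register_udf are hypothetical. It only produces the same DROP/CREATE pair shown above and does not by itself change how Hive caches the function.

```python
from __future__ import annotations

# HDFS directory taken from the workflow above (an example, not a Hive default).
UDF_DIR = "/user/hive/lib/udf"


def jar_path(name: str, version: int, udf_dir: str = UDF_DIR) -> str:
    """Return a versioned jar path such as /user/hive/lib/udf/foo-2.jar."""
    return f"{udf_dir.rstrip('/')}/{name}-{version}.jar"


def register_udf(func: str, class_name: str, jar: str) -> list[str]:
    """Return the DROP/CREATE pair that (re)binds `func` to `class_name` in `jar`."""
    return [
        f"DROP FUNCTION IF EXISTS {func};",
        f"CREATE FUNCTION {func} AS '{class_name}' USING JAR '{jar}';",
    ]


if __name__ == "__main__":
    # Step 4: re-register foo against the second jar version.
    for stmt in register_udf("foo", "my.class.path.in.jar.FooUDF", jar_path("foo", 2)):
        print(stmt)
```

Keeping a distinct file name per version (foo-1.jar, foo-2.jar) instead of overwriting a single jar makes it unambiguous which jar each CREATE FUNCTION refers to.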

Currently this does not work without restarting Hive: after step 4, queries alternate in a round-robin fashion between the updated UDF and the old one.


How can I update a UDF without restarting Hive? I also do not want to put the UDF jar into a local directory; it should reside on HDFS.

Best, Georg

Reply:

Digging further into the Hive source code, I found this commit:

https://github.com/apache/hive/commit/8ce0118ffe517f0c622571778251cbd9f760c4f5#diff-a0e344e574e0fe54...

In particular, the test query https://github.com/apache/hive/blob/1eea5a80ded2df33d57b2296b3bed98cb18383fd/ql/src/test/queries/cli... leads me to believe that HDFS should be supported:

--! qt:dataset:src
dfs -mkdir  ${system:test.tmp.dir}/aux;
dfs -cp ${system:hive.root}/data/files/identity_udf.jar ${system:test.tmp.dir}/aux/udfexample.jar;

SET hive.reloadable.aux.jars.path=${system:test.tmp.dir}/aux;
RELOAD;
CREATE TEMPORARY FUNCTION example_iden AS 'IdentityStringUDF';

EXPLAIN
SELECT example_iden(key)
FROM src LIMIT 1;

SELECT example_iden(key)
FROM src LIMIT 1;

DROP TEMPORARY FUNCTION example_iden;

dfs -rm -r ${system:test.tmp.dir}/aux;
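
Stripped of the test-harness setup, the test boils down to three session-level statements. As a hedged sketch (the helper name reload_session_script is hypothetical), the Python below assembles the same sequence for the HDFS directory from the question, so it can be saved as a script and run with beeline -f. Whether RELOAD picks up jars from an HDFS directory on a given Hive or CDH release is exactly the open question here; the sketch does not settle it.

```python
def reload_session_script(aux_dir: str, func: str, class_name: str) -> str:
    """Build a Beeline script mirroring the Hive test above: point the
    reloadable aux path at `aux_dir`, RELOAD it, then register a temporary
    function whose class is expected on that path (no USING clause)."""
    statements = [
        f"SET hive.reloadable.aux.jars.path={aux_dir};",
        "RELOAD;",
        f"CREATE TEMPORARY FUNCTION {func} AS '{class_name}';",
    ]
    return "\n".join(statements) + "\n"


if __name__ == "__main__":
    # Directory and class name are the poster's examples, not Hive defaults.
    print(
        reload_session_script("/user/hive/libs/udf", "foo", "my.class.path.in.jar.FooUDF"),
        end="",
    )
```

As in the test, the function is TEMPORARY and has no USING clause, so it relies on the class being found on the reloaded aux path and exists only in the current session.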

EDIT

It appears that

CREATE TEMPORARY FUNCTION example_iden AS 'IdentityStringUDF';

produces the following warning:

WARN ... permanent functions created without USING clause will not be replicated

so I assume that the USING JAR '/path/to/jar.jar' clause is mandatory for permanent UDFs, even when hive.reloadable.aux.jars.path is set.
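
If USING JAR is indeed required for permanent functions, a deployment script can check for it before submitting statements. This is a minimal Python sketch; the function name lacks_using_clause is hypothetical, and the regex follows the documented Hive form CREATE FUNCTION name AS 'class' [USING JAR|FILE|ARCHIVE 'uri'] but is a simplification that ignores SQL comments and expects one statement per string.

```python
import re

# CREATE [TEMPORARY] FUNCTION <name> AS '<class>' <rest>, case-insensitive.
_CREATE_FN = re.compile(
    r"^\s*CREATE\s+(TEMPORARY\s+)?FUNCTION\s+(\S+)\s+AS\s+'[^']+'(.*)$",
    re.IGNORECASE | re.DOTALL,
)
# A USING clause names at least one JAR, FILE or ARCHIVE resource.
_USING = re.compile(r"\bUSING\s+(JAR|FILE|ARCHIVE)\b", re.IGNORECASE)


def lacks_using_clause(stmt: str) -> bool:
    """Return True only for a *permanent* CREATE FUNCTION without USING."""
    m = _CREATE_FN.match(stmt)
    if m is None or m.group(1):  # not a CREATE FUNCTION, or it is TEMPORARY
        return False
    return _USING.search(m.group(3)) is None


if __name__ == "__main__":
    for s in [
        "CREATE FUNCTION foo AS 'my.class.path.in.jar.FooUDF';",
        "CREATE FUNCTION foo AS 'my.class.path.in.jar.FooUDF' "
        "USING JAR '/user/hive/lib/udf/foo-2.jar';",
    ]:
        print(lacks_using_clause(s), s)
```

Temporary functions are deliberately skipped, since the warning quoted above concerns permanent functions only.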