Support Questions

Find answers, ask questions, and share your expertise

Custom Hive function is not visible to other Hiveserver2 instances

avatar
Explorer

Environment:

HDP2.6.2-205

Hive (version 2.1.0.2.6.2.0-205)

There are 2 Hiveserver2 instances lab-node101 and lab-node102.

I created a custom function "testdb.user001_udf" on lab-node101:

0: jdbc:hive2://lab-node101.> CREATE FUNCTION testdb.user001_udf AS 'com.dummy.test.user001.CustomUDF' USING JAR 'hdfs://nameservice1/user/user001/custom-udf-1.0.0.jar';
Getting log thread is interrupted, since query is done!
INFO  : converting to local hdfs://nameservice1/user/user001/custom-udf-1.0.0.jar
INFO  : Added [/tmp/9f1c530a-9774-4454-a048-93fb8f86e09c_resources/custom-udf-1.0.0.jar] to class path
INFO  : Added resources: [hdfs://nameservice1/user/user001/custom-udf-1.0.0.jar]
No rows affected (0.109 seconds)
0: jdbc:hive2://lab-node101.> SHOW FUNCTIONS LIKE '*udf';
Getting log thread is interrupted, since query is done!
+----------------------+--+
|       tab_name       |
+----------------------+--+
| testdb.user001_udf  |
+----------------------+--+
1 rows selected (0.012 seconds)

When I connect to different Hiveserver2 instance lab-node102, it doesn't know about the function.

0: jdbc:hive2://lab-node102> SHOW FUNCTIONS LIKE '*udf';
Getting log thread is interrupted, since query is done!
+----------------------+--+
|       tab_name       |
+----------------------+--+
+----------------------+--+
0 rows selected (0.168 seconds)

0: jdbc:hive2://lab-node102.> desc function extended user001_udf;
Getting log thread is interrupted, since query is done!
INFO  : Compiling command(queryId=hive_20180712150410_de011aa2-7300-45d9-855c-11904692caa9): desc function extended user001_udf
INFO  : We are setting the hadoop caller context from HIVE_SSN_ID:a226223f-c7d9-4e42-9a92-dd0c753587d4 to hive_20180712150410_de011aa2-7300-45d9-855c-11904692caa9
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20180712150410_de011aa2-7300-45d9-855c-11904692caa9); Time taken: 0.005 seconds
INFO  : We are resetting the hadoop caller context to HIVE_SSN_ID:a226223f-c7d9-4e42-9a92-dd0c753587d4
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Setting caller context to query id hive_20180712150410_de011aa2-7300-45d9-855c-11904692caa9
INFO  : Executing command(queryId=hive_20180712150410_de011aa2-7300-45d9-855c-11904692caa9): desc function extended user001_udf
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Resetting the caller context to HIVE_SSN_ID:a226223f-c7d9-4e42-9a92-dd0c753587d4
INFO  : Completed executing command(queryId=hive_20180712150410_de011aa2-7300-45d9-855c-11904692caa9); Time taken: 0.002 seconds
INFO  : OK
+------------------------------------------+--+
|                 tab_name                 |
+------------------------------------------+--+
| Function 'user001_udf' does not exist.  |
+------------------------------------------+--+

0: jdbc:hive2://lab-node102.> SELECT testdb.user001_udf(dname) FROM testdb.dept;
Getting log thread is interrupted, since query is done!
Error: Error while compiling statement: FAILED: SemanticException [Error 10011]: Invalid function testdb.user001_udf (state=42000,code=10011)
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10011]: Invalid function testdb.user001_udf
...

It should show functions since metadata is shared between all Hiveserver2 instances lab-node10[1-2].

I confirmed there is function info in metastore DB (MySQL)

MariaDB [metastore]> SELECT * FROM FUNC_RU;
+---------+---------------+--------------------------------------------------------+-------------+
| FUNC_ID | RESOURCE_TYPE | RESOURCE_URI                                           | INTEGER_IDX |
+---------+---------------+---------------------------------------------------------+-------------+
|       1 |             1 | hdfs://nameservice1/user/user001/custom-udf-1.0.0.jar |           0 |
+---------+---------------+--------------------------------------------------------+-------------+
1 rows in set (0.00 sec)


MariaDB [metastore]> SELECT * FROM FUNCS;
+---------+----------------------------------+-------------+-------+--------------+-----------+------------+------------+
| FUNC_ID | CLASS_NAME                       | CREATE_TIME | DB_ID |  FUNC_NAME   | FUNC_TYPE | OWNER_NAME | OWNER_TYPE |
+---------+----------------------------------+-------------+-------+--------------+-----------+------------+------------+
|       1 | com.dummy.test.user001.CustomUDF |  1531980939 |     7 |  user001_udf |         1 | user001   | USER       |
+---------+----------------------------------+-------------+-------+--------------+-----------+------------+------------+
1 rows in set (0.00 sec)

Workaround:

I tried creating "testdb.user001_udf" on lab-node102, it got error "Function user001_udf already exists":

0: jdbc:hive2://lab-node102.> CREATE FUNCTION testdb.user001_udf AS 'com.dummy.test.user001.CustomUDF' USING JAR 'hdfs://nameservice1/user/user001/custom-udf-1.0.0.jar';
Getting log thread is interrupted, since query is done!
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask. AlreadyExistsException(message:Function user001_udf already exists) (state=08S01,code=1)
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask. AlreadyExistsException(message:Function user001_udf already exists)

But after that, "user001_udf" became visible on lab-node102:

0: jdbc:hive2://lab-node102.> SHOW FUNCTIONS LIKE '*udf';
Getting log thread is interrupted, since query is done!
+----------------------+--+
|       tab_name       |
+----------------------+--+
| testdb.user001_udf  |
+----------------------+--+
1 rows selected (0.03 seconds)
0: jdbc:hive2://lab-node102.> SELECT testdb.user001_udf(dname) FROM dept;
Getting log thread is interrupted, since query is done!
+--------------------+--+
|        _c0         |
+--------------------+--+
| Version EE: test2  |
| Version EE: test   |
+--------------------+--+
2 rows selected (0.162 seconds)

Is this a bug or configure problem, etc? Does anyone know how about the solution?

Thanks a lot in advance.

BTW, I also did test with Hive version 1.2.1000.2.6.2.0-205, there is no problem to use UDF from other Hiveserver2 instance.

1 ACCEPTED SOLUTION

avatar
Explorer

Found solution:

RELOAD FUNCTION;

As the official document explained:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/Re...
As of HIVE-2573, creating permanent functions in one Hive CLI session may not be reflected in HiveServer2 or other Hive CLI sessions, if they were started before the function was created. Issuing RELOAD FUNCTION within a HiveServer2 or HiveCLI session will allow it to pick up any changes to the permanent functions that may have been done by a different HiveCLI session. Due to backward compatibility reasons RELOAD FUNCTION; is also accepted.

View solution in original post

1 REPLY 1

avatar
Explorer

Found solution:

RELOAD FUNCTION;

As the official document explained:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/Re...
As of HIVE-2573, creating permanent functions in one Hive CLI session may not be reflected in HiveServer2 or other Hive CLI sessions, if they were started before the function was created. Issuing RELOAD FUNCTION within a HiveServer2 or HiveCLI session will allow it to pick up any changes to the permanent functions that may have been done by a different HiveCLI session. Due to backward compatibility reasons RELOAD FUNCTION; is also accepted.