Created 07-20-2018 02:49 AM
Environment:
HDP2.6.2-205
Hive (version 2.1.0.2.6.2.0-205)
There are 2 Hiveserver2 instances lab-node101 and lab-node102.
I created a custom function "testdb.user001_udf" on lab-node101:
0: jdbc:hive2://lab-node101.> CREATE FUNCTION testdb.user001_udf AS 'com.dummy.test.user001.CustomUDF' USING JAR 'hdfs://nameservice1/user/user001/custom-udf-1.0.0.jar'; Getting log thread is interrupted, since query is done! INFO : converting to local hdfs://nameservice1/user/user001/custom-udf-1.0.0.jar INFO : Added [/tmp/9f1c530a-9774-4454-a048-93fb8f86e09c_resources/custom-udf-1.0.0.jar] to class path INFO : Added resources: [hdfs://nameservice1/user/user001/custom-udf-1.0.0.jar] No rows affected (0.109 seconds) 0: jdbc:hive2://lab-node101.> SHOW FUNCTIONS LIKE '*udf'; Getting log thread is interrupted, since query is done! +----------------------+--+ | tab_name | +----------------------+--+ | testdb.user001_udf | +----------------------+--+ 1 rows selected (0.012 seconds)
When I connect to different Hiveserver2 instance lab-node102, it doesn't know about the function.
0: jdbc:hive2://lab-node102> SHOW FUNCTIONS LIKE '*udf'; Getting log thread is interrupted, since query is done! +----------------------+--+ | tab_name | +----------------------+--+ +----------------------+--+ 0 rows selected (0.168 seconds) 0: jdbc:hive2://lab-node102.> desc function extended user001_udf; Getting log thread is interrupted, since query is done! INFO : Compiling command(queryId=hive_20180712150410_de011aa2-7300-45d9-855c-11904692caa9): desc function extended user001_udf INFO : We are setting the hadoop caller context from HIVE_SSN_ID:a226223f-c7d9-4e42-9a92-dd0c753587d4 to hive_20180712150410_de011aa2-7300-45d9-855c-11904692caa9 INFO : Semantic Analysis Completed INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null) INFO : Completed compiling command(queryId=hive_20180712150410_de011aa2-7300-45d9-855c-11904692caa9); Time taken: 0.005 seconds INFO : We are resetting the hadoop caller context to HIVE_SSN_ID:a226223f-c7d9-4e42-9a92-dd0c753587d4 INFO : Concurrency mode is disabled, not creating a lock manager INFO : Setting caller context to query id hive_20180712150410_de011aa2-7300-45d9-855c-11904692caa9 INFO : Executing command(queryId=hive_20180712150410_de011aa2-7300-45d9-855c-11904692caa9): desc function extended user001_udf INFO : Starting task [Stage-0:DDL] in serial mode INFO : Resetting the caller context to HIVE_SSN_ID:a226223f-c7d9-4e42-9a92-dd0c753587d4 INFO : Completed executing command(queryId=hive_20180712150410_de011aa2-7300-45d9-855c-11904692caa9); Time taken: 0.002 seconds INFO : OK +------------------------------------------+--+ | tab_name | +------------------------------------------+--+ | Function 'user001_udf' does not exist. | +------------------------------------------+--+ 0: jdbc:hive2://lab-node102.> SELECT testdb.user001_udf(dname) FROM testdb.dept; Getting log thread is interrupted, since query is done! Error: Error while compiling statement: FAILED: SemanticException [Error 10011]: Invalid function testdb.user001_udf (state=42000,code=10011) org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10011]: Invalid function testdb.user001_udf ...
It should show functions since metadata is shared between all Hiveserver2 instances lab-node10[1-2].
I confirmed there is function info in metastore DB (MySQL)
MariaDB [metastore]> SELECT * FROM FUNC_RU; +---------+---------------+--------------------------------------------------------+-------------+ | FUNC_ID | RESOURCE_TYPE | RESOURCE_URI | INTEGER_IDX | +---------+---------------+---------------------------------------------------------+-------------+ | 1 | 1 | hdfs://nameservice1/user/user001/custom-udf-1.0.0.jar | 0 | +---------+---------------+--------------------------------------------------------+-------------+ 1 rows in set (0.00 sec) MariaDB [metastore]> SELECT * FROM FUNCS; +---------+----------------------------------+-------------+-------+--------------+-----------+------------+------------+ | FUNC_ID | CLASS_NAME | CREATE_TIME | DB_ID | FUNC_NAME | FUNC_TYPE | OWNER_NAME | OWNER_TYPE | +---------+----------------------------------+-------------+-------+--------------+-----------+------------+------------+ | 1 | com.dummy.test.user001.CustomUDF | 1531980939 | 7 | user001_udf | 1 | user001 | USER | +---------+----------------------------------+-------------+-------+--------------+-----------+------------+------------+ 1 rows in set (0.00 sec)
Workaround:
I tried creating "testdb.user001_udf" on lab-node102, it got error "Function user001_udf already exists":
0: jdbc:hive2://lab-node102.> CREATE FUNCTION testdb.user001_udf AS 'com.dummy.test.user001.CustomUDF' USING JAR 'hdfs://nameservice1/user/user001/custom-udf-1.0.0.jar'; Getting log thread is interrupted, since query is done! Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask. AlreadyExistsException(message:Function user001_udf already exists) (state=08S01,code=1) java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask. AlreadyExistsException(message:Function user001_udf already exists)
But after that, "user001_udf" became visible on lab-node102:
0: jdbc:hive2://lab-node102.> SHOW FUNCTIONS LIKE '*udf'; Getting log thread is interrupted, since query is done! +----------------------+--+ | tab_name | +----------------------+--+ | testdb.user001_udf | +----------------------+--+ 1 rows selected (0.03 seconds) 0: jdbc:hive2://lab-node102.> SELECT testdb.user001_udf(dname) FROM dept; Getting log thread is interrupted, since query is done! +--------------------+--+ | _c0 | +--------------------+--+ | Version EE: test2 | | Version EE: test | +--------------------+--+ 2 rows selected (0.162 seconds)
Is this a bug or configure problem, etc? Does anyone know how about the solution?
Thanks a lot in advance.
BTW, I also did test with Hive version 1.2.1000.2.6.2.0-205, there is no problem to use UDF from other Hiveserver2 instance.
Created 08-23-2019 01:51 AM
Found solution:
RELOAD FUNCTION;
As the official document explained:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/Re...
As of HIVE-2573, creating permanent functions in one Hive CLI session may not be reflected in HiveServer2 or other Hive CLI sessions, if they were started before the function was created. Issuing RELOAD FUNCTION within a HiveServer2 or HiveCLI session will allow it to pick up any changes to the permanent functions that may have been done by a different HiveCLI session. Due to backward compatibility reasons RELOAD FUNCTION; is also accepted.
Created 08-23-2019 01:51 AM
Found solution:
RELOAD FUNCTION;
As the official document explained:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/Re...
As of HIVE-2573, creating permanent functions in one Hive CLI session may not be reflected in HiveServer2 or other Hive CLI sessions, if they were started before the function was created. Issuing RELOAD FUNCTION within a HiveServer2 or HiveCLI session will allow it to pick up any changes to the permanent functions that may have been done by a different HiveCLI session. Due to backward compatibility reasons RELOAD FUNCTION; is also accepted.