- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
HPL-SQL : Function creation issues in Hive 2.0
- Labels:
-
Apache Hive
Created 06-27-2016 03:28 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
When running with hplsql ,below two scenarios are failing.My goal is to create a function and test it. We are using Hive 2.0 version. ************************************************************************ Scenario 1 : ************************************************************************ Here is the function which I'm trying to create and is failing. dg_1 and dg_2 tables are already created in hive, by using the above function we are trying to create dg_join3 table and populate some data. If you run these queries one by one using hplsql then they are created in Hive and are visible. CREATE FUNCTION my_function() RETURNS STRING BEGIN DROP TABLE default.dg_join3 ; CREATE TABLE default.dg_join3 (name string, phone string) ; INSERT INTO TABLE default.dg_join3 SELECT a.name, b.phone from default.dg_1 a,default.dg_2 b where a.id = b.id ; SELECT * FROM default.dg_join3 ; END; ************************************************************************ Scenario 2 : ************************************************************************ hplsql -e "INSERT INTO TABLE default.dg_join3 SELECT a.name, b.phone from default.dg_1 a join default.dg_2 b on a.id = b.id ;" ************************************************************************ Observations : If you run these queries one by one using hplsql then tables are created in Hive and are visible. hplsql -e "DROP TABLE default.dg_join3 ;" hplsql -e "CREATE TABLE default.dg_join3 (name string, phone string) ;" hplsql -e "SELECT * FROM default.dg_join3 ;" ************************************************************************ Log of scneario 2 : ************************************************************************ hplsql -e "INSERT INTO TABLE default.dg_join3 SELECT a.name, b.phone from default.dg_1 a join default.dg_2 b on a.id = b.id ;" WARNING: Use "yarn jar" to launch YARN applications. SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/appdata/hwx/apache-hive-2.0.0-bin/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/appdata/hwx/apache-hive-2.0.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/appdata/hwx/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Open connection: jdbc:hive2://I-P:10000 (189 ms) Starting SQL statement Unhandled exception in HPL/SQL org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 1:50 Invalid table alias or column reference 'b': (possible column names are: id, name) at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:258) at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:244) at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:247) at org.apache.hive.hplsql.Conn.executeSql(Conn.java:110) at org.apache.hive.hplsql.Exec.executeSql(Exec.java:590) at org.apache.hive.hplsql.Stmt.insertSelect(Stmt.java:570) at org.apache.hive.hplsql.Stmt.insert(Stmt.java:556) at org.apache.hive.hplsql.Exec.visitInsert_stmt(Exec.java:1094) at org.apache.hive.hplsql.Exec.visitInsert_stmt(Exec.java:51) at org.apache.hive.hplsql.HplsqlParser$Insert_stmtContext.accept(HplsqlParser.java:8154) at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70) at org.apache.hive.hplsql.Exec.visitStmt(Exec.java:987) at org.apache.hive.hplsql.Exec.visitStmt(Exec.java:51) at org.apache.hive.hplsql.HplsqlParser$StmtContext.accept(HplsqlParser.java:891) at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70) at org.apache.hive.hplsql.HplsqlBaseVisitor.visitBlock(HplsqlBaseVisitor.java:28) at org.apache.hive.hplsql.HplsqlParser$BlockContext.accept(HplsqlParser.java:412) at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70) at org.apache.hive.hplsql.Exec.visitProgram(Exec.java:895) at org.apache.hive.hplsql.Exec.visitProgram(Exec.java:51) at org.apache.hive.hplsql.HplsqlParser$ProgramContext.accept(HplsqlParser.java:355) at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42) at org.apache.hive.hplsql.Exec.run(Exec.java:753) at org.apache.hive.hplsql.Exec.run(Exec.java:729) at org.apache.hive.hplsql.Hplsql.main(Hplsql.java:23) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 1:50 Invalid table alias or column reference 'b': (possible column names are: id, name) at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315) at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:112) at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:181) at org.apache.hive.service.cli.operation.Operation.run(Operation.java:257) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:419) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:406) at sun.reflect.GeneratedMethodAccessor110.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) at com.sun.proxy.$Proxy20.executeStatementAsync(Unknown Source) at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:276) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:486) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1317) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1302) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:50 Invalid table alias or column reference 'b': (possible column names are: id, name) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:10537) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:10485) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:3822) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:3601) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8965) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8920) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9765) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9658) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10131) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:329) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10142) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:211) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:459) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:316) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1189) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1183) at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:110) ... 26 more
Created 06-28-2016 09:47 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Below is the code and output from function:-
CREATE FUNCTION my_function() RETURNS STRING BEGIN DROP TABLE default.dg_join3 ; CREATE TABLE default.dg_join3 (name string, phone string) ; INSERT INTO TABLE default.dg_join3 SELECT a.name, b.phone from default.dg_1 a,default.dg_2 b where a.id = b.id ; SELECT * FROM default.dg_join3 ; END; -- Invoke the function PRINT 'world, come with me now'; PRINT my_function();
[root@sandbox hplsql-0.3.17]# ./hplsql -f my_function.txt world, come with me now 16/06/28 09:34:15 INFO jdbc.Utils: Supplied authorities: localhost:10000 16/06/28 09:34:15 INFO jdbc.Utils: Resolved authority: localhost:10000 Open connection: jdbc:hive2://localhost:10000 (414 ms) Starting SQL statement SQL statement executed successfully (387 ms) Starting SQL statement SQL statement executed successfully (1.29 sec) Starting SQL statement SQL statement executed successfully (9.85 sec) Starting query Query executed successfully (880 ms) null
Created 06-28-2016 09:17 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Could you please verify your create table statement of dg_1 and dg_2?
As below statement that executed successfully at my end without any issue.
[root@sandbox hplsql-0.3.17]# ./hplsql -e "INSERT INTO TABLE default.dg_join3 SELECT a.name, b.phone from default.dg_1 a join default.dg_2 b on a.id = b.id ;"
16/06/28 09:01:13 INFO jdbc.Utils: Supplied authorities: localhost:10000
16/06/28 09:01:13 INFO jdbc.Utils: Resolved authority: localhost:10000
Open connection: jdbc:hive2://localhost:10000 (438 ms) Starting
SQL statement SQL statement executed successfully (11.86 sec)
Created 06-28-2016 09:47 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Below is the code and output from function:-
CREATE FUNCTION my_function() RETURNS STRING BEGIN DROP TABLE default.dg_join3 ; CREATE TABLE default.dg_join3 (name string, phone string) ; INSERT INTO TABLE default.dg_join3 SELECT a.name, b.phone from default.dg_1 a,default.dg_2 b where a.id = b.id ; SELECT * FROM default.dg_join3 ; END; -- Invoke the function PRINT 'world, come with me now'; PRINT my_function();
[root@sandbox hplsql-0.3.17]# ./hplsql -f my_function.txt world, come with me now 16/06/28 09:34:15 INFO jdbc.Utils: Supplied authorities: localhost:10000 16/06/28 09:34:15 INFO jdbc.Utils: Resolved authority: localhost:10000 Open connection: jdbc:hive2://localhost:10000 (414 ms) Starting SQL statement SQL statement executed successfully (387 ms) Starting SQL statement SQL statement executed successfully (1.29 sec) Starting SQL statement SQL statement executed successfully (9.85 sec) Starting query Query executed successfully (880 ms) null
Created 06-28-2016 11:57 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It seems you have installed hive 2.0 on HDP ? How have you done ?
Created 06-29-2016 09:08 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am using HDP2.4 sandbox with HPL/SQL 0.3.17.
Created 06-28-2016 03:13 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Avinash P. Can you please confirm that table dg_2 does have a column called 'phone'?
Created 06-30-2016 11:08 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you all for responding 🙂
@bpreachuk- Yes there is a column called "phone" in "dg_2" table. Here are the table definitions
Current Hive Version - Hive 2.0.0
DDL of the tables dg_1 and dg_2 are as follows.
create table dg_1 (id int, name text);
create table dg_2(id int, phone text);
@Mukesh Kumar The first one which you provided working flawlessly and the query which you pasted for JOIN operation is throwing below errors for us.
hplsql -e "INSERT INTO TABLE default.dg_join3 SELECT a.name, b.phone from default.dg_1 a join default.dg_2 b on a.id = b.id ;"
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 1:50 Invalid table alias or column reference 'b': (possible column names are: id, name)
And we want to know what is the procedure to make the functions permanent in Hive.
Created 06-30-2016 02:17 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Avinash P. With HPL/SQL your functions do not get stored in the Hive metastore. It behaves a little different than conventional Relational databases.
There are 3 ways to execute the functions and/or make the functions permanent in Hive.
1. Add the functions and stored procedures to the global .hplsqlrc file
2. Include the functions in your code by using the INCLUDE statement (or add INCLUDE statements to the .hplsqlrc file)
3. Have the function/stored procedure code duplicated inside each of your scripts (only for testing, never the best way).
Created 06-30-2016 03:00 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you Chuk for responding...We are able to register a simple function and run successfully. When creating the function my_function with two simple tables and are running into issue as provided initially.[All the DDL and function definition are provided]
Appreciate if you could you help in recreating this function in your lab and share the results.
Created 07-01-2016 04:25 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Avinash . Sorry for the delay in responding. I was able to do what you were attempting - please see below for explanation.
I created the 2 tables (dg_1 and dg_2) and loaded them with data. By the way they needed to use datatype string - text won't work.
create table dg_1 (id int, name string); create table dg_2(id int, phone string); insert into dg_1 values (1, 'Bill P'); insert into dg_2 values (1, '952-555-5555');
Now, you were attempting to create a function that processes these 2 tables and drops/inserts data into a 3rd table. Functions aren't really designed for this type of processing. Functions are User-Defined Functions (UDFs) - designed to do simple processing & return a value. Designed to be executed as part of a SQL expression. http://www.hplsql.org/create-function
You should accomplish your task with a stored procedure instead. Here is a code sample that I just run inline (for ease of development).
CREATE procedure load_dg_3 as BEGIN DROP TABLE if exists default.dg_join3 ; CREATE TABLE default.dg_join3 (name string, phone string) ; INSERT INTO TABLE default.dg_join3 SELECT a.name, b.phone from default.dg_1 a,default.dg_2 b where a.id = b.id ; SELECT * FROM default.dg_join3 ; print 'Completed...'; END; execute load_dg_3;
> hplsql -f "load_dg_3"
And the output.
16/06/30 22:15:11 INFO jdbc.Utils: Supplied authorities: localhost:10000 16/06/30 22:15:11 INFO jdbc.Utils: Resolved authority: localhost:10000 Open connection: jdbc:hive2://localhost:10000 (233 ms) Starting SQL statement SQL statement executed successfully (x.xx sec) Starting SQL statement SQL statement executed successfully (x.xx sec) Starting SQL statement SQL statement executed successfully (x.xx sec) Starting query Query executed successfully (xx ms) Bill P 952-555-5555 Completed...
That should be what you're looking for. Hope this helps.