
Spark SQL: Thriftserver unable to run a registered Hive UDTF

New Contributor

Spark Thriftserver is unable to run a Hive UDTF. It throws an error saying it cannot find the function, even though the function registration succeeds and the function does show up in the output of SHOW FUNCTIONS.

I am using a Hive UDTF, registered from a jar placed on my local machine, and calling it with the following commands:

-- Registering the function; this command succeeds.

CREATE FUNCTION SampleUDTF AS 'com.fuzzylogix.experiments.udf.hiveUDF.SampleUDTF' USING JAR '/root/spark_files/experiments-1.2.jar';

-- The Thriftserver is able to look up the function with this command:

DESCRIBE FUNCTION SampleUDTF; 
Output: 
+-----------------------------------------------------------+--+
|                       function_desc                       |
+-----------------------------------------------------------+--+
| Function: default.SampleUDTF                              |
| Class: com.fuzzylogix.experiments.udf.hiveUDF.SampleUDTF  |
| Usage: N/A.                                               |
+-----------------------------------------------------------+--+

-- Calling the function:

SELECT SampleUDTF('Paris');
Output:
Error: org.apache.spark.sql.AnalysisException: Undefined function: 'SampleUDTF'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7 (state=,code=0) 

I have also tried using a non-local jar (on HDFS), but I get the same error.

My environment: HDP 2.5 with Spark 2.0.0.
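
For comparison, here is a temporary-function registration sketch of the same UDTF. This is only a sketch, not something verified to work around the error; the temporary name is hypothetical, while the jar path and class come from the statements above:

-- Workaround sketch: ship the jar to the session explicitly, then
-- register the class as a TEMPORARY function, which resolves against
-- the session's function registry rather than the 'default' database.
ADD JAR /root/spark_files/experiments-1.2.jar;
CREATE TEMPORARY FUNCTION SampleUDTF_tmp AS 'com.fuzzylogix.experiments.udf.hiveUDF.SampleUDTF';  -- hypothetical name
SELECT SampleUDTF_tmp('Paris');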

4 Replies

New Contributor

Hey, I have the same problem here!

With spark-1.6.2-bin-hadoop2.6 and spark-2.0.1-bin-hadoop2.6 I get the following behavior:

spark-sql> describe function tc.dt_to_date;
Function: tc.dt_to_date
Class: com.XXXX.DTToDate
Usage: N/A.
Time taken: 0.127 seconds, Fetched 3 row(s)

spark-sql> select tc.dt_to_date('2016-11-01') from dwh.dim_geography limit 1;
17/03/04 01:26:49 INFO execution.SparkSqlParser: Parsing command: select tc.dt_to_date('2016-11-01') from dwh.dim_geography limit 1
17/03/04 01:26:49 INFO parser.CatalystSqlParser: Parsing command: int
17/03/04 01:26:49 INFO parser.CatalystSqlParser: Parsing command: string
[... the same CatalystSqlParser "Parsing command: int" / "Parsing command: string" INFO lines repeat ...]

Error in query: Undefined function: 'tc.dt_to_date'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7

I used the --jars option to initialize spark-sql from the command line and pointed it at the jar package where the function is defined. Notice how DESCRIBE FUNCTION is able to resolve the class name, yet the Usage field remains N/A.

With spark-1.5.0-bin-hadoop2.6 and spark-1.5.2-bin-hadoop2.6 it works fine.
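
If anyone else hits this, here is a quick diagnostic sketch for the failing session. The temporary name below is illustrative, and LIST JARS may not exist in every Spark version:

-- Diagnostic sketch: check what the session can actually see.
LIST JARS;                          -- was the --jars jar actually added? (if supported)
SHOW FUNCTIONS LIKE 'dt*';          -- does the function name appear at all?
-- Hypothetical temporary registration against the same class:
CREATE TEMPORARY FUNCTION dt_to_date_tmp AS 'com.XXXX.DTToDate';
SELECT dt_to_date_tmp('2016-11-01');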

Expert Contributor

Could you share your CREATE FUNCTION statement?

New Contributor

Do you have the HiveContext (Hive support) enabled?
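
In Spark 2.x, one way to check this from the SQL side is to inspect the catalog implementation. This is a sketch; depending on the build, the property may be internal and not readable this way:

-- Should report 'hive' when the session was built with Hive support enabled.
SET spark.sql.catalogImplementation;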

Explorer

I get an error while trying to access a table in a Hive DB. I am using a Hive GenericUDTF, called .enableHiveSupport() when building the SparkSession, and am running via HiveServer2 on HDP 2.5 with Spark 2.

The stack trace is:

2017-07-03 11:13:43,623 ERROR [HiveServer2-Background-Pool: Thread-1061]: SessionState (SessionState.java:printError(989)) - Status: Failed
2017-07-03 11:13:43,623 ERROR [HiveServer2-Background-Pool: Thread-1061]: SessionState (SessionState.java:printError(989)) - Vertex failed, vertexName=Map 1, vertexId=vertex_1499067308783_0051_1_00, diagnostics=[Task failed, taskId=task_1499067308783_0051_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:
java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
    ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:563)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
    ... 17 more
Caused by: org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2275)
    at org.apache.spark.sql.SparkSession$Builder$anonfun$8.apply(SparkSession.scala:831)
    at org.apache.spark.sql.SparkSession$Builder$anonfun$8.apply(SparkSession.scala:823)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
    at com.fuzzylogix.experiments.udf.hiveSparkUDF.SampleSparkUDTF_yarnV1.sparkJob(SampleSparkUDTF_yarnV1.java:97)
    at com.fuzzylogix.experiments.udf.hiveSparkUDF.SampleSparkUDTF_yarnV1.process(SampleSparkUDTF_yarnV1.java:78)
    at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:109)