Created 02-23-2017 08:43 AM
Spark Thriftserver is unable to run a Hive UDTF. It throws an error saying it cannot find the function, even though the function registration succeeds and the function shows up in the output of SHOW FUNCTIONS.
I am using a Hive UDTF, registered from a jar placed on my local machine, and calling it with the following commands:
//Registering the functions, this command succeeds.
CREATE FUNCTION SampleUDTF AS 'com.fuzzylogix.experiments.udf.hiveUDF.SampleUDTF' USING JAR '/root/spark_files/experiments-1.2.jar';
//Thriftserver is able to look up the function, on this command:
DESCRIBE FUNCTION SampleUDTF;
Output:
+-----------------------------------------------------------+--+
| function_desc                                              |
+-----------------------------------------------------------+--+
| Function: default.SampleUDTF                               |
| Class: com.fuzzylogix.experiments.udf.hiveUDF.SampleUDTF   |
| Usage: N/A.                                                |
+-----------------------------------------------------------+--+
// Calling the function:
SELECT SampleUDTF('Paris');
Output : Error: org.apache.spark.sql.AnalysisException: Undefined function: 'SampleUDTF'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7 (state=,code=0)
I have also tried using a non-local (HDFS) jar, but I get the same error.
My environment: HDP 2.5 with Spark 2.0.0
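For context, the UDTF follows the standard Hive GenericUDTF contract: it declares its output schema in initialize() and emits rows through forward() in process(). A minimal sketch of such a class is below; this is only an illustration of the pattern (the actual SampleUDTF shipped in experiments-1.2.jar is not reproduced here), with the class and package name taken from the CREATE FUNCTION statement above and a single assumed string-in/string-out column.

// Illustrative sketch only: package/class name match the CREATE FUNCTION statement
// above, but the body is an assumed minimal GenericUDTF, not the real jar contents.
package com.fuzzylogix.experiments.udf.hiveUDF;

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class SampleUDTF extends GenericUDTF {

    @Override
    public StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
        if (args.length != 1) {
            throw new UDFArgumentException("SampleUDTF takes exactly one argument");
        }
        // Declare one output column named "value" of type string.
        List<String> fieldNames = new ArrayList<>();
        List<ObjectInspector> fieldOIs = new ArrayList<>();
        fieldNames.add("value");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    @Override
    public void process(Object[] record) throws HiveException {
        // Emit one row per input row; a real UDTF may emit zero or many rows here.
        // (A full implementation would resolve the argument via its ObjectInspector.)
        String input = record[0] == null ? null : record[0].toString();
        forward(new Object[] { "Hello " + input });
    }

    @Override
    public void close() throws HiveException {
        // No per-query state to clean up in this sketch.
    }
}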
Created 03-04-2017 01:22 AM
Hey I have the same problem here!
With spark-1.6.2-bin-hadoop2.6 and spark-2.0.1-bin-hadoop2.6 I get the following behavior:
spark-sql> describe function tc.dt_to_date;
Function: tc.dt_to_date
Class: com.XXXX.DTToDate
Usage: N/A.
Time taken: 0.127 seconds, Fetched 3 row(s)
spark-sql> select tc.dt_to_date('2016-11-01') from dwh.dim_geography limit 1;
17/03/04 01:26:49 INFO execution.SparkSqlParser: Parsing command: select tc.dt_to_date('2016-11-01') from dwh.dim_geography limit 1
17/03/04 01:26:49 INFO parser.CatalystSqlParser: Parsing command: int
17/03/04 01:26:49 INFO parser.CatalystSqlParser: Parsing command: string
17/03/04 01:26:49 INFO parser.CatalystSqlParser: Parsing command: string
17/03/04 01:26:49 INFO parser.CatalystSqlParser: Parsing command: int
17/03/04 01:26:49 INFO parser.CatalystSqlParser: Parsing command: string
17/03/04 01:26:49 INFO parser.CatalystSqlParser: Parsing command: int
17/03/04 01:26:49 INFO parser.CatalystSqlParser: Parsing command: string
17/03/04 01:26:49 INFO parser.CatalystSqlParser: Parsing command: int
17/03/04 01:26:49 INFO parser.CatalystSqlParser: Parsing command: int
17/03/04 01:26:49 INFO parser.CatalystSqlParser: Parsing command: string
17/03/04 01:26:49 INFO parser.CatalystSqlParser: Parsing command: string
17/03/04 01:26:49 INFO parser.CatalystSqlParser: Parsing command: int
17/03/04 01:26:49 INFO parser.CatalystSqlParser: Parsing command: string
17/03/04 01:26:49 INFO parser.CatalystSqlParser: Parsing command: int
17/03/04 01:26:49 INFO parser.CatalystSqlParser: Parsing command: string
17/03/04 01:26:49 INFO parser.CatalystSqlParser: Parsing command: int
Error in query: Undefined function: 'tc.dt_to_date'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7
I used the --jars option when starting spark-sql from the command line and pointed it at the jar where the function is defined. Notice how DESCRIBE FUNCTION is able to identify the package name, but the Usage field remains N/A.
With spark-1.5.0-bin-hadoop2.6 and spark-1.5.2-bin-hadoop2.6 it works fine.
Created 03-09-2017 06:38 PM
Could you share your CREATE FUNCTION statement?
Created 03-06-2017 06:55 PM
Is the HiveContext enabled?
Created 07-05-2017 12:21 PM
I get an error while trying to access a table in a Hive database. I am using a Hive GenericUDTF, calling .enableHiveSupport() when building the SparkSession, and running via HiveServer2 on HDP 2.5 with Spark 2.
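To clarify what I mean by calling .enableHiveSupport() when building the SparkSession, the session is constructed roughly as in the sketch below. This is a minimal, self-contained illustration with placeholder application and table names, not the actual SampleSparkUDTF_yarnV1 code:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveEnabledSessionSketch {
    public static void main(String[] args) {
        // enableHiveSupport() wires the session to the Hive metastore; without it,
        // permanent Hive functions and tables are not visible to Spark SQL.
        SparkSession spark = SparkSession.builder()
                .appName("hive-udtf-example")   // placeholder app name
                .enableHiveSupport()
                .getOrCreate();

        // Query a Hive table through the Hive-enabled session (placeholder table name).
        Dataset<Row> rows = spark.sql("SELECT * FROM some_db.some_table LIMIT 10");
        rows.show();

        spark.stop();
    }
}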
The stack trace is:
2017-07-03 11:13:43,623 ERROR [HiveServer2-Background-Pool: Thread-1061]: SessionState (SessionState.java:printError(989)) - Status: Failed
2017-07-03 11:13:43,623 ERROR [HiveServer2-Background-Pool: Thread-1061]: SessionState (SessionState.java:printError(989)) - Vertex failed, vertexName=Map 1, vertexId=vertex_1499067308783_0051_1_00, diagnostics=[Task failed, taskId=task_1499067308783_0051_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
    ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:563)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
    ... 17 more
Caused by: org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2275)
    at org.apache.spark.sql.SparkSession$Builder$anonfun$8.apply(SparkSession.scala:831)
    at org.apache.spark.sql.SparkSession$Builder$anonfun$8.apply(SparkSession.scala:823)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
    at com.fuzzylogix.experiments.udf.hiveSparkUDF.SampleSparkUDTF_yarnV1.sparkJob(SampleSparkUDTF_yarnV1.java:97)
    at com.fuzzylogix.experiments.udf.hiveSparkUDF.SampleSparkUDTF_yarnV1.process(SampleSparkUDTF_yarnV1.java:78)
    at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:109)