Support Questions
Find answers, ask questions, and share your expertise

HDP 2.6 - SparkSession via HS2 - Error

Explorer

Steps:

1. Created a java class extending hive GenericUDTF, created SparkSession in that:

Public Class sparkUDTF extends GenericHiveUDTF {
...
static Long sparkJob(String tableName) {
  SparkSession spark = SparkSession.builder().enableHiveSupport().master("yarn-client").appName("SampleSparkUDTF_yarnV1").getOrCreate();
  Dataset inputData = spark.read().table(tableName); //input to the function “text”, “hive table”
  Long countRows =  inputData.count(); //access hive table
  return countRows;
  }
}

2. Copied this custom UDTF jar into hdfs and also into auxlib

3. Copied /usr/hdp/<2.6.x>/spark2/jars/*.jar into /usr/hdp/<2.6.x>/hive/auxlib/

4. Connecting to HS2 using beeline to run this Spark UDT:

beeline -u jdbc:hive2://localhost:10000 -d org.apache.hive.jdbc.HiveDriver
CREATE TABLE TestTable (i int);INSERT INTO TestTable VALUES (1);

CREATE FUNCTION
SparkUDT AS 'SparkHiveUDTF' using jar
'hdfs:///tmp/sparkHiveGenericUDTF-1.0.jar' ;

SELECT SparkUDT('tbl','TestTable');

On HDP 2.6 cluster, Spark 2.1 - causes this error:

Caused by: java.lang.IllegalStateException: Library directory '/hadoop/yarn/local/usercache/hive/appcache/application_1499162780176_0014/container_e03_1 499162780176_0014_01_000005/assembly/target/scala-2.11/jars' does not exist; make sure Spark is built. at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:260) at org.apache.spark.launcher.CommandBuilderUtils.findJarsDir(CommandBuilderUtils.java:380) at org.apache.spark.launcher.YarnCommandBuilderUtils$.findJarsDir(YarnCommandBuilderUtils.scala:38) at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:570) at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:895) at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156) at org.apache.spark.SparkContext.<init>(SparkContext.scala:509) at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2320) at org.apache.spark.sql.SparkSession$Builder$anonfun$6.apply(SparkSession.scala:868) at org.apache.spark.sql.SparkSession$Builder$anonfun$6.apply(SparkSession.scala:860) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860) at SparkHiveUDTF.sparkJob(SparkHiveUDTF.java:97) at SparkHiveUDTF.process(SparkHiveUDTF.java:78) at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:109) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555) ... 18 more

4 REPLIES 4

Explorer

After setting the config in SparkSession source code:

SparkSession spark = SparkSession

.builder()

.enableHiveSupport()

.master("yarn-client")

.appName("SampleSparkUDTF_yarnV1")

.config("spark.yarn.jars","hdfs:///hdp/apps/2.6.1.0-129/spark2")

.config("spark.yarn.am.extraJavaOptions","-Dhdp.version=2.6.1.0-129")

.config("spark.driver.extra.JavaOptions","-Dhdp.version=2.6.1.0-129")

.config("spark.executor.memory","4g")

.getOrCreate();

While testing via HS2 & this is the error:

beeline -u jdbc:hive2://localhost:10000 -d org.apache.hive.jdbc.HiveDriver

0: jdbc:hive2://localhost:10000>

……

], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150) ... 14 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:563) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83) ... 17 more Caused by: org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master. at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156) at org.apache.spark.SparkContext.<init>(SparkContext.scala:509) at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2320) at org.apache.spark.sql.SparkSession$Builder$anonfun$6.apply(SparkSession.scala:868) at org.apache.spark.sql.SparkSession$Builder$anonfun$6.apply(SparkSession.scala:860) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860) at SparkHiveUDTF.sparkJob(SparkHiveUDTF.java:102) at SparkHiveUDTF.process(SparkHiveUDTF.java:78) at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:109) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555) ... 18 more

@sudha Can you try the same with Spark thrift server?

Explorer

On spark thrift server, after create function, when function is called (SELECT SparkUDTF('txt','table')) gives error that function is not recognized.

Explorer

While testing like this, it does not read hive-site.xml, spark-env.sh of the cluster.

Is there a way to make it read spark config present in the cluster?

; ;