HDP 2.6 - SparkSession via HS2 - Error


Steps:

1. Created a Java class extending Hive's GenericUDTF and created a SparkSession inside it (the plumbing elided by the `...` is sketched after the error trace below):

import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkHiveUDTF extends GenericUDTF {
...
  static Long sparkJob(String tableName) {
    // Build (or reuse) a SparkSession in yarn-client mode with Hive support.
    SparkSession spark = SparkSession.builder().enableHiveSupport().master("yarn-client").appName("SampleSparkUDTF_yarnV1").getOrCreate();
    Dataset<Row> inputData = spark.read().table(tableName); // inputs to the function: "text", "hive table"
    Long countRows = inputData.count(); // count the rows of the Hive table
    return countRows;
  }
}

2. Copied this custom UDTF JAR into HDFS and also into Hive's auxlib directory.

3. Copied /usr/hdp/<2.6.x>/spark2/jars/*.jar into /usr/hdp/<2.6.x>/hive/auxlib/

4. Connected to HS2 using Beeline to run this Spark UDTF:

beeline -u jdbc:hive2://localhost:10000 -d org.apache.hive.jdbc.HiveDriver
CREATE TABLE TestTable (i int);
INSERT INTO TestTable VALUES (1);

CREATE FUNCTION SparkUDT AS 'SparkHiveUDTF'
USING JAR 'hdfs:///tmp/sparkHiveGenericUDTF-1.0.jar';

SELECT SparkUDT('tbl','TestTable');

On an HDP 2.6 cluster with Spark 2.1, this causes the following error:

Caused by: java.lang.IllegalStateException: Library directory '/hadoop/yarn/local/usercache/hive/appcache/application_1499162780176_0014/container_e03_1499162780176_0014_01_000005/assembly/target/scala-2.11/jars' does not exist; make sure Spark is built.
	at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:260)
	at org.apache.spark.launcher.CommandBuilderUtils.findJarsDir(CommandBuilderUtils.java:380)
	at org.apache.spark.launcher.YarnCommandBuilderUtils$.findJarsDir(YarnCommandBuilderUtils.scala:38)
	at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:570)
	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:895)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2320)
	at org.apache.spark.sql.SparkSession$Builder$anonfun$6.apply(SparkSession.scala:868)
	at org.apache.spark.sql.SparkSession$Builder$anonfun$6.apply(SparkSession.scala:860)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
	at SparkHiveUDTF.sparkJob(SparkHiveUDTF.java:97)
	at SparkHiveUDTF.process(SparkHiveUDTF.java:78)
	at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:109)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
	... 18 more
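
For reference, the `...` in step 1 corresponds to the standard GenericUDTF contract (initialize/process/close). A minimal hypothetical sketch consistent with the SparkHiveUDTF.process -> SparkHiveUDTF.sparkJob frames in the trace above; the output column name row_count and the argument handling are assumptions, not from the original post:

import java.util.Arrays;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.spark.sql.SparkSession;

public class SparkHiveUDTF extends GenericUDTF {

  @Override
  public StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
    // Declare a single output column of type bigint (the row count).
    return ObjectInspectorFactory.getStandardStructObjectInspector(
        Arrays.asList("row_count"),
        Arrays.asList((ObjectInspector) PrimitiveObjectInspectorFactory.javaLongObjectInspector));
  }

  @Override
  public void process(Object[] args) throws HiveException {
    // Assumes the second argument carries the Hive table name,
    // matching the call SELECT SparkUDT('tbl','TestTable').
    String tableName = args[1].toString();
    forward(new Object[] { sparkJob(tableName) });
  }

  @Override
  public void close() throws HiveException {
    // Nothing to release in this sketch.
  }

  static Long sparkJob(String tableName) {
    // Same logic as in step 1: count the rows of the given Hive table.
    SparkSession spark = SparkSession.builder().enableHiveSupport()
        .master("yarn-client").appName("SampleSparkUDTF_yarnV1").getOrCreate();
    return spark.read().table(tableName).count();
  }
}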


Re: HDP 2.6 - SparkSession via HS2 - Error


After setting these configs in the SparkSession builder code:

SparkSession spark = SparkSession
    .builder()
    .enableHiveSupport()
    .master("yarn-client")
    .appName("SampleSparkUDTF_yarnV1")
    .config("spark.yarn.jars", "hdfs:///hdp/apps/2.6.1.0-129/spark2")
    .config("spark.yarn.am.extraJavaOptions", "-Dhdp.version=2.6.1.0-129")
    .config("spark.driver.extraJavaOptions", "-Dhdp.version=2.6.1.0-129")
    .config("spark.executor.memory", "4g")
    .getOrCreate();

While testing via HS2, this is the error:

beeline -u jdbc:hive2://localhost:10000 -d org.apache.hive.jdbc.HiveDriver

0: jdbc:hive2://localhost:10000>

……

], TaskAttempt 3 failed, info=[Error: Failure while running task:
java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
	... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:563)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
	... 17 more
Caused by: org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2320)
	at org.apache.spark.sql.SparkSession$Builder$anonfun$6.apply(SparkSession.scala:868)
	at org.apache.spark.sql.SparkSession$Builder$anonfun$6.apply(SparkSession.scala:860)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
	at SparkHiveUDTF.sparkJob(SparkHiveUDTF.java:102)
	at SparkHiveUDTF.process(SparkHiveUDTF.java:78)
	at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:109)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
	... 18 more


Re: HDP 2.6 - SparkSession via HS2 - Error

@sudha Can you try the same with the Spark Thrift Server?


Re: HDP 2.6 - SparkSession via HS2 - Error


On the Spark Thrift Server, after CREATE FUNCTION, calling the function (SELECT SparkUDTF('txt','table')) gives an error that the function is not recognized.


Re: HDP 2.6 - SparkSession via HS2 - Error


While testing like this, the SparkSession does not read the cluster's hive-site.xml or spark-env.sh.

Is there a way to make it read the Spark config present on the cluster?
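
One possible direction, as a minimal sketch rather than a confirmed fix: read the cluster's spark-defaults.conf by hand and apply each entry to the builder, and ship hive-site.xml to the YARN containers via the standard spark.yarn.dist.files property. The class name ClusterConfLoader and the /etc/spark2/conf and /etc/hive/conf paths are assumptions (typical HDP 2.6 locations), not from this thread:

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;
import org.apache.spark.sql.SparkSession;

public class ClusterConfLoader {

  // Build a SparkSession that picks up the cluster's Spark defaults manually,
  // since a session created inside HS2 never sources spark-env.sh.
  static SparkSession buildWithClusterConf() throws IOException {
    Properties defaults = new Properties();
    // Assumed HDP 2.6 location of the cluster config; adjust per cluster.
    try (FileInputStream in = new FileInputStream("/etc/spark2/conf/spark-defaults.conf")) {
      // spark-defaults.conf uses "key value" lines, which Properties.load accepts.
      defaults.load(in);
    }

    SparkSession.Builder builder = SparkSession.builder()
        .enableHiveSupport()
        .master("yarn-client")
        .appName("SampleSparkUDTF_yarnV1")
        // Ship the cluster's Hive config to the YARN containers.
        .config("spark.yarn.dist.files", "/etc/hive/conf/hive-site.xml");

    // Apply every cluster default to the builder.
    for (String key : defaults.stringPropertyNames()) {
      builder = builder.config(key, defaults.getProperty(key));
    }
    return builder.getOrCreate();
  }
}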
