
Reading external Hive table from Spark in Hadoop 3

Expert Contributor

I've upgraded to HDP 3.1 and now want to read a Hive external table in my Spark application.

The following table shows the compatibilities: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_configure_a_s...


I don't have LLAP activated, so it seems that I'm restricted in Spark -> Hive access (and vice versa), right?


But the compatibility table says that I can access external Hive tables from Spark without using the HWC (and also without LLAP), with the note that the table must be defined in the Spark catalog. What do I have to do here?
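As far as I understand it, "defined in Spark catalog" would mean registering the table in Spark's own catalog, along these lines (just a sketch, using the same SparkSession as in my code below; the column list and LOCATION path are hypothetical placeholders, not my real table definition):

// Sketch: make Spark's catalog aware of the existing external table by
// pointing a CREATE EXTERNAL TABLE statement at its HDFS location.
// Columns and LOCATION below are placeholders.
session.sql("CREATE EXTERNAL TABLE IF NOT EXISTS hivedb.external_table "
    + "(id INT, name STRING) "
    + "STORED AS ORC "
    + "LOCATION '/warehouse/tablespace/external/hive/hivedb.db/external_table'");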


I tried the following code, but it fails with "Table or view not found":


import java.io.File;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession session = SparkSession.builder()
    .config("spark.executor.instances", "4")
    .master("yarn") // "yarn-client" as a master URL was removed in Spark 2.x; client deploy mode is the default
    .appName("Spark LetterCount")
    .config("hive.metastore.uris", "thrift://myhost.com:9083")
    .config("hive.metastore.warehouse.dir", "/warehouse/tablespace/managed/hive")
    .config("hive.metastore.warehouse.external.dir", "/warehouse/tablespace/external/hive")
    .config("spark.sql.warehouse.dir", new File("spark-warehouse").getAbsolutePath())
    .config("spark.sql.hive.hiveserver2.jdbc.url", "jdbc:hive2://localhost:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;user=student30")
    .enableHiveSupport()
    .getOrCreate(); // getOrCreate() was missing; enableHiveSupport() returns a Builder, not a SparkSession

Dataset<Row> dsRead = session.sql("SELECT * FROM hivedb.external_table");
System.out.println(dsRead.count());


Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: `hivedb`.`external_table`; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation `hivedb`.`external_table`

at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:86)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:84)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:84)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:92)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at main.SparkSQLExample.main(SparkSQLExample.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
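To check whether Spark can reach the metastore at all, I also dumped what Spark's catalog sees right before the failing SELECT (plain Spark APIs, nothing HDP-specific; "hivedb" is just my database name):

// Diagnostic: what does Spark's catalog actually contain?
session.sql("SHOW DATABASES").show(false);     // does hivedb appear at all?
session.catalog().listTables("hivedb").show(); // throws if Spark doesn't know the database
System.out.println(session.conf().get("spark.sql.catalogImplementation")); // should print "hive"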

Can someone help me solve this issue? Thank you!

2 REPLIES

Explorer

I'm having the exact same problem. Two-node HDP 3.1.0.0 cluster, non-Kerberized; Spark cannot read an external Hive table and fails with UnresolvedRelation, just like yours. I'm using plain spark-shell to rule out any issues with my more complicated Spark application, but even then I cannot get the query to succeed. I have tried setting HADOOP_CONF_DIR=/etc/hadoop/conf (env var) before launching, which doesn't help. The following is the spark-shell interactive session I'm trying:


import org.apache.spark.sql.{DataFrame, SparkSession}

val newSpark = SparkSession.builder()
  .config("spark.sql.catalogImplementation", "hive")
  .config("hive.exec.dynamic.partition", "true")
  .config("hive.exec.dynamic.partition.mode", "nonstrict")
  .enableHiveSupport()
  .getOrCreate()

newSpark.sql("SELECT * FROM hive_db.hive_table")

This same SELECT query works fine from the beeline utility on the same node.
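In the same spark-shell session I also ran a couple of catalog checks to see what Spark actually knows about (standard Spark catalog API; "hive_db" is my database name):

// Which catalog implementation is active, and which databases/tables does Spark see?
println(newSpark.conf.get("spark.sql.catalogImplementation")) // expect "hive"
newSpark.sql("SHOW DATABASES").show(false)
newSpark.catalog.listTables("hive_db").show(false)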


Any suggestions here?

New Contributor

Has this issue been resolved? I am also facing the same issue.
Please suggest a solution.
