
Reading external Hive table from Spark in Hadoop 3

Expert Contributor

I've upgraded to HDP 3.1 and now want to read a Hive external table in my Spark application.

This page shows the compatibility matrix: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_configure_a_s...

I don't have LLAP enabled, so it seems that I'm restricted in Spark -> Hive access and vice versa, right?
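
As far as I understand the docs, with LLAP in place the route from Spark to Hive data would be the Hive Warehouse Connector, roughly like this (a Scala sketch I have not run, since LLAP is off here; the table name is made up):

import com.hortonworks.hwc.HiveWarehouseSession

// HWC route (needs LLAP / HiveServer2 Interactive); assumes an existing
// Hive-configured SparkSession named "spark"; the table name is made up
val hive = HiveWarehouseSession.session(spark).build()
hive.executeQuery("SELECT * FROM hivedb.some_managed_table").show()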


But the compatibility table says that I can access external Hive tables from Spark without the HWC (and also without LLAP), with the note that the table must be defined in the Spark catalog. What exactly do I have to do here?
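
Does "defined in Spark catalog" simply mean issuing the DDL through a Hive-enabled SparkSession? A minimal sketch of what I have in mind (Scala again for brevity; the schema and location are made up):

import org.apache.spark.sql.SparkSession

// Hive-enabled session, so the DDL lands in the metastore-backed catalog
val spark = SparkSession.builder()
  .appName("DefineExternalTable")
  .enableHiveSupport()
  .getOrCreate()

// Hypothetical schema and HDFS location -- adjust to the real table
spark.sql("""CREATE EXTERNAL TABLE IF NOT EXISTS hivedb.external_table (id INT, name STRING)
             ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
             LOCATION '/warehouse/tablespace/external/hive/hivedb.db/external_table'""")

// Verify that the catalog now knows about the table
spark.sql("SHOW TABLES IN hivedb").show(false)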


I tried the following code, but it fails with "Table or view not found":

import java.io.File;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Note: enableHiveSupport() returns the Builder, so getOrCreate() is
// required to actually create the SparkSession
SparkSession session = SparkSession.builder()
        .config("spark.executor.instances", "4")
        .master("yarn") // "yarn-client" is deprecated since Spark 2.0; use "yarn" in client deploy mode
        .appName("Spark LetterCount")
        .config("hive.metastore.uris", "thrift://myhost.com:9083")
        .config("hive.metastore.warehouse.dir", "/warehouse/tablespace/managed/hive")
        .config("hive.metastore.warehouse.external.dir", "/warehouse/tablespace/external/hive")
        .config("spark.sql.warehouse.dir", new File("spark-warehouse").getAbsolutePath())
        .config("spark.sql.hive.hiveserver2.jdbc.url", "jdbc:hive2://localhost:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;user=student30")
        .enableHiveSupport()
        .getOrCreate();

Dataset<Row> dsRead = session.sql("SELECT * FROM hivedb.external_table");
System.out.println(dsRead.count());

Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: `hivedb`.`external_table`; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation `hivedb`.`external_table`

at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:86)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:84)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:84)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:92)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at main.SparkSQLExample.main(SparkSQLExample.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Can someone help me solve this issue? Thank you!

2 REPLIES

Explorer

I'm having the exact same problem: a two-node HDP 3.1.0.0 cluster, non-Kerberized, and Spark cannot read an external Hive table. It fails with UnresolvedRelation, just like yours. I'm using plain spark-shell to rule out any issues with my more complicated Spark application, but even then I cannot get the query to succeed. I have tried setting the environment variable HADOOP_CONF_DIR=/etc/hadoop/conf before launching, which doesn't help. This is the spark-shell session I'm trying:

import org.apache.spark.sql.{DataFrame, SparkSession}

val newSpark = SparkSession.builder()
  .config("spark.sql.catalogImplementation", "hive")
  .config("hive.exec.dynamic.partition", "true")
  .config("hive.exec.dynamic.partition.mode", "nonstrict")
  .enableHiveSupport()
  .getOrCreate()

newSpark.sql("SELECT * FROM hive_db.hive_table")

This same SELECT query works fine from the beeline utility on the same node.
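
To check whether it's a catalog mismatch, I also dump what the session's catalog actually sees (just a sanity check; listTables throws if Spark doesn't know the database at all):

// Sanity check: what does this SparkSession's catalog contain?
newSpark.catalog.listDatabases().show(false)
newSpark.catalog.listTables("hive_db").show(false)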


Any suggestions here?

New Contributor

Has this issue been resolved? I am also facing the same problem. Please suggest a solution.