Created on 08-29-2019 05:28 AM - edited 08-29-2019 05:31 AM
I've upgraded to HDP 3.1 and now want to read a Hive external table from my Spark application.
The following table shows the compatibilities: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_configure_a_s...
I don't have LLAP activated, so it seems that I'm restricted in the Spark -> Hive access and vice versa, right?
But the compatibility table says that I can access external Hive tables from Spark without using the HWC (and also without LLAP), with the hint that the table must be defined in the Spark catalog. What do I have to do here?
I tried the following code, but it says "Table not found"!
import java.io.File;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Spark session with Hive support, pointed at the HDP 3.1 Hive metastore
SparkSession session = SparkSession.builder()
        .config("spark.executor.instances", "4")
        .master("yarn-client")
        .appName("Spark LetterCount")
        .config("hive.metastore.uris", "thrift://myhost.com:9083")
        .config("hive.metastore.warehouse.dir", "/warehouse/tablespace/managed/hive")
        .config("hive.metastore.warehouse.external.dir", "/warehouse/tablespace/external/hive")
        .config("spark.sql.warehouse.dir", new File("spark-warehouse").getAbsolutePath())
        .config("spark.sql.hive.hiveserver2.jdbc.url", "jdbc:hive2://localhost:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;user=student30")
        .enableHiveSupport()
        .getOrCreate();

// Query the external Hive table through Spark SQL
Dataset<Row> dsRead = session.sql("SELECT * FROM hivedb.external_table");
System.out.println(dsRead.count());
Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: `hivedb`.`external_table`; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation `hivedb`.`external_table`
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:86)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:84)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:84)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:92)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at main.SparkSQLExample.main(SparkSQLExample.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Can someone help me solve this issue? Thank you!
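For reference, one way to read the compatibility table's hint that the table "must be defined in Spark catalog" is that the table definition has to exist in whatever catalog the Spark session is actually connected to. Below is a minimal spark-shell (Scala) sketch of registering an external table definition over data that already sits in HDFS, assuming the shell has Hive support enabled; the column list, storage format, and LOCATION are placeholders, not taken from this thread:

// `spark` is the session that spark-shell provides.
// Register an external table definition pointing at existing HDFS data
// (columns, format and path below are placeholders).
spark.sql("CREATE DATABASE IF NOT EXISTS hivedb")
spark.sql(
  """CREATE EXTERNAL TABLE IF NOT EXISTS hivedb.external_table (id INT, name STRING)
    |STORED AS ORC
    |LOCATION '/warehouse/tablespace/external/hive/hivedb.db/external_table'""".stripMargin)

// Once the definition is visible in the catalog Spark is using, the original query should resolve:
spark.sql("SELECT * FROM hivedb.external_table").show()

If the table was already created from Hive and still does not resolve in Spark, the session may simply be looking at a different catalog or metastore than the one HiveServer2 serves.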
Created 12-11-2019 09:01 AM
I'm having the exact same problem. Two-node HDP 3.1.0.0 cluster, non-Kerberized, and Spark cannot read an external Hive table. It fails with UnresolvedRelation, just like yours. I'm using plain spark-shell to rule out any issues with my more complicated Spark application, but even then I cannot get the query to succeed. I have tried setting HADOOP_CONF_DIR=/etc/hadoop/conf (as an environment variable) before launching, which doesn't help. The following is the spark-shell interactive session I'm trying:
import org.apache.spark.sql.{DataFrame, SparkSession};
val newSpark = SparkSession.builder().config("spark.sql.catalogImplementation", "hive").config("hive.exec.dynamic.partition", "true").config("hive.exec.dynamic.partition.mode", "nonstrict").enableHiveSupport().getOrCreate()
newSpark.sql("SELECT * FROM hive_db.hive_table")
This same SELECT query works fine from the beeline utility, on the same node.
Any suggestions here?
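One low-effort check, assuming the same spark-shell session as above: list what the session's catalog actually sees before running the SELECT. If hive_db is missing from the output while beeline shows it, the shell is not reading the same catalog that HiveServer2 serves.

// Databases visible to this Spark session's catalog
newSpark.sql("SHOW DATABASES").show(false)

// Only meaningful once hive_db shows up above
newSpark.sql("SHOW TABLES IN hive_db").show(false)

If the lists differ from what beeline reports, it may be worth checking the metastore.catalog.default setting on HDP 3.x (for example, launching spark-shell with --conf spark.hadoop.metastore.catalog.default=hive), since Spark and Hive can otherwise be pointed at separate catalogs inside the same metastore.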
Created 04-10-2020 05:14 AM
Has this issue been resolved? I am also facing the same issue.
Please suggest a solution.