Reply
Highlighted
Expert Contributor
Posts: 192
Registered: ‎07-01-2015
Accepted Solution

Access spark temporary table via JDBC

Hi,

 I have found a general template how to access spark temporary data (id data frame) via an external tool using JDBC. What I have found that it should be quite simple:

1. Run spark-shell or submit spark job

2. Configure HiveContext and then run HiveThirftServer from the job.

 

In a separate session access the thrift server via beeline and query data.

 

Here is my code Spark 2.1:

 

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.hive.thriftserver._

val sql = new HiveContext(sc)
sql.setConf("hive.server2.thrift.port", "10002")
sql.setConf("hive.server2.authentication","KERBEROS" )
sql.setConf("hive.server2.authentication.kerberos.principal","hive/host1.lab.hadoop.net@LAB.HADOOP.NET" )
sql.setConf("hive.server2.authentication.kerberos.keytab","/home/h.keytab" )
sql.setConf("spark.sql.hive.thriftServer.singleSession","true")
val data = sql.sql("select 112 as id")
data.collect
data.createOrReplaceTempView("yyy")
sql.sql("show tables").show

HiveThriftServer2.startWithContext(sql)     
 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException     

Connect to the JDBC server:

beeline -u "jdbc:hive2://localhost:10002/default;principal=hive/host1.lab.hadoop.net@LAB.HADOOP.NET"

 

However when I try to launch the HiveThriftServer2 I can access the spark thrift but do not see the temporary table. Command "show tables" do not show any temporary table. Trying to query "yyy" throws an error:

 

scala> sql.sql("show tables").collect
res11: Array[org.apache.spark.sql.Row] = Array([,sometablename,true], [,yyy,true])

scala> 17/03/06 11:15:50 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
org.apache.spark.sql.AnalysisException: Table or view not found: yyy; line 1 pos 14
        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:459)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:478)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:463)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)

 

If I create a table from beeline via "create table t as select 100 as id" the table is created and I can see it in spark-shell (data stored locally in spark-warehouse directory) So the other direction is working.

 

So the question what I am missing, why I can't see the temporary table?

 

Thanks

Expert Contributor
Posts: 192
Registered: ‎07-01-2015

Re: Access spark temporary table via JDBC

I have found out what was the problem. The solution is to set the singleSession property to true in command line. Because setting it in a program seems to not work properly.

 

/bin/spark-shell --conf spark.sql.hive.thriftServer.singleSession=true

 

WORKS.

 

/bin/spark-shell

...

sql.setConf("spark.sql.hive.thriftServer.singleSession","true")

...

 

DOES NOT WORK

 

 

 

New Contributor
Posts: 1
Registered: ‎04-13-2018

Re: Access spark temporary table via JDBC

I am using spark 2.0.2. Can you help me with build.sbt file. 

Announcements