Posts: 430
Registered: ‎07-01-2015
Accepted Solution

Access spark temporary table via JDBC


 I have found a general template how to access spark temporary data (id data frame) via an external tool using JDBC. What I have found that it should be quite simple:

1. Run spark-shell or submit spark job

2. Configure HiveContext and then run HiveThirftServer from the job.


In a separate session access the thrift server via beeline and query data.


Here is my code Spark 2.1:


import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.hive.thriftserver._

val sql = new HiveContext(sc)
sql.setConf("hive.server2.thrift.port", "10002")
sql.setConf("hive.server2.authentication","KERBEROS" )
sql.setConf("hive.server2.authentication.kerberos.principal","hive/" )
sql.setConf("hive.server2.authentication.kerberos.keytab","/home/h.keytab" )
val data = sql.sql("select 112 as id")
sql.sql("show tables").show

 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException     

Connect to the JDBC server:

beeline -u "jdbc:hive2://localhost:10002/default;principal=hive/"


However when I try to launch the HiveThriftServer2 I can access the spark thrift but do not see the temporary table. Command "show tables" do not show any temporary table. Trying to query "yyy" throws an error:


scala> sql.sql("show tables").collect
res11: Array[org.apache.spark.sql.Row] = Array([,sometablename,true], [,yyy,true])

scala> 17/03/06 11:15:50 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
org.apache.spark.sql.AnalysisException: Table or view not found: yyy; line 1 pos 14
        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:459)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:478)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:463)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)


If I create a table from beeline via "create table t as select 100 as id" the table is created and I can see it in spark-shell (data stored locally in spark-warehouse directory) So the other direction is working.


So the question what I am missing, why I can't see the temporary table?



Posts: 430
Registered: ‎07-01-2015

Re: Access spark temporary table via JDBC

I have found out what was the problem. The solution is to set the singleSession property to true in command line. Because setting it in a program seems to not work properly.


/bin/spark-shell --conf spark.sql.hive.thriftServer.singleSession=true













New Contributor
Posts: 1
Registered: ‎04-13-2018

Re: Access spark temporary table via JDBC

I am using spark 2.0.2. Can you help me with build.sbt file.