Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here. Want to know more about what has changed? Check out the Community News blog.

Access spark temporary table via JDBC

SOLVED Go to solution

Access spark temporary table via JDBC

Master Collaborator

Hi,

 I have found a general template how to access spark temporary data (id data frame) via an external tool using JDBC. What I have found that it should be quite simple:

1. Run spark-shell or submit spark job

2. Configure HiveContext and then run HiveThirftServer from the job.

 

In a separate session access the thrift server via beeline and query data.

 

Here is my code Spark 2.1:

 

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.hive.thriftserver._

val sql = new HiveContext(sc)
sql.setConf("hive.server2.thrift.port", "10002")
sql.setConf("hive.server2.authentication","KERBEROS" )
sql.setConf("hive.server2.authentication.kerberos.principal","hive/host1.lab.hadoop.net@LAB.HADOOP.NET" )
sql.setConf("hive.server2.authentication.kerberos.keytab","/home/h.keytab" )
sql.setConf("spark.sql.hive.thriftServer.singleSession","true")
val data = sql.sql("select 112 as id")
data.collect
data.createOrReplaceTempView("yyy")
sql.sql("show tables").show

HiveThriftServer2.startWithContext(sql)     
 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException     

Connect to the JDBC server:

beeline -u "jdbc:hive2://localhost:10002/default;principal=hive/host1.lab.hadoop.net@LAB.HADOOP.NET"

 

However when I try to launch the HiveThriftServer2 I can access the spark thrift but do not see the temporary table. Command "show tables" do not show any temporary table. Trying to query "yyy" throws an error:

 

scala> sql.sql("show tables").collect
res11: Array[org.apache.spark.sql.Row] = Array([,sometablename,true], [,yyy,true])

scala> 17/03/06 11:15:50 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
org.apache.spark.sql.AnalysisException: Table or view not found: yyy; line 1 pos 14
        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:459)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:478)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:463)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)

 

If I create a table from beeline via "create table t as select 100 as id" the table is created and I can see it in spark-shell (data stored locally in spark-warehouse directory) So the other direction is working.

 

So the question what I am missing, why I can't see the temporary table?

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Access spark temporary table via JDBC

Master Collaborator

I have found out what was the problem. The solution is to set the singleSession property to true in command line. Because setting it in a program seems to not work properly.

 

/bin/spark-shell --conf spark.sql.hive.thriftServer.singleSession=true

 

WORKS.

 

/bin/spark-shell

...

sql.setConf("spark.sql.hive.thriftServer.singleSession","true")

...

 

DOES NOT WORK

 

 

 

2 REPLIES 2

Re: Access spark temporary table via JDBC

Master Collaborator

I have found out what was the problem. The solution is to set the singleSession property to true in command line. Because setting it in a program seems to not work properly.

 

/bin/spark-shell --conf spark.sql.hive.thriftServer.singleSession=true

 

WORKS.

 

/bin/spark-shell

...

sql.setConf("spark.sql.hive.thriftServer.singleSession","true")

...

 

DOES NOT WORK

 

 

 

Highlighted

Re: Access spark temporary table via JDBC

New Contributor

I am using spark 2.0.2. Can you help me with build.sbt file.