Access spark temporary table via JDBC

Master Collaborator

Hi,

I have found a general template for accessing Spark temporary data (i.e. a DataFrame) from an external tool via JDBC. From what I have read, it should be quite simple:

1. Run spark-shell or submit a Spark job.

2. Configure a HiveContext and start HiveThriftServer2 from the job.

3. In a separate session, connect to the Thrift server via beeline and query the data.

 

Here is my code (Spark 2.1):

 

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

val sql = new HiveContext(sc)
sql.setConf("hive.server2.thrift.port", "10002")
sql.setConf("hive.server2.authentication", "KERBEROS")
sql.setConf("hive.server2.authentication.kerberos.principal", "hive/host1.lab.hadoop.net@LAB.HADOOP.NET")
sql.setConf("hive.server2.authentication.kerberos.keytab", "/home/h.keytab")
sql.setConf("spark.sql.hive.thriftServer.singleSession", "true")

val data = sql.sql("select 112 as id")
data.collect
data.createOrReplaceTempView("yyy")
sql.sql("show tables").show

HiveThriftServer2.startWithContext(sql)

The server starts, logging these warnings:

WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException

Connect to the JDBC server:

beeline -u "jdbc:hive2://localhost:10002/default;principal=hive/host1.lab.hadoop.net@LAB.HADOOP.NET"
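For a quick check from the client side, the same query can also be run non-interactively (a sketch using beeline's -e flag and the connection URL from above; it requires the live, Kerberized Thrift server, so adjust host, port, and principal for your cluster):

```shell
# Smoke test: query the temp view through the Thrift server from the shell.
beeline -u "jdbc:hive2://localhost:10002/default;principal=hive/host1.lab.hadoop.net@LAB.HADOOP.NET" \
  -e "select * from yyy"
```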

 

However, when I launch HiveThriftServer2 I can connect to the Spark Thrift server, but the temporary table is not visible there: from beeline, "show tables" does not list it, and querying "yyy" throws an error, even though the view does show up inside the spark-shell session:

 

scala> sql.sql("show tables").collect
res11: Array[org.apache.spark.sql.Row] = Array([,sometablename,true], [,yyy,true])

17/03/06 11:15:50 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
org.apache.spark.sql.AnalysisException: Table or view not found: yyy; line 1 pos 14
        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:459)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:478)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:463)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)

 

If I create a table from beeline via "create table t as select 100 as id", the table is created and I can see it in spark-shell (the data is stored locally in the spark-warehouse directory), so the other direction works.

 

So the question is: what am I missing? Why can't I see the temporary table?

 

Thanks

1 ACCEPTED SOLUTION

Master Collaborator

I have found out what the problem was. The solution is to set the singleSession property to true on the command line; setting it from inside the program does not take effect (it apparently has to be set before the session starts).

 

Works:

/bin/spark-shell --conf spark.sql.hive.thriftServer.singleSession=true

Does not work:

/bin/spark-shell
...
sql.setConf("spark.sql.hive.thriftServer.singleSession","true")
...


2 REPLIES

New Contributor

I am using Spark 2.0.2. Can you help me with the build.sbt file?
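A minimal build.sbt for compiling the code above against Spark 2.0.2 might look roughly like this (a sketch; the Scala version, project name, and the "provided" scope are assumptions to check against your cluster):

```scala
name := "spark-thrift-example"

version := "0.1"

scalaVersion := "2.11.8"

// "provided" because spark-shell / spark-submit supply these jars at runtime.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.2" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.0.2" % "provided",
  "org.apache.spark" %% "spark-hive" % "2.0.2" % "provided",
  // HiveThriftServer2 lives in the separate thrift-server module.
  "org.apache.spark" %% "spark-hive-thriftserver" % "2.0.2" % "provided"
)
```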
