Support Questions

Access spark temporary table via JDBC

Tomas79 · ‎03-06-2017

Hi,

I have found a general template for accessing Spark temporary data (i.e. a DataFrame) from an external tool over JDBC. From what I have found, it should be quite simple:

1. Run spark-shell or submit a Spark job.

2. Configure a HiveContext and then start HiveThriftServer2 from the job.

3. In a separate session, access the Thrift server via beeline and query the data.

Here is my code (Spark 2.1):

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.hive.thriftserver._

val sql = new HiveContext(sc)
// Thrift server port and Kerberos authentication settings
sql.setConf("hive.server2.thrift.port", "10002")
sql.setConf("hive.server2.authentication", "KERBEROS")
sql.setConf("hive.server2.authentication.kerberos.principal", "hive/host1.lab.hadoop.net@LAB.HADOOP.NET")
sql.setConf("hive.server2.authentication.kerberos.keytab", "/home/h.keytab")
// Intended to make JDBC clients share this session (and its temporary views)
sql.setConf("spark.sql.hive.thriftServer.singleSession", "true")

// Register a one-row DataFrame as the temporary view "yyy"
val data = sql.sql("select 112 as id")
data.collect
data.createOrReplaceTempView("yyy")
sql.sql("show tables").show

// Start the Thrift JDBC server on top of this context
HiveThriftServer2.startWithContext(sql)
 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException

Connect to the JDBC server:

beeline -u "jdbc:hive2://localhost:10002/default;principal=hive/host1.lab.hadoop.net@LAB.HADOOP.NET"

However, when I launch HiveThriftServer2, I can connect to the Spark Thrift server but I do not see the temporary table. The "show tables" command in beeline does not list any temporary table, and querying "yyy" throws an error, even though spark-shell itself lists the view:

scala> sql.sql("show tables").collect
res11: Array[org.apache.spark.sql.Row] = Array([,sometablename,true], [,yyy,true])

scala> 17/03/06 11:15:50 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
org.apache.spark.sql.AnalysisException: Table or view not found: yyy; line 1 pos 14
        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:459)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:478)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:463)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)

If I create a table from beeline via "create table t as select 100 as id", the table is created and I can see it in spark-shell (the data is stored locally in the spark-warehouse directory). So the other direction works.

So the question is: what am I missing, and why can't I see the temporary table?

Thanks

Tomas79 · ‎03-08-2017

I have found out what the problem was. The solution is to set the singleSession property to true on the command line, because setting it in the program does not seem to work properly.

/bin/spark-shell --conf spark.sql.hive.thriftServer.singleSession=true

WORKS.

/bin/spark-shell

...

sql.setConf("spark.sql.hive.thriftServer.singleSession","true")

...

DOES NOT WORK
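
For a job launched with spark-submit instead of spark-shell, one option that might give the same effect is to supply the property while the session is being built, so that it is already in place before any context exists. The following is only a sketch and has not been verified against a cluster: the object name and application name are hypothetical, the Kerberos settings from the question are left out for brevity, and it is an assumption that a property set on the builder behaves like one passed with --conf.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Hypothetical object name, used only for illustration.
object TempViewOverJdbc {
  def main(args: Array[String]): Unit = {
    // Assumption: configuring the builder before getOrCreate() has the same
    // effect as --conf spark.sql.hive.thriftServer.singleSession=true.
    val spark = SparkSession.builder()
      .appName("temp-view-over-jdbc") // hypothetical application name
      .config("spark.sql.hive.thriftServer.singleSession", "true")
      .config("hive.server2.thrift.port", "10002")
      .enableHiveSupport()
      .getOrCreate()

    // Same one-row temporary view as in the question.
    spark.sql("select 112 as id").createOrReplaceTempView("yyy")

    // Start the Thrift JDBC server on top of this session's SQLContext.
    HiveThriftServer2.startWithContext(spark.sqlContext)

    // Keep the driver alive so that JDBC clients can connect.
    Thread.sleep(Long.MaxValue)
  }
}
```

If the session already exists when this code runs (as it does inside spark-shell), the builder returns that existing session, so the command-line --conf shown above remains the approach that is confirmed to work.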

LAzyDBA · ‎04-13-2018

I am using Spark 2.0.2. Can you help me with the build.sbt file?
