Created on 03-06-2017 04:27 AM - edited 09-16-2022 04:11 AM
Hi,
I have found a general template for accessing Spark temporary data (i.e. a DataFrame) from an external tool via JDBC. From what I have found, it should be quite simple:
1. Run spark-shell or submit a Spark job.
2. Configure a HiveContext and then start HiveThriftServer2 from the job.
3. In a separate session, access the Thrift server via beeline and query the data.
Here is my code (Spark 2.1):
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.sql.hive.thriftserver._

val sql = new HiveContext(sc)
sql.setConf("hive.server2.thrift.port", "10002")
sql.setConf("hive.server2.authentication", "KERBEROS")
sql.setConf("hive.server2.authentication.kerberos.principal", "hive/host1.lab.hadoop.net@LAB.HADOOP.NET")
sql.setConf("hive.server2.authentication.kerberos.keytab", "/home/h.keytab")
sql.setConf("spark.sql.hive.thriftServer.singleSession", "true")

val data = sql.sql("select 112 as id")
data.collect
data.createOrReplaceTempView("yyy")
sql.sql("show tables").show

HiveThriftServer2.startWithContext(sql)

Starting the server logs these warnings:

WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
Connect to the JDBC server:
beeline -u "jdbc:hive2://localhost:10002/default;principal=hive/host1.lab.hadoop.net@LAB.HADOOP.NET"
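Once connected, the idea is to query the temporary view directly from beeline, e.g. (yyy is the view registered in the spark-shell session above):

show tables;
select * from yyy;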
However, when I launch HiveThriftServer2, I can connect to the Spark Thrift server but do not see the temporary table: "show tables" from beeline does not list it, and querying "yyy" throws an error. Meanwhile, inside the spark-shell session the view is visible, and the server logs an analysis error when beeline queries it:
scala> sql.sql("show tables").collect
res11: Array[org.apache.spark.sql.Row] = Array([,sometablename,true], [,yyy,true])

scala> 17/03/06 11:15:50 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
org.apache.spark.sql.AnalysisException: Table or view not found: yyy; line 1 pos 14
    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:459)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:478)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:463)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)
If I create a table from beeline via "create table t as select 100 as id", the table is created and I can see it in spark-shell (the data is stored locally in the spark-warehouse directory). So the other direction works.
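For example, the working direction looks like this (same beeline and spark-shell sessions as above):

-- in beeline
create table t as select 100 as id;

// in spark-shell
sql.sql("select * from t").show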
So the question is: what am I missing? Why can't I see the temporary table from beeline?
Thanks
Created 03-08-2017 03:52 AM
I have found out what the problem was. The solution is to set the singleSession property to true on the command line, because setting it programmatically after the shell has started does not seem to take effect (see the sketch after the two examples below).
/bin/spark-shell --conf spark.sql.hive.thriftServer.singleSession=true
WORKS.
/bin/spark-shell
...
sql.setConf("spark.sql.hive.thriftServer.singleSession","true")
...
DOES NOT WORK
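Putting it together, the working sequence is roughly this sketch (same placeholder port as in the original code, trimmed to the essentials; Kerberos settings omitted):

/bin/spark-shell --conf spark.sql.hive.thriftServer.singleSession=true

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

val sql = new HiveContext(sc)
sql.setConf("hive.server2.thrift.port", "10002")

// register the temporary view, then start the Thrift server from this context
sql.sql("select 112 as id").createOrReplaceTempView("yyy")
HiveThriftServer2.startWithContext(sql)  // beeline on port 10002 can now see yyy

With singleSession=true, the Thrift server shares the session state (and therefore the temporary views) of the context that started it, instead of giving each JDBC connection its own session.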
Created 04-13-2018 04:04 PM
I am using Spark 2.0.2. Can you help me with the build.sbt file?
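In case it helps, a minimal build.sbt for a Spark 2.0.2 project that starts the Thrift server might look like the sketch below (the Scala version and the "provided" scope are assumptions, not from this thread):

name := "spark-thrift-example"
version := "0.1"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"              % "2.0.2" % "provided",
  "org.apache.spark" %% "spark-sql"               % "2.0.2" % "provided",
  "org.apache.spark" %% "spark-hive"              % "2.0.2" % "provided",
  "org.apache.spark" %% "spark-hive-thriftserver" % "2.0.2" % "provided"
)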