Can I use SparkSQL on a cluster using Hive on Spark?
- Labels: Apache Hive, Apache Spark, Apache Zeppelin
Created on ‎11-05-2019 10:32 PM - edited ‎11-05-2019 10:36 PM
I am using a CDH 6.1.1 cluster.
The cluster is configured to use Spark as the execution engine for Hive.
Is there anything wrong with using SparkSQL on this cluster?
Is it OK to create Hive tables and change data using SparkSQL?
Since SparkSQL uses the Hive Metastore, I suspect that there may be a conflict between SparkSQL and Hive on Spark.
In addition, could you point me to documentation on how to integrate Cloudera CDH Hive with Apache Zeppelin's Spark interpreter?
Thank you.
Created ‎11-14-2019 02:54 AM
Hey @avengers,
I thought this might add some value to the question.
Spark SQL uses a Hive Metastore to manage the metadata of persistent relational entities (e.g. databases, tables, columns, partitions) in a relational database (for fast access) [1].
Also, I don't think the metastore would crash if Spark SQL is used alongside Hive on Spark.
[1] https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-hive-metastore.html
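To make the metastore point concrete, here is a minimal sketch of a Spark session with Hive support that writes a table through Spark SQL and reads it back from the catalog. The object name `MetastoreSketch`, the helper names, and the table `default.spark_sql_demo` are hypothetical, and `master("local[*]")` is only an assumption for a standalone run. In Zeppelin the predefined `spark` session would take the place of `session()`.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object MetastoreSketch {
  // Assumption: standalone run. On a cluster (or in Zeppelin) drop master(...)
  // and reuse the session that the environment already provides.
  def session(): SparkSession =
    SparkSession.builder()
      .appName("metastore-sketch")
      .master("local[*]")
      .enableHiveSupport() // needs the spark-hive module on the classpath
      .getOrCreate()

  // Creates (or replaces) a metastore table, appends one row,
  // and returns the number of rows read back by table name.
  def writeAndCount(spark: SparkSession, table: String): Long = {
    import spark.implicits._
    Seq((1, "a"), (2, "b")).toDF("id", "label")
      .write.mode(SaveMode.Overwrite).saveAsTable(table)
    Seq((3, "c")).toDF("id", "label")
      .write.mode(SaveMode.Append).saveAsTable(table)
    spark.table(table).count()
  }

  def main(args: Array[String]): Unit = {
    val spark = session()
    // "hive" means the session is backed by the Hive metastore catalog
    println(spark.conf.get("spark.sql.catalogImplementation"))
    println(writeAndCount(spark, "default.spark_sql_demo")) // hypothetical table name
    spark.catalog.listTables("default").show(false)
    spark.sql("DROP TABLE IF EXISTS default.spark_sql_demo") // clean up the demo table
    spark.stop()
  }
}
```

Whether a table written this way is also readable from Hive depends on the storage format, so it is worth checking from the Hive side before relying on it.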
Created ‎11-06-2019 02:08 AM
Hi @av,
Here are the links to the Hive and Spark interpreter docs:
https://zeppelin.apache.org/docs/0.8.2/interpreter/hive.html
https://zeppelin.apache.org/docs/0.8.2/interpreter/spark.html
Best,
Helmi KHALIFA
Created on ‎11-06-2019 10:13 PM - edited ‎11-06-2019 10:19 PM
Thanks. However, I have already read them.
I am already connecting to Hive from Zeppelin using JDBC.
What I want is to query a Hive table with SparkSQL.
I'm wondering whether the metastore might crash if I do that on a cluster using Hive on Spark.
For example:
%spark
val df = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/somefile.csv")
df.createOrReplaceTempView("csvTable")
%spark.sql
select *
from csvTable lt
join hiveTable rt
  on lt.col = rt.col
Created on ‎11-08-2019 08:42 AM - edited ‎11-08-2019 08:55 AM
Hi @avengers,
You would need to share variables between two Zeppelin interpreters, and I don't think we can do that between spark and sparkSQL.
I found an easier way: use sqlContext inside the same %spark interpreter:
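%spark
// Load the CSV and expose it to SQL as a temporary view
val df = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/somefile.csv")
df.createOrReplaceTempView("csvTable")
// HiveContext is deprecated since Spark 2.0 but still available
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
// Join the temp view with the Hive table in a single query
val resultat = sqlContext.sql("select * from csvTable lt join hiveTable rt on lt.col = rt.col")
resultat.show()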
I tried it and it works!
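Since `HiveContext` has been deprecated since Spark 2.0, the same join can probably go through the session's own `sql` method instead. The following is only a sketch of that variant: the object name `TempViewHiveJoin` and the helper `joinCsvWithHive` are hypothetical, and it assumes the session passed in was created with Hive support, as I believe the `spark` object in Zeppelin on CDH normally is.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object TempViewHiveJoin {
  // Registers the CSV file as the temp view "csvTable" and joins it with an
  // existing table, using the session's own sql() rather than a HiveContext.
  def joinCsvWithHive(spark: SparkSession, csvPath: String, hiveTable: String): DataFrame = {
    val df = spark.read
      .format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load(csvPath)
    df.createOrReplaceTempView("csvTable")
    // Both tables are assumed to have a column named `col`, as in the question
    spark.sql(s"select * from csvTable lt join $hiveTable rt on lt.col = rt.col")
  }
}
```

In a %spark paragraph this would be called as `TempViewHiveJoin.joinCsvWithHive(spark, "/somefile.csv", "hiveTable").show()`.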
Best,
Helmi KHALIFA
Created ‎11-14-2019 02:42 AM
Hi @avengers,
If it works for you, would you be kind enough to accept the answer, please?
Best,
Helmi KHALIFA
