Using RDD in Hive
Labels:
- Apache Hadoop
- Apache Hive
- Apache Spark
Created ‎03-18-2017 06:57 PM
Hi,
Can we use the RDD cache in Hive? For example, can I create a DataFrame that reads data from a Hive table, and then create an external Hive table on top of that cached DataFrame? Is that compatible? Does setting the execution engine to 'spark' in Hive allow me to use the RDD cache? My question might be silly, but I still want to know whether this is really possible, as I have little knowledge of Spark. If it is possible, please shed some light on how I can make use of the RDD cache in Hive.
Created ‎03-18-2017 09:05 PM
Below is a good read:
https://hortonworks.com/hadoop-tutorial/using-hive-with-orc-from-apache-spark/
Created ‎03-27-2017 10:07 PM
Hi @Bala Vignesh N V Setting hive.execution.engine to spark makes Hive kick off Spark jobs for its SQL queries, but that configuration is not supported by Hortonworks. The better way is to use the Spark Thrift Server plus beeline to run queries:
You can create Hive tables and execute queries (each submitted as a Spark job under the hood), and the query result sets are generated through the SparkContext. Is that what you need?
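The Thrift Server + beeline workflow described above might look like the following sketch. The paths and the port 10015 reflect a typical HDP layout and are assumptions; adjust them for your installation (plain HiveServer2 usually listens on 10000).

```shell
# Start the Spark Thrift Server (path assumes a typical Spark install).
$SPARK_HOME/sbin/start-thriftserver.sh --master yarn

# Connect with beeline over the HiveServer2 JDBC protocol.
beeline -u "jdbc:hive2://localhost:10015" -n hive

# Inside the beeline session, CACHE TABLE materializes the table in
# Spark's memory, so later queries read from the cache:
#   CACHE TABLE mydb.mytable;
#   SELECT COUNT(*) FROM mydb.mytable;
```

Because the queries run inside the Thrift Server's long-lived Spark application, cached tables stay warm across beeline sessions, which is the effect the original question was after.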
