Hive on Spark vs Impala
Created on 03-07-2016 04:13 AM - edited 09-16-2022 03:07 AM
Hello,
CDH 5.6 includes both Hive on Spark and Impala.
How should we choose between these two services? Are there any benchmarks that compare them?
Thank you! 🙂
Created 03-07-2016 02:04 PM
Hi Alina,
Although Hive-on-Spark will definitely provide improved performance over MapReduce for batch processing applications (e.g., ETL), that performance is not going to approach the interactive "BI" experience provided by Impala.
Here are some recent Impala performance testing results:
http://blog.cloudera.com/blog/2016/02/new-sql-benchmarks-apache-impala-incubating-2-3-uniquely-deliv...
Although Hive-on-Spark is not included, one would expect it to perform at a level similar to Hive-on-Tez, with the added advantage of supporting consolidation onto the Spark API.
Created 04-18-2016 01:38 AM
What is Cloudera's take on when to use Impala vs Hive-on-Spark?
We would also like to know what the long-term implications are of introducing Hive-on-Spark alongside Impala.
It would definitely be very interesting to have a head-to-head comparison between Impala, Hive on Spark, and Stinger, for example. I wouldn't include Spark SQL here because, in my opinion, Spark SQL serves a totally different purpose.
/Izhar
Created 04-18-2016 11:17 AM
Was anything unclear in my answers to these questions earlier in the thread?
Created 05-16-2016 12:09 AM
http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Pros-and-Cons-of-fetching-data-usin...
