Created on 03-07-2016 04:13 AM - edited 09-16-2022 03:07 AM
Hello,
CDH 5.6 includes both Hive on Spark and Impala.
How should we choose between these two services? Are there any benchmarks that compare them?
Thank you! 🙂
Created 03-07-2016 02:04 PM
Hi Alina,
Although Hive-on-Spark will definitely provide improved performance over MapReduce for batch processing applications (e.g., ETL), that performance is not going to approach the interactive "BI" experience provided by Impala.
Here are some recent Impala performance test results:
http://blog.cloudera.com/blog/2016/02/new-sql-benchmarks-apache-impala-incubating-2-3-uniquely-deliv...
Hive-on-Spark is not included in those tests, but one would expect it to perform at a level similar to Hive-on-Tez, with the added advantage of supporting consolidation onto the Spark API.
Created 04-18-2016 01:38 AM
What is Cloudera's take on when to use Impala vs Hive-on-Spark?
We would also like to know what the long-term implications are of introducing Hive-on-Spark vs Impala.
It would definitely be very interesting to have a head-to-head comparison between Impala, Hive on Spark, and Stinger, for example. I wouldn't include Spark SQL here because, in my opinion, Spark SQL serves a totally different purpose.
/Izhar
Created 04-18-2016 11:17 AM
Was anything unclear in my answers to these questions earlier in the thread?
Created 05-16-2016 12:09 AM