Hive on Spark vs Impala

Champion Alumni

Hello,

CDH 5.6 includes both Hive on Spark and Impala.

How should we choose between these two services? Are there any benchmarks that compare them?

Thank you! 🙂

GHERMAN Alina
1 ACCEPTED SOLUTION

Master Collaborator

Hi Alina,

Although Hive-on-Spark will definitely improve on MapReduce (MR) performance for batch processing applications (e.g. ETL), it is not going to approach the interactive "BI" experience provided by Impala.

Here are some recent Impala performance test results:

http://blog.cloudera.com/blog/2016/02/new-sql-benchmarks-apache-impala-incubating-2-3-uniquely-deliv...

Although Hive-on-Spark is not included, one would expect it to perform at levels similar to Hive-on-Tez, with the added advantage of supporting consolidation onto the Spark API.

4 REPLIES

New Contributor

What is Cloudera's take on when to use Impala vs Hive-on-Spark?

We would also like to know what the long-term implications are of introducing Hive-on-Spark vs Impala.

It would definitely be very interesting to have a head-to-head comparison between Impala, Hive on Spark and Stinger, for example. I wouldn't include Spark SQL here because, in my opinion, Spark SQL serves a totally different purpose.

/Izhar

Master Collaborator

Was anything unclear in my answer to these questions higher in the thread?
