Hive on Spark vs Impala

Champion Alumni

Hello,

In CDH 5.6, both Hive on Spark and Impala are available.

How should we choose between these two services? Are there any benchmarks that compare them?

Thank you! :)

GHERMAN Alina

Re: Hive on Spark vs Impala

Master Collaborator

Hi Alina,

Although Hive-on-Spark will definitely provide improved performance over MapReduce (MR) for batch processing applications (e.g. ETL), that performance is not going to approach the interactive "BI" experience provided by Impala.

Here are some recent Impala performance test results:

http://blog.cloudera.com/blog/2016/02/new-sql-benchmarks-apache-impala-incubating-2-3-uniquely-deliv...

Although Hive-on-Spark is not included, one would expect it to perform at levels similar to Hive-on-Tez, with the added advantage of supporting consolidation onto the Spark API.
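The linked post covers Impala only. To compare the two engines on your own data, one option is to time the same query through each service. The sketch below is a hypothetical helper (`time_query` is not part of any Cloudera tool) and assumes both services are reached through Python DB-API 2.0 (PEP 249) drivers, for example PyHive for HiveServer2 (default port 10000) and impyla for the Impala daemon (default port 21050). For the Hive run, passing `SET hive.execution.engine=spark` as a setup statement selects Hive on Spark for that session. In the demo, an in-memory SQLite database stands in for both services so the sketch runs without a cluster.

```python
import sqlite3
import statistics
import time
from typing import Any, Callable, Dict, List, Sequence


def time_query(connect: Callable[[], Any], sql: str, runs: int = 5,
               setup: Sequence[str] = ()) -> Dict[str, float]:
    """Time `sql` over `runs` executions on one DB-API 2.0 connection.

    Hypothetical helper: `connect` is any zero-argument factory returning
    a PEP 249 connection; `setup` statements (e.g. session settings) run
    once, untimed, before the measured executions.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    conn = connect()
    try:
        cur = conn.cursor()
        for stmt in setup:
            cur.execute(stmt)  # untimed: session settings or test fixtures
        timings: List[float] = []
        for _ in range(runs):
            start = time.perf_counter()
            cur.execute(sql)
            cur.fetchall()  # include result retrieval in the measurement
            timings.append(time.perf_counter() - start)
        # min/median/max rather than a single mean, so slow first runs show up
        return {
            "min": min(timings),
            "median": statistics.median(timings),
            "max": max(timings),
        }
    finally:
        conn.close()


if __name__ == "__main__":
    # Stand-in demo: an in-memory SQLite database replaces Hive/Impala here.
    stats = time_query(
        lambda: sqlite3.connect(":memory:"),
        "SELECT SUM(x) FROM t",
        runs=3,
        setup=["CREATE TABLE t (x INTEGER)",
               "INSERT INTO t VALUES (1), (2), (3)"],
    )
    print({k: f"{v * 1000:.3f} ms" for k, v in stats.items()})
```

Against a real cluster, `connect` could be `lambda: hive.connect("hs2-host", port=10000)` (with `from pyhive import hive`) or `lambda: connect(host="impalad-host", port=21050)` (with `from impala.dbapi import connect`); the hostnames are placeholders. The spread between min and max matters here because the first query in a Hive on Spark session typically pays the cost of starting the Spark application. A single small query says little; repeated runs over representative queries and data sizes are what make such a comparison meaningful.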

Re: Hive on Spark vs Impala

New Contributor

What is Cloudera's take on when to use Impala vs. Hive-on-Spark?

We would also like to know what are the long term implications of introducing Hive-on-Spark vs Impala.

It would definitely be very interesting to have a head-to-head comparison between Impala, Hive on Spark and Stinger, for example. I wouldn't include Spark SQL here because, in my opinion, Spark SQL serves a totally different purpose.

/Izhar

Re: Hive on Spark vs Impala

Master Collaborator

Was anything unclear in my answers to these questions higher in the thread?

Re: Hive on Spark vs Impala

New Contributor