Does Impala substitute Hive?

New Contributor

Hello everyone

 

I'm looking for use cases where it is not possible to use Impala, but it is possible to use Hive. Are there any?

 

Thanks in advance

Re: Does Impala substitute Hive?

Cloudera Employee

"Not possible to use Impala" mainly comes down to unsupported syntax or data types, summarized here:

 

http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Im...

 

"Not possible" may be overstating the case; it's usually a question of which is more practical. Even if Impala currently doesn't support statement ABC or data type XYZ, that specific limitation might not hold forever. The decision might depend on your timeframe, or on how much you have already invested in a particular approach. For example, someone just starting a project might try out different approaches (nested data types in Hive, a flattened version of the same schema in Impala, or some combination of Impala + HBase) and decide which is best based on performance and the amount of coding work.
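
To make the "flattened version of the same schema" idea a bit more concrete, here is a minimal Python sketch of collapsing struct-like nested fields into plain columns. The function name, the sample row, and the underscore naming convention are all hypothetical, chosen only for illustration; it handles nested structs (dicts) only, and arrays or maps would typically need to be expanded into extra rows instead.

```python
def flatten_record(record, parent_key="", sep="_"):
    """Collapse nested dicts (struct-like fields) into a single level of columns."""
    flat = {}
    for key, value in record.items():
        # Build the flattened column name from the path to this field.
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into the nested struct and merge its columns.
            flat.update(flatten_record(value, column, sep))
        else:
            flat[column] = value
    return flat


# Hypothetical nested row, as it might be modeled with struct columns.
nested_row = {
    "id": 1,
    "customer": {"name": "Ann", "address": {"city": "Oslo", "zip": "0150"}},
}
flat_row = flatten_record(nested_row)
```

Each leaf field ends up as its own top-level column (for example `customer_address_city`), which is the kind of layout you could load into a table that uses only scalar column types.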

 

You might prefer Hive for insert jobs that run over multiple days, to avoid having to re-run the whole operation if it fails partway through. After the data is all loaded, you could still use Impala to query it. The FAQ entry for this question mainly focuses on the term "long-running":

 

http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Cloudera-Impala-Frequen...

 

I believe some people also use Hive for queries that run over several days, where the same consideration would apply. You would still want to benchmark both systems, though: if the equivalent Impala query took substantially less time, re-running it after a node failure might not be a big deal.
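
As a rough way to reason about that trade-off, here is a back-of-the-envelope Python sketch. It assumes a simplified model that is not taken from either system's documentation: failures arrive at a constant average rate, a failed query restarts from scratch with no extra overhead, and all the numbers are made up for illustration rather than measured. Under those assumptions the expected time to finish a job of length T with failure rate r is (e^(rT) - 1) / r.

```python
import math


def expected_hours_with_full_restart(run_hours, failures_per_hour):
    """Expected wall-clock hours for a job that restarts from scratch on failure.

    Assumes failures follow a Poisson process with the given rate and that
    restarting costs nothing beyond the lost work (a simplification).
    """
    if failures_per_hour == 0:
        return run_hours
    # expm1(x) computes e**x - 1 accurately for small x.
    return math.expm1(failures_per_hour * run_hours) / failures_per_hour


# Hypothetical figures: one failure per 100 hours on average,
# a 6-hour query versus a 48-hour query, both restarted in full on failure.
failure_rate = 0.01
short_job = expected_hours_with_full_restart(6, failure_rate)
long_job = expected_hours_with_full_restart(48, failure_rate)
print(f"6-hour job:  about {short_job:.2f} expected hours")
print(f"48-hour job: about {long_job:.2f} expected hours")
```

With these made-up inputs the short job's expected time stays close to its nominal 6 hours, while the penalty for full restarts grows quickly as the nominal run time increases. That is why the benchmark result matters: the shorter the query, the less it costs to simply run it again.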