Hive on Spark or Impala in batch Process (ETL)

cetip — Fri, 16 Sep 2022 11:32:52 GMT

Hi All,

I have a question about the performance and usability of Impala versus Hive on Spark (HoS) for batch processing (ETL).

I've read that Impala performs better than HoS, but that using it for batch processing (ETL) is not considered "best practice" (or at least is not usual).

Why? If it's the fastest, why not use it?

hugs,

Rodrigo Carvalho

Re: Hive on Spark or Impala in batch Process (ETL)

alex.behm — Thu, 04 May 2017 01:08:28 GMT

Some thoughts on your question:

- Hive is more flexible in terms of data formats that it can scan

- You may find Hive to be more feature rich in terms of SQL language support and built-in functions

- Hive will most likely complete your query even if there are node failures (this makes it suitable for long-running jobs); this is true for both Hive on MR and Hive on Spark

- If Impala can run your ETL, then it will probably be faster

- Impala will fail/abort a query if a node goes down during query execution

- The last point may make Impala less suitable for long-running jobs. On the other hand, faster queries also mean a shorter failure window, so Impala may very well suit your ETL needs if you can tolerate the failure behavior
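
If you can tolerate the failure behavior, one way to handle it is to simply re-run the aborted step from the ETL driver. Below is a minimal sketch, assuming each ETL step is idempotent (for example, it overwrites its target partition) and that your client surfaces an aborted query as an exception. `QueryAborted`, `run_with_retries`, and the `run_query` callable are hypothetical names used for illustration; they are not part of Impala or of any client library.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QueryAborted(Exception):
    """Hypothetical error: the engine aborted the query (e.g. a node went down)."""


def run_with_retries(
    run_query: Callable[[], T],
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> T:
    """Re-run an idempotent ETL step from scratch each time it is aborted.

    `run_query` is a caller-supplied function (an assumption of this sketch)
    that submits the query and raises QueryAborted if the query is aborted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except QueryAborted:
            if attempt == max_attempts:
                raise  # out of attempts: let the scheduler see the failure
            # Linear backoff gives the cluster time to recover before retrying.
            time.sleep(backoff_seconds * attempt)
    raise AssertionError("unreachable")
```

Note that each retry repeats the whole query, so this only pays off when the query is short relative to how often nodes fail. For very long jobs, an engine that recovers from node failures on its own, as Hive does, may still be the safer choice.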

You may also find this article interesting:

https://vision.cloudera.com/sql-on-apache-hadoop-choosing-the-right-tool-for-the-right-job/

Re: Hive on Spark or Impala in batch Process (ETL)

Henry2410 — Mon, 03 Aug 2020 17:00:32 GMT

- Hive is more flexible in terms of the data formats it can scan

- You may find Hive more feature-rich in terms of SQL language support and built-in functions

- Hive will probably finish your query even if there are node failures (this makes it suitable for long-running jobs); this is true for both Hive on MR and Hive on Spark

- If Impala can run your ETL, it will most likely be faster

- Impala will fail/abort a query if a node goes down during query execution

- The last point may make Impala less suitable for long-running jobs. However, there is also a shorter failure window because queries are faster, so Impala may well suit your ETL needs if you can tolerate the failure behavior
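
The "shorter failure window" argument can be made concrete with a rough back-of-the-envelope estimate. The sketch below assumes a simplified model that does not come from this thread: nodes fail independently at a constant rate, and any single node failure during execution aborts the query. The function name and its parameters are hypothetical, and the mean time between failures is something you would have to estimate for your own cluster.

```python
import math


def abort_probability(runtime_hours: float, nodes: int, node_mtbf_hours: float) -> float:
    """Estimate the chance that at least one node fails while a query runs.

    Assumes independent node failures with exponentially distributed
    time-to-failure (mean `node_mtbf_hours`), which is a modeling assumption.
    """
    if runtime_hours < 0 or nodes < 1 or node_mtbf_hours <= 0:
        raise ValueError("runtime must be >= 0, nodes >= 1, MTBF > 0")
    # With n independent nodes, the combined failure rate is n / MTBF.
    return 1.0 - math.exp(-nodes * runtime_hours / node_mtbf_hours)


# Compare a short query with a long one on the same hypothetical cluster.
short_run = abort_probability(runtime_hours=0.1, nodes=50, node_mtbf_hours=10_000)
long_run = abort_probability(runtime_hours=5.0, nodes=50, node_mtbf_hours=10_000)
print(f"short query: {short_run:.4%}, long query: {long_run:.4%}")
```

Under this model the risk grows with both runtime and cluster size, so a query that finishes much faster is exposed to node failures for a correspondingly shorter time.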