
Hive on Tez or Hive query using Spark SQL



Contributor

Hi,

Can you please let me know which one is faster: Hive on Tez, or accessing Hive using Spark SQL?

Thanks,

Chandra

Accepted Solution

Re: Hive on Tez or Hive query using Spark SQL

@chandramouli muthukumaran

Just to clarify, SparkSQL does not access or use the Hive execution engine. It only consumes the metadata of Hive data structures (from the Hive metastore).

Assuming that both engines can execute the query functionally (SparkSQL is quite limited compared with Hive), but the query needs to churn through 40 TB of data, Hive on Tez is likely your optimal choice. That choice is also driven by cost: a Spark cluster needs its own RAM in addition to Hive's requirements, and I assume you will still have some cases where running Hive is needed. In my experience, when the amount of data is less than 1 TB, SparkSQL outperforms Hive on Tez.
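To make the rule of thumb above concrete, here is a minimal Python sketch. The function name `suggest_engine` and its parameters are hypothetical; the 1 TB threshold and the 40 TB example come from the observations in this answer, not from a benchmark, so treat the output as a starting point for your own testing.

```python
# Hypothetical helper restating the rule of thumb from this answer.
# It is NOT a benchmark: the thresholds are the poster's observations.


def suggest_engine(data_tb: float, spark_can_run_query: bool = True) -> str:
    """Suggest an engine for one query over Hive-metastore tables.

    data_tb: approximate amount of data the query scans, in terabytes.
    spark_can_run_query: whether SparkSQL supports every HiveQL feature used.
    """
    if data_tb < 0:
        raise ValueError("data_tb must be non-negative")
    if not spark_can_run_query:
        # SparkSQL is functionally more limited, so fall back to Hive.
        return "Hive on Tez"
    if data_tb < 1:
        # Below ~1 TB the poster observed SparkSQL outperforming Hive on Tez.
        return "SparkSQL"
    # Large scans (e.g. the 40 TB case) favor Hive on Tez, which also avoids
    # paying for a second RAM-heavy Spark cluster alongside Hive.
    return "Hive on Tez"


print(suggest_engine(40))                               # Hive on Tez
print(suggest_engine(0.5))                              # SparkSQL
print(suggest_engine(0.5, spark_can_run_query=False))   # Hive on Tez
```

The functional check comes first because, per the answer, SparkSQL cannot run every HiveQL query; data size only matters once both engines can run it.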

Anyhow, be aware that with HDP 2.5, LLAP is in Tech Preview and will soon be GA. If you were asking about Hive on LLAP vs. SparkSQL, I would say without hesitation: Hive on LLAP, for most queries. Again, for some sophisticated queries over a limited amount of data and with limited functionality, SparkSQL may be the winner. In the big picture, however, it is too expensive to maintain both approaches, and I would still choose Hive on Tez and LLAP over SparkSQL for most cases that deal with big data. Otherwise, 1 TB does not need Hadoop for fast queries.

Read more about Hive on LLAP here:

http://hortonworks.com/blog/llap-enables-sub-second-sql-hadoop/

Give LLAP a shot before deciding to use SparkSQL, especially if you already have your queries written in HiveQL.
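If you want to try the same HiveQL query on plain Tez and on LLAP, a sketch like the following can generate the session settings for each run. The helper `session_script`, the table `sales`, and the chosen property values are assumptions for illustration. The property names (`hive.execution.engine`, `hive.execution.mode`, `hive.llap.execution.mode`) are as I understand them for Hive 2.x, and LLAP additionally requires the LLAP daemons to be running (on HDP 2.5, as I understand it, via the Interactive Query / HiveServer2 Interactive setup), so check your cluster's documentation before relying on them.

```python
# Hypothetical helper: builds HiveQL session settings so the same query can be
# compared on Tez containers vs. LLAP. Property names/values are assumptions
# based on Hive 2.x; verify them against your own Hive version.

ENGINE_SETTINGS = {
    "tez": [
        ("hive.execution.engine", "tez"),
        ("hive.execution.mode", "container"),  # run tasks in Tez containers
    ],
    "llap": [
        ("hive.execution.engine", "tez"),      # LLAP still uses Tez for the DAG
        ("hive.execution.mode", "llap"),
        ("hive.llap.execution.mode", "all"),   # push all fragments to LLAP
    ],
}


def session_script(mode: str, query: str) -> str:
    """Return SET statements for `mode` followed by the query."""
    if mode not in ENGINE_SETTINGS:
        raise ValueError(f"unknown mode: {mode!r}")
    lines = [f"SET {key}={value};" for key, value in ENGINE_SETTINGS[mode]]
    # Normalize the trailing semicolon so the script is well-formed.
    lines.append(query.rstrip().rstrip(";") + ";")
    return "\n".join(lines)


query = "SELECT COUNT(*) FROM sales"  # hypothetical table
for mode in ("tez", "llap"):
    print(session_script(mode, query))
    print()
```

Running each generated script (for example through Beeline) and timing it gives you a like-for-like comparison on your own data before you commit to either engine.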



Re: Hive on Tez or Hive query using Spark SQL

Contributor

Thanks for the valuable information. So your recommendation is to go with Hive on LLAP rather than SparkSQL. Please correct me if I am wrong.

Re: HIve on Tez or HIve query using Spark SQL

Contributor

Also, what is the need to run Hive queries on SparkSQL when Hive on Tez can run much faster?