<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive on Tez or Hive query using Spark SQL in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIve-on-Tez-or-HIve-query-using-Spark-SQL/m-p/133997#M39603</link>
    <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/11008/chandramoulimuthukumaran.html"&gt;chandramouli muthukumaran&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Just to clarify, SparkSQL does not access or use the Hive engine. It only consumes the metadata of Hive data structures.&lt;/P&gt;&lt;P&gt;Assuming that both can execute the query functionally (SparkSQL is quite limited functionally compared with Hive) and that the query needs to churn through 40 TB of data, I would say Hive on Tez is likely your best choice. That is also driven by the cost of the RAM your Spark cluster needs on top of Hive's requirements, because I assume you will still have some cases where running Hive is needed. I have noticed that if the amount of data is less than 1 TB, SparkSQL outperforms Hive on Tez.&lt;/P&gt;&lt;P&gt;Anyhow, be aware that with HDP 2.5, LLAP is in Tech Preview and will soon be GA. If you were asking about Hive on LLAP vs. SparkSQL, I would say without hesitation: for most queries, Hive on LLAP. Again, for some sophisticated queries with a limited amount of data and limited functional requirements, SparkSQL may be the winner, but in the big picture it is too expensive to maintain both approaches, and I would still choose Hive on Tez and LLAP over SparkSQL for most cases that deal with big data. Otherwise, 1 TB does not need Hadoop for fast queries.&lt;/P&gt;&lt;P&gt;Read more about Hive on LLAP here:&lt;/P&gt;&lt;P&gt;&lt;A href="http://hortonworks.com/blog/llap-enables-sub-second-sql-hadoop/" target="_blank"&gt;http://hortonworks.com/blog/llap-enables-sub-second-sql-hadoop/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Give LLAP a shot before deciding to use SparkSQL, especially if you already have the queries written in HiveQL.&lt;/P&gt;&lt;P&gt;If this response or any response in this thread was helpful, please don't forget to vote/accept it as the best answer.&lt;/P&gt;</description>
    <pubDate>Sat, 03 Sep 2016 03:07:34 GMT</pubDate>
    <dc:creator>cstanca</dc:creator>
    <dc:date>2016-09-03T03:07:34Z</dc:date>
    <item>
      <title>Hive on Tez or Hive query using Spark SQL</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIve-on-Tez-or-HIve-query-using-Spark-SQL/m-p/133996#M39602</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Can you please let me know which one is faster: Hive on Tez, or accessing Hive using Spark SQL?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Chandra&lt;/P&gt;</description>
      <pubDate>Sat, 03 Sep 2016 02:44:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIve-on-Tez-or-HIve-query-using-Spark-SQL/m-p/133996#M39602</guid>
      <dc:creator>Chandra</dc:creator>
      <dc:date>2016-09-03T02:44:01Z</dc:date>
    </item>
    <item>
      <title>Re: Hive on Tez or Hive query using Spark SQL</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIve-on-Tez-or-HIve-query-using-Spark-SQL/m-p/133997#M39603</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/11008/chandramoulimuthukumaran.html"&gt;chandramouli muthukumaran&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Just to clarify, SparkSQL does not access or use the Hive engine. It only consumes the metadata of Hive data structures.&lt;/P&gt;&lt;P&gt;Assuming that both can execute the query functionally (SparkSQL is quite limited functionally compared with Hive) and that the query needs to churn through 40 TB of data, I would say Hive on Tez is likely your best choice. That is also driven by the cost of the RAM your Spark cluster needs on top of Hive's requirements, because I assume you will still have some cases where running Hive is needed. I have noticed that if the amount of data is less than 1 TB, SparkSQL outperforms Hive on Tez.&lt;/P&gt;&lt;P&gt;Anyhow, be aware that with HDP 2.5, LLAP is in Tech Preview and will soon be GA. If you were asking about Hive on LLAP vs. SparkSQL, I would say without hesitation: for most queries, Hive on LLAP. Again, for some sophisticated queries with a limited amount of data and limited functional requirements, SparkSQL may be the winner, but in the big picture it is too expensive to maintain both approaches, and I would still choose Hive on Tez and LLAP over SparkSQL for most cases that deal with big data. Otherwise, 1 TB does not need Hadoop for fast queries.&lt;/P&gt;&lt;P&gt;Read more about Hive on LLAP here:&lt;/P&gt;&lt;P&gt;&lt;A href="http://hortonworks.com/blog/llap-enables-sub-second-sql-hadoop/" target="_blank"&gt;http://hortonworks.com/blog/llap-enables-sub-second-sql-hadoop/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Give LLAP a shot before deciding to use SparkSQL, especially if you already have the queries written in HiveQL.&lt;/P&gt;&lt;P&gt;If this response or any response in this thread was helpful, please don't forget to vote/accept it as the best answer.&lt;/P&gt;</description>
      <pubDate>Sat, 03 Sep 2016 03:07:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIve-on-Tez-or-HIve-query-using-Spark-SQL/m-p/133997#M39603</guid>
      <dc:creator>cstanca</dc:creator>
      <dc:date>2016-09-03T03:07:34Z</dc:date>
    </item>
    <item>
      <title>Re: Hive on Tez or Hive query using Spark SQL</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIve-on-Tez-or-HIve-query-using-Spark-SQL/m-p/133998#M39604</link>
      <description>&lt;P&gt;Thanks for the valuable information. So your recommendation is to go with Hive on LLAP rather than SparkSQL. Please correct me if I am wrong.&lt;/P&gt;</description>
      <pubDate>Sat, 03 Sep 2016 03:16:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIve-on-Tez-or-HIve-query-using-Spark-SQL/m-p/133998#M39604</guid>
      <dc:creator>Chandra</dc:creator>
      <dc:date>2016-09-03T03:16:55Z</dc:date>
    </item>
    <item>
      <title>Re: Hive on Tez or Hive query using Spark SQL</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIve-on-Tez-or-HIve-query-using-Spark-SQL/m-p/133999#M39605</link>
      <description>&lt;P&gt;Also, what is the need to run Hive queries on SparkSQL when Hive on Tez can run much faster?&lt;/P&gt;</description>
      <pubDate>Sat, 03 Sep 2016 04:19:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIve-on-Tez-or-HIve-query-using-Spark-SQL/m-p/133999#M39605</guid>
      <dc:creator>Chandra</dc:creator>
      <dc:date>2016-09-03T04:19:45Z</dc:date>
    </item>
  </channel>
</rss>

