<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark vs Tez? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Choosing-Between-Spark-and-Tez-Use-Cases-for-Large-Datasets/m-p/98126#M254384</link>
    <description>&lt;P&gt;Spark is meant for application development. Tez is a library which is used by tools such as Hive to speed things up. Tez isn't suitable for end-user programming.&lt;/P&gt;</description>
    <pubDate>Mon, 14 Dec 2015 01:49:15 GMT</pubDate>
    <dc:creator>dkumar1</dc:creator>
    <dc:date>2015-12-14T01:49:15Z</dc:date>
    <item>
      <title>Choosing Between Spark and Tez: Use Cases for Large Datasets and Hive Integration</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Choosing-Between-Spark-and-Tez-Use-Cases-for-Large-Datasets/m-p/98122#M254380</link>
      <description>&lt;P&gt;What's the difference between the two?&lt;/P&gt;</description>
      <pubDate>Tue, 12 May 2026 20:55:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Choosing-Between-Spark-and-Tez-Use-Cases-for-Large-Datasets/m-p/98122#M254380</guid>
      <dc:creator>abajwa</dc:creator>
      <dc:date>2026-05-12T20:55:39Z</dc:date>
    </item>
    <item>
      <title>Re: Spark vs Tez?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Choosing-Between-Spark-and-Tez-Use-Cases-for-Large-Datasets/m-p/98123#M254381</link>
      <description>&lt;P&gt;There are many differences between the two.&lt;/P&gt;&lt;P&gt;Spark:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Spark provides an API, an execution engine, and packages (SQL, ML, Graph) on top of the core Spark API&lt;/LI&gt;&lt;LI&gt;Spark is application-developer facing&lt;/LI&gt;&lt;LI&gt;Spark's abstractions are RDD/DataFrame &amp;amp; now Dataset (with Spark 1.6)&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Tez:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Tez is the execution engine for Hive &amp;amp; Pig&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Bottom line: if you are asking for the difference between Spark &amp;amp; Tez, consider using Spark.&lt;/P&gt;</description>
      <pubDate>Tue, 08 Dec 2015 13:23:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Choosing-Between-Spark-and-Tez-Use-Cases-for-Large-Datasets/m-p/98123#M254381</guid>
      <dc:creator>vshukla</dc:creator>
      <dc:date>2015-12-08T13:23:12Z</dc:date>
    </item>
    <item>
      <title>Re: Spark vs Tez?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Choosing-Between-Spark-and-Tez-Use-Cases-for-Large-Datasets/m-p/98124#M254382</link>
      <description>&lt;P&gt;From what we have witnessed in the field and during some customer testing, SparkSQL (1.4.x) was, at the time of testing, generally 50% to 200% faster when querying small datasets. By small we mean anything &amp;lt; 100 GB, which is usually great for data discovery, data wrangling, testing things out, or even running a production use case where the datasets are numerous but relatively small.&lt;/P&gt;&lt;P&gt;The bigger the table, especially when joins are not used effectively or we are scanning a single big table, and if you are in the BI space where SLAs are required and you can't afford a query to break and start over, the more Tez was able to shine. It is rigid and stable, and the bigger the datasets, the better its performance gets compared to Spark. With 250 GB datasets you will see similar execution times; of course, this depends on how big the cluster is, how much memory is allocated, etc.&lt;/P&gt;&lt;P&gt;In general, my personal opinion is that we shouldn't compare the two at this time, as each shines in a separate context. At some stage Tez might be needed, while Spark may be the better fit for smaller datasets. As I mentioned, this was based on Spark 1.4.x; I would love to re-run the tests, especially after the new cube functionality in Spark 1.5.&lt;/P&gt;&lt;P&gt;Hope this was helpful.&lt;/P&gt;</description>
      <pubDate>Tue, 08 Dec 2015 13:26:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Choosing-Between-Spark-and-Tez-Use-Cases-for-Large-Datasets/m-p/98124#M254382</guid>
      <dc:creator>nshawa</dc:creator>
      <dc:date>2015-12-08T13:26:27Z</dc:date>
    </item>
    <item>
      <title>Re: Spark vs Tez?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Choosing-Between-Spark-and-Tez-Use-Cases-for-Large-Datasets/m-p/98125#M254383</link>
      <description>&lt;P&gt;Spark is a framework written in Scala, with rich API support for Python and Java as well. Scala is based on functional programming, which makes Spark easy to use from applications written in Scala.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Dec 2015 22:44:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Choosing-Between-Spark-and-Tez-Use-Cases-for-Large-Datasets/m-p/98125#M254383</guid>
      <dc:creator>anandmurari</dc:creator>
      <dc:date>2015-12-09T22:44:55Z</dc:date>
    </item>
    <item>
      <title>Re: Spark vs Tez?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Choosing-Between-Spark-and-Tez-Use-Cases-for-Large-Datasets/m-p/98126#M254384</link>
      <description>&lt;P&gt;Spark is meant for application development. Tez is a library which is used by tools such as Hive to speed things up. Tez isn't suitable for end-user programming.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Dec 2015 01:49:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Choosing-Between-Spark-and-Tez-Use-Cases-for-Large-Datasets/m-p/98126#M254384</guid>
      <dc:creator>dkumar1</dc:creator>
      <dc:date>2015-12-14T01:49:15Z</dc:date>
    </item>
  </channel>
</rss>

