Spark vs Tez?

What's the difference between the two?

1 ACCEPTED SOLUTION

Re: Spark vs Tez?

There are many differences between the two.

Spark:

  • Spark provides an API, an execution engine, and packages (SQL, ML, Graph) built on top of the core Spark API
  • Spark is application-developer facing
  • Spark's abstractions are RDD, DataFrame, and now Dataset (as of Spark 1.6)

Tez:

  • Tez is an execution engine for Hive and Pig

Bottom line: if you are asking about the difference between Spark and Tez, consider using Spark.

Re: Spark vs Tez?

Contributor

From what we have seen in the field and during some customer testing, SparkSQL (1.4.x at the time of testing) was generally 50% to 200% faster when querying small datasets. By small we mean anything under 100 GB, which is usually great for data discovery, data wrangling, testing things out, or even running a production use case where the datasets are numerous but relatively small.

The bigger the table, especially when joins are not used effectively or we are scanning a single big table, the more Tez was able to shine. This matters if you are in the BI space, where SLAs are required and you can't afford a query to break and start over. Tez is solid and stable, and the bigger the datasets, the better its performance gets compared to Spark. At around 250 GB you will see very similar execution times. Of course, this depends on how big the cluster is, how much memory is allocated, and so on.

In general, my personal opinion is that we shouldn't compare the two at this time, as both shine in separate contexts. At some stage Tez might be needed, while Spark may be the better fit for smaller datasets. As I mentioned, this was based on Spark 1.4.x; I would love to re-run the tests, especially after the new cube functionality in Spark 1.5.

Hope this was helpful.

Re: Spark vs Tez?

New Contributor

Spark is a framework written in Scala, with rich API support for Python and Java as well. Scala is based on functional programming, and Spark is easy to use from applications written in Scala.

Re: Spark vs Tez?

New Contributor

Spark is meant for application development. Tez is a library which is used by tools such as Hive to speed things up. Tez isn't suitable for end-user programming.