Spark vs Tez?

What's the difference between the two?

1 ACCEPTED SOLUTION

There are many differences between the two.

Spark:

  • Spark provides an API, an execution engine, and packages (SQL, ML, graph processing) built on top of the core Spark API
  • Spark is application-developer facing
  • Spark's abstractions are the RDD and the DataFrame, and now the Dataset (as of Spark 1.6)

Tez:

  • Tez is an execution engine for Hive and Pig

Bottom line: if you are asking about the difference between Spark and Tez, consider using Spark.

4 REPLIES

Rising Star

From what we have seen in the field and during some customer testing, Spark SQL (1.4.x at the time of testing) was generally 50% to 200% faster when querying small datasets. By small we mean anything under 100 GB, which is usually great for data discovery, data wrangling, testing things out, or even running a production use case where the datasets are numerous but relatively small.
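To make the percentage range concrete, here is a minimal sketch. It assumes "N% faster" means throughput is (1 + N/100) times the baseline, which is only one possible reading of the figures above. The 60-second baseline and the function name `runtime_after_speedup` are hypothetical, not measurements or names from the tests described.

```python
def runtime_after_speedup(baseline_seconds: float, percent_faster: float) -> float:
    """Return the runtime if throughput is (1 + percent_faster/100) times the baseline.

    Assumption: "N% faster" is read as a throughput ratio, not a runtime reduction.
    """
    if percent_faster <= -100:
        raise ValueError("percent_faster must be greater than -100")
    return baseline_seconds / (1 + percent_faster / 100)


# Hypothetical baseline: a query that takes 60 seconds on the slower engine.
baseline = 60.0
slowest_case = runtime_after_speedup(baseline, 50)    # 50% faster
fastest_case = runtime_after_speedup(baseline, 200)   # 200% faster

print(f"50% faster:  {slowest_case:.1f} s")
print(f"200% faster: {fastest_case:.1f} s")
```

Under this reading, a 60-second query would finish in roughly 40 to 20 seconds, so the reported gap is between 1.5x and 3x.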

The bigger the table, especially when joins are not used effectively or we are scanning one big table, the more Tez shines. The same applies if you are in the BI space, SLAs are required, and you can't afford for a query to break and start over. Tez is rigid and stable, and the bigger the datasets, the better its performance gets compared to Spark. At around 250 GB you will see very similar execution times. Of course, this depends on how big the cluster is, how much memory is allocated, and so on.

In general, my personal opinion is that we shouldn't compare the two at this time, as each shines in a separate context. At some stage Tez might be needed, while Spark would be the better fit for smaller datasets. As I mentioned, this was based on Spark 1.4.x; I would love to re-run the tests, especially after the new cube functionality in Spark 1.5.

Hope this was helpful.

Explorer

Spark is a framework written in Scala, with rich API support for Python and Java as well. Scala supports functional programming, so Spark is especially easy to use from applications written in Scala.

Contributor

Spark is meant for application development. Tez is a library which is used by tools such as Hive to speed things up. Tez isn't suitable for end-user programming.
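As an illustration of that last point, end users typically reach Tez through a tool's configuration rather than by writing code against it. The fragment below is a sketch for a Hive session (Beeline or the Hive CLI), assuming Tez is installed on the cluster; the table name `web_logs` is hypothetical.

```sql
-- Select the execution engine for this Hive session.
-- Other documented values for this property include mr and spark.
SET hive.execution.engine=tez;

-- Hypothetical table, used only for illustration.
-- The query itself is unchanged; only the engine that runs it differs.
SELECT COUNT(*) FROM web_logs;
```

The query text stays the same whichever engine is selected, which is why Tez is usually discussed as a backend rather than as a programming framework.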