Improve Hive on Tez performance [HDP 2.6.4]

New Contributor



I was wondering how to tune or set up Tez so that Hive gets close to the performance I get with Spark / Spark SQL.


Currently, I have a table that I need to scan to grab all rows matching a certain column value. The table is partitioned daily, with ~100,000,000 rows per day. In Spark SQL, a simple spark.sql("select * from table where col=12345 limit 10000").show(false) finishes in 5-10 minutes, while the same query in Hive (Hive on Tez) runs for 20-30 minutes, at which point I kill it. It is also worth noting that the Hive query occupies pretty much 100% of the cluster, while Spark SQL only goes up to 50%.


The cluster currently runs with ~4 TB on YARN. I can provide more details; I just don't know exactly what to share at the moment.





Cloudera Employee

Hello @dandaran 


There is a great community post on this topic: Demystifying Tez Memory Tuning
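
As a rough illustration of the kind of sizing that Tez memory tuning usually involves, here is a minimal Python sketch that derives a few related settings from a single container size. The ratios (heap at about 80% of the container, sort buffer at about 40%, unordered output buffer at about 10%, map-join threshold at about one third) are commonly cited rules of thumb, not values confirmed for this cluster. The 4096 MB container size and the helper names tez_memory_settings and as_set_statements are assumptions made only for this example.

```python
def tez_memory_settings(container_mb: int) -> dict:
    """Derive rule-of-thumb Hive on Tez memory settings from a container size in MB.

    The ratios are starting points that are often quoted for Tez tuning;
    they are assumptions here and should be validated against the real workload.
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")

    heap_mb = container_mb * 8 // 10          # JVM heap: ~80% of the container
    sort_mb = container_mb * 4 // 10          # sort buffer: ~40% of the container
    unordered_mb = container_mb // 10         # unordered output buffer: ~10%
    # The map-join threshold is expressed in bytes: ~1/3 of the container.
    mapjoin_bytes = container_mb * 1024 * 1024 // 3

    return {
        "hive.tez.container.size": str(container_mb),
        "hive.tez.java.opts": f"-Xmx{heap_mb}m",
        "tez.runtime.io.sort.mb": str(sort_mb),
        "tez.runtime.unordered.output.buffer.size-mb": str(unordered_mb),
        "hive.auto.convert.join.noconditionaltask.size": str(mapjoin_bytes),
    }


def as_set_statements(settings: dict) -> list:
    """Render the settings as Hive session-level SET statements."""
    return [f"SET {key}={value};" for key, value in settings.items()]


if __name__ == "__main__":
    # 4096 MB is a hypothetical container size chosen for illustration only.
    for statement in as_set_statements(tez_memory_settings(4096)):
        print(statement)
```

The printed SET statements can be pasted at the top of a Hive session to try the values on a single query before changing anything cluster-wide. The container size itself should be a multiple of the YARN minimum allocation on your cluster.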
