Support Questions
Find answers, ask questions, and share your expertise

Degraded Performance when using Hive Transaction on Tez

Degraded Performance when using Hive Transaction on Tez

New Contributor

Hello Community,


We have a brand new cluster (HDP 3.1) of which we're looking to use the Hive Transactional capability that's built into Hive 3. While migrating processes from our old cluster (HDP 2.3) to this new one and incorporating Hive Transactional capabilities we noticed a substantial degradation in performance when compared to the old cluster. 

We disabled transactions by creating the source tables as External Non-Transactional tables and ran the same query and we got fairly comparable results so this tells us that the Hive Transactions are causing some slow down. 
After some analysis, we saw that data was being saved into the target table with only One reducer. Considering the job we're benchmarking at the moment is trying to save several Trillion records, it's safe to assume this is the bottleneck and everything is trying to be routed through that one reducer. After playing around with it, we found that adding bucketing and partitioning to the target table works to increase the number of reducers and thus improve performance. 
We did find the following document:
Which states:
You do not need to bucket ACID tables, so maintenance is easier. You no longer need to perform ACID delete operations in a Hive table.
Can someone comment as to the accuracy of this document and confirm if pre-Bucketing/partitioning Hive tables are required with Hive 3?