Hive Partition Write Performance

Hello,

We are using HDP 2.2.9, which ships Hive 0.14.

We are using Hive for ETL. We have a query that takes half the time on AWS EMR compared to our HDP cluster.

We have compared all the Hive properties between the two clusters and tried to match all of them. The HDP cluster has more capacity than the EMR cluster.

EMR uses MR and HDP uses Tez, so the processing is quite fast on HDP. The reducer phase finishes quite early, but creating the partitions takes a very long time (almost double) when run using HDP's Hive. We are using the ORC format with zlib compression.

Are there any properties that affect the creation and write performance of partitions? How can we improve this and bring down the total time?

We have followed all the recommendations from many posts here including https://community.hortonworks.com/articles/22419/hive-on-tez-performance-tuning-determining-reducer.... but we are still not able to improve the performance.

Re: Hive Partition Write Performance

> Are there any properties which affect the creation and write performance of partitions?

Yes. Compare the values of

set hive.optimize.sort.dynamic.partition;
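
As a rough sketch of how that comparison might be run: issuing `set` with a property name and no value prints the current setting, so the same statements can be executed in a Hive session on each cluster and the output compared. The three extra dynamic-partition properties listed here are my own additions, not something named in the reply, and whether enabling or disabling the sort helps will depend on the workload.

```sql
-- Print the current value; run this in a Hive session on both clusters.
set hive.optimize.sort.dynamic.partition;

-- Other dynamic-partition settings that may be worth comparing
-- (assumption: these are not mentioned in the reply above).
set hive.exec.dynamic.partition.mode;
set hive.exec.max.dynamic.partitions;
set hive.exec.max.dynamic.partitions.pernode;

-- Override for the current session only, then rerun the insert and
-- compare the partition write time against the previous run.
set hive.optimize.sort.dynamic.partition=true;
```

When this property is enabled, Hive sorts rows on the dynamic partition columns so that each reducer keeps only one record writer open at a time; when disabled, a reducer may hold many ORC writers open at once. Trying the query with both values is a reasonable way to see which behaves better here.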