We are using HDP 2.2.9, which ships with Hive 0.14.
We use Hive for ETL. One of our queries takes half as long on AWS EMR as it does on our HDP cluster.
We have compared all the Hive properties between the two clusters and tried to match them. The HDP cluster has more capacity than the EMR cluster.
EMR uses MapReduce (MR) and HDP uses Tez, so the processing itself is quite fast on HDP. The reducer phase finishes quite early, but creating the partitions takes a very long time (almost double) when the query runs on HDP's Hive. We are using the ORC format with zlib compression.
Are there any properties that affect the creation and write performance of partitions? How can we improve this and bring down the total time?
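For context, here is a minimal sketch of the kind of statement involved, assuming the slow step is a dynamic-partition insert into an ORC table. The table and column names (`sales_orc`, `sales_staging`, `order_id`, `amount`, `sale_date`) are hypothetical placeholders, not the real schema, and the `SET` values are illustrative rather than a confirmed copy of either cluster's configuration.

```sql
-- Hypothetical sketch: names and values are placeholders, not the actual job.
SET hive.execution.engine=tez;                   -- HDP side; the EMR run would use mr
SET hive.exec.dynamic.partition=true;            -- allow partitions to be created from the data
SET hive.exec.dynamic.partition.mode=nonstrict;  -- no static partition key required
-- Sorts rows by the dynamic partition key before writing; this changes how
-- many ORC writers are open at once, so it is worth comparing across clusters.
SET hive.optimize.sort.dynamic.partition=true;

CREATE TABLE IF NOT EXISTS sales_orc (
  order_id BIGINT,
  amount   DOUBLE
)
PARTITIONED BY (sale_date STRING)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="ZLIB");

-- The partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE sales_orc PARTITION (sale_date)
SELECT order_id, amount, sale_date
FROM sales_staging;
```

Running `SET hive.optimize.sort.dynamic.partition;` (with no value) in the Hive CLI on each cluster prints the effective setting, which may help confirm whether the two environments really match.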