Queries taking forever after upgrading from MR to Hive on Tez

Hive was moved from MR to Tez with our latest upgrade to CDP 7, and we are seeing a significant performance drop. As a test, I ran the same query against a single static hour partition, and it took 5 hours to finish. On MR, the query used to complete within 4-5 hours across the same ORC data set for all 24 hourly partitions.

 

INSERT OVERWRITE TABLE `user_tables`.`dummy_table` PARTITION(date_partition, hour_partition)
SELECT `(date_partition|hour_partition)?+.+`, to_date(srt.date_time) as date_partition, SUBSTR(srt.date_time, 12, 2) AS hour_partition
FROM `user_tables`.`source_dummy_table` srt
WHERE srt.date_partition BETWEEN "2022-04-05" AND date_add("2022-04-05", 4)
AND upper(srt.prop1) = "XYZ"
AND to_date(srt.date_time) BETWEEN "2022-04-05" AND "2022-04-05";


The DAG output for the query above:
VERTICES   MODE        STATUS   TOTAL    COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
1          container   KILLED   136366   9112       0        127254   3       1258
----------------------------------------------------------------------------------------------
VERTICES: 00/01 [=>>-------------------------] 6% ELAPSED TIME: 39377.91 s
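
To put the DAG numbers in perspective, here is a rough back-of-the-envelope calculation in Python. It uses only the figures from the output above and assumes, purely for illustration, that tasks would have continued to complete at the same average rate. Real jobs rarely progress linearly, so treat the result as an order-of-magnitude estimate only.

```python
# Figures copied from the DAG output in the post.
completed = 9_112
total = 136_366
pending = 127_254
elapsed_s = 39_377.91

# Fraction of tasks finished when the DAG was killed.
progress_pct = 100 * completed / total

# Elapsed wall-clock time in hours.
elapsed_h = elapsed_s / 3600

# Average completion rate so far (tasks per second).
tasks_per_s = completed / elapsed_s

# ASSUMPTION: the remaining tasks complete at the same average rate.
remaining_h = pending / tasks_per_s / 3600

print(f"progress:  {progress_pct:.1f}% of tasks")
print(f"elapsed:   {elapsed_h:.1f} h")
print(f"rate:      {tasks_per_s:.3f} tasks/s")
print(f"remaining: {remaining_h:.0f} h at the same rate")
```

Under that assumption, the job was a little under 7% done after almost 11 hours and would have needed on the order of 150 more hours to finish.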


We are using dynamic partitioning because this used to work fine on MR. Which memory parameters can be tuned for Tez to make this work? The current runtime is unrealistic, and this is a relatively powerful cluster: 34 nodes with plenty of cores and memory (3 TB).
The settings below have already been added, as suggested:
hive.exec.compress.intermediate=true
hive.intermediate.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
hive.intermediate.compression.type=BLOCK
hive.exec.parallel=true
hive.enforce.sorting=true
hive.exec.orc.split.strategy=BI
tez.grouping.max-size=67108864
tez.grouping.min-size=67108864
hive.merge.tezfiles=true
hive.merge.smallfiles.avgsize=67108864
hive.merge.size.per.task=134217728
tez.am.resource.memory.mb=16384
hive.tez.container.size=16384
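
As a sanity check on the sizes above, the following Python sketch converts the byte values to MiB and estimates how many 16 GiB containers could run at once. It rests on assumptions that are not confirmed in the post: that "3TB" is the total memory YARN can allocate across the cluster, that every task runs in its own container of `hive.tez.container.size`, and that memory rather than vcores or queue limits is the binding constraint. The application master and other tenants are ignored.

```python
import math

MIB = 1024 * 1024

# Values copied from the post.
tez_grouping_max_size = 67_108_864       # bytes (tez.grouping.max-size)
hive_merge_size_per_task = 134_217_728   # bytes (hive.merge.size.per.task)
container_mb = 16_384                    # MB (hive.tez.container.size)
total_tasks = 136_366                    # TOTAL column of the DAG output

# ASSUMPTION: "3TB" means 3 * 1024 * 1024 MB of memory allocatable by YARN.
cluster_memory_mb = 3 * 1024 * 1024

grouping_mib = tez_grouping_max_size // MIB
merge_mib = hive_merge_size_per_task // MIB

# Upper bound on containers running at once if memory is the only limit.
max_containers = cluster_memory_mb // container_mb

# Number of sequential "waves" needed to run every task at that concurrency.
waves = math.ceil(total_tasks / max_containers)

print(f"tez.grouping.max-size    = {grouping_mib} MiB")
print(f"hive.merge.size.per.task = {merge_mib} MiB")
print(f"max concurrent containers: {max_containers}")
print(f"waves for {total_tasks} tasks: {waves}")
```

With these assumptions, each task is given a 16 GiB container to process a split grouped to about 64 MiB, and at most 192 containers fit in memory at once, so the 136,366 tasks would run in roughly 711 waves. Whether this is the actual bottleneck cannot be determined from the post alone.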

Any help or suggestion is appreciated.

1 REPLY

Hi Djentlguy,

 

It is hard to say what the actual issue is without looking at the Hive and application logs. Could you open a Cloudera support case to report this issue?