Queries taking forever after upgrading from MR to Hive on Tez
Created 05-12-2022 06:53 AM
Hive was moved from MR to Tez as part of our latest upgrade to CDP 7, and we are seeing a significant performance drop. To observe the outcome, we ran the same query against a single static hour partition: it took 5 hrs to finish, whereas on MR the query used to complete within 4-5 hrs across all 24 hour partitions of the same ORC data set.
INSERT OVERWRITE TABLE `user_tables`.`dummy_table` PARTITION(date_partition, hour_partition)
SELECT `(date_partition|hour_partition)?+.+`, to_date(srt.date_time) as date_partition, SUBSTR(srt.date_time, 12, 2) AS hour_partition
FROM `user_tables`.`source_dummy_table` srt
WHERE srt.date_partition BETWEEN "2022-04-05" AND date_add("2022-04-05", 4)
AND upper(srt.prop1) = "XYZ"
AND to_date(srt.date_time) BETWEEN "2022-04-05" AND "2022-04-05";
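For readers unfamiliar with the expressions in this query, here is a minimal Python sketch of what the two derived partition columns and the `date_partition` range evaluate to. It assumes `date_time` is a string in `yyyy-MM-dd HH:mm:ss` form, which the post does not state; the `sample` value and the helper names are hypothetical.

```python
from datetime import date, timedelta


def date_partition(date_time: str) -> str:
    # Mirrors to_date(): keep the leading 'yyyy-MM-dd' part of the string.
    return date_time[:10]


def hour_partition(date_time: str) -> str:
    # Hive's SUBSTR is 1-indexed, so SUBSTR(s, 12, 2) is the Python slice s[11:13].
    return date_time[11:13]


# WHERE date_partition BETWEEN "2022-04-05" AND date_add("2022-04-05", 4)
# BETWEEN is inclusive on both ends.
start = date(2022, 4, 5)
end = start + timedelta(days=4)
scanned_dates = [
    (start + timedelta(days=i)).isoformat()
    for i in range((end - start).days + 1)
]

sample = "2022-04-05 13:45:10"  # hypothetical date_time value
print(date_partition(sample), hour_partition(sample))
print(scanned_dates)
```

Under that assumption, the partition predicate covers five `date_partition` values (2022-04-05 through 2022-04-09), while the `to_date(srt.date_time)` filter keeps rows for a single day, so rows from the other four dates are read and then discarded.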
The DAG for the query above shows:
VERTICES  MODE       STATUS   TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
1         container  KILLED  136366       9112        0   127254       3    1258
----------------------------------------------------------------------------------------------
VERTICES: 00/01 [=>>-------------------------] 6% ELAPSED TIME: 39377.91 s
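To put the progress line in perspective, the sketch below extrapolates the total run time from the task counts in the DAG output. It assumes tasks would keep completing at the same average rate, which is a simplification; the variable names are illustrative.

```python
# Figures copied from the DAG output in the post.
total_tasks = 136366
completed_tasks = 9112
elapsed_s = 39377.91

fraction_done = completed_tasks / total_tasks

# ASSUMPTION: linear progress, i.e. the remaining tasks finish at the same
# average rate as the completed ones.
projected_total_s = elapsed_s / fraction_done
projected_total_h = projected_total_s / 3600

print(f"{fraction_done:.2%} of tasks done after {elapsed_s / 3600:.1f} h")
print(f"projected total: about {projected_total_h:.0f} h")
```

Under that assumption, a run that reached roughly 6% after almost 11 hours would need on the order of a week to finish.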
We are using dynamic partitioning because it used to work fine on MR. Which memory parameters can be tuned for Tez to make this work? This timeline is unrealistic, and it is a relatively powerful cluster: 34 nodes with ample cores and memory (3 TB).
The settings below have already been added, as suggested:
hive.exec.compress.intermediate=true
hive.intermediate.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
hive.intermediate.compression.type=BLOCK
hive.exec.parallel=true
hive.enforce.sorting=true
hive.exec.orc.split.strategy=BI
tez.grouping.max-size=67108864
tez.grouping.min-size=67108864
hive.merge.tezfiles=true
hive.merge.smallfiles.avgsize=67108864
hive.merge.size.per.task=134217728
tez.am.resource.memory.mb=16384
hive.tez.container.size=16384
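As a rough sanity check on these values, the sketch below converts the byte-valued settings to MiB and estimates how many 16 GiB task containers can run at once. It assumes the "3TB" mentioned in the post is total cluster memory and that all of it is available to YARN; it ignores the Tez AM container, per-node overhead, and vcore limits, none of which are given in the post.

```python
import math

MIB = 1024 * 1024

# Byte-valued settings from the post, converted to MiB for readability.
byte_settings = {
    "tez.grouping.min-size": 67108864,
    "tez.grouping.max-size": 67108864,
    "hive.merge.smallfiles.avgsize": 67108864,
    "hive.merge.size.per.task": 134217728,
}
settings_mib = {name: value // MIB for name, value in byte_settings.items()}

container_mb = 16384                 # hive.tez.container.size
cluster_memory_mb = 3 * 1024 * 1024  # ASSUMPTION: "3TB" is total memory, all usable by YARN
total_tasks = 136366                 # TOTAL column of the DAG output

# Upper bound on task containers that fit in memory at the same time.
max_concurrent = cluster_memory_mb // container_mb
# Minimum number of rounds needed to run every task at that concurrency.
waves = math.ceil(total_tasks / max_concurrent)
# Container memory relative to the largest grouped split it is asked to read.
memory_to_split_ratio = container_mb // settings_mib["tez.grouping.max-size"]

print(settings_mib)
print(max_concurrent, waves, memory_to_split_ratio)
```

Under these assumptions, at most 192 task containers fit in memory at once, so 136,366 tasks need at least 711 waves, and each container reserves 256 times the size of the 64 MiB split it reads. Whether a smaller `hive.tez.container.size` would actually help here is something only the logs and a test run can confirm.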
Any help or suggestions are appreciated.
Created 05-16-2022 11:59 PM
Hi Djentlguy,
It is hard to say what the actual issue is without looking into the Hive and application logs. Could you open a Cloudera support case to report this issue?
