Created on 08-17-2022 07:42 PM - edited 08-17-2022 07:57 PM
Hi all,
I am new to Hive and was told that the parameters below are used to improve Hive performance. If I set them in the sequence shown below, does the order matter, and are the settings related to one another? Do we need to include these statements every time we run a query, or is executing them once sufficient? Thanks.
SET hive.exec.parallel=true;
SET hive.vectorized.execution.enabled=true;
SET hive.vectorized.execution.reduce.enabled=true;
SET hive.cbo.enable=true;
SET hive.compute.query.using.stats=true;
SET hive.stats.fetch.column.stats=true;
SET hive.stats.fetch.partition.stats=true;
SET mapred.compress.map.output=true;
SET mapred.output.compress=true;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.auto.convert.join=false;
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=10000;
Created 08-22-2022 02:35 PM
Hi,
I have a few points on your question:
I can really recommend this article by a fellow Clouderan:
https://community.cloudera.com/t5/Community-Articles/Hive-on-Tez-Performance-Tuning-Determining-Redu...
If you have concrete questions about optimizing a specific query, do not hesitate to ask.
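On whether the statements need to be repeated: a minimal sketch, assuming a standard Beeline or Hive CLI session. SET only affects the current session, so the values have to be set again in every new session (or configured cluster-wide by an administrator), but not before every query within the same session. As far as I know, each SET assigns an independent property, so the order in which they are issued should not matter; only a later SET of the same property overrides an earlier one. The table name my_table below is a hypothetical placeholder.

```sql
-- Set once at the start of the session; the values stay active until the session ends.
SET hive.exec.parallel=true;
SET hive.cbo.enable=true;

-- Naming a property without a value prints the value the session currently uses.
SET hive.exec.parallel;

-- Every query run later in this same session picks up the settings above.
-- my_table is a hypothetical table name used only for illustration.
SELECT COUNT(*) FROM my_table;
```

Opening a new session and running the bare `SET hive.exec.parallel;` again is a quick way to confirm that the value has gone back to the cluster default.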