Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

tuning HIVE query

tuning HIVE query

Master Collaborator

i am following this HortonWorks document to tune Hive query performance but as you see from my benchmark tests none of the methods are producing much different results , what does this mean ?

https://hortonworks.com/blog/5-ways-make-hive-queries-run-faster/

1) create ORC hive tables 
select count(*),ext_plaza_id from pa_lane_Txn group by ext_plaza_id;
Time taken: 28.732 seconds, Fetched: 408 row(s)
Time taken: 8.67 seconds, Fetched: 408 row(s)
Time taken: 14.01 seconds, Fetched: 408 row(s)
Time taken: 10.829 seconds, Fetched: 408 row(s)

2) set hive.execution.engine=tez;
select count(*),ext_plaza_id from pa_lane_Txn group by ext_plaza_id;
Time taken: 16.255 seconds, Fetched: 408 row(s)
Time taken: 9.829 seconds, Fetched: 408 row(s)
Time taken: 21.902 seconds, Fetched: 408 row(s)

3) set hive.vectorized.execution.enabled = true;
set hive.vectorized.execution.reduce.enabled = true;
select count(*),ext_plaza_id from pa_lane_Txn group by ext_plaza_id;
Time taken: 22.536 seconds, Fetched: 408 row(s)
Time taken: 8.378 seconds, Fetched: 408 row(s)
Time taken: 8.193 seconds, Fetched: 408 row(s)
Time taken: 9.943 seconds, Fetched: 408 row(s)

4) cost based query optimization
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
analyze table pa_lane_txn_orc compute statistics;
analyze table pa_lane_txn_orc compute statistics for columns [<column_name>];
select count(*),ext_plaza_id from pa_lane_Txn group by ext_plaza_id;
Time taken: 10.249 seconds, Fetched: 408 row(s)
Time taken: 14.466 seconds, Fetched: 408 row(s)
Time taken: 10.255 seconds, Fetched: 408 row(s)


1 REPLY 1
Highlighted

Re: tuning HIVE query

Hi @Sami Ahmad. It might be that your data set size isn't large enough to take advantage of the optimizations. What size are you tables? Also, instead of select count(*) you may try something more interesting such as a join or select specific columns.

Don't have an account?
Coming from Hortonworks? Activate your account here