Reply
Highlighted
Explorer
Posts: 18
Registered: ‎03-19-2019

Increase IMPALA query performance/optimization based on Impala Query Attribute.

Hi Friends I am trying to provide recommendations for improving peformance of IMPALA query based on query execution.
After executing the Impala query we get list of query_attirbutes and based on the values of query_attributes I want to give recommendations. Because I don't find any other way by which I can give tips/recommendations to user for improving the performance of query. If you know any other way/approach then please let me know -->
I tried some of the IMPALA "Query Option" but they are not sufficient to improve the peformance.


I want to "REDUCE" values of some of the "query attributes" so that I can say query execution is optimized. 

Below are the query attributes whose value I want to "REDUCE",
Can you please suggest what changes/arrangement I should make so that values of each attribute "REDUCE" or lowered

"thread_storage_wait_time" -- How to "REDUCE" its value?


"stats_missing" --- If false then Perform statics before executing the query

"thread_total_time" How to "REDUCE" its value? Is there any way to "REDUCE" this "time"?
"thread_network_send_wait_time_percentage" ---- How to "REDUCE" its value?
"thread_network_receive_wait_time_percentage" --- How to "REDUCE" its value?
"hdfs_bytes_read_from_cache" --- If it is 0 then Enable hdfs cache.
"hdfs_bytes_read_local_percentage" ---
"hdfs_scanner_average_bytes_read_per_second" -- Can we INCREASE its value. Which option we should use?

"bytes_streamed" --- --- How to "REDUCE" its value -- Do we have to make change in underlying storage?

"cm_cpu_milliseconds" -- --- --- How to "REDUCE" its value
"thread_network_receive_wait_time" --- How to "REDUCE" its value
"estimated_per_node_peak_memory" -- How to "REDUCE" its value

"thread_storage_wait_time_percentage" -- How to "REDUCE" its value
"memory_aggregate_peak" --- Is there any way to "REDUCE" this.


"admission_result" --- Is there any way to make it "immediate" always. Can we suggest to increase the

                                   MEM_LIMIT and high no. of queries in single pool. Any side effects of doing this.

 

"hdfs_average_scan_range" -- It should not be too low, but what is its ideal value?


"planning_wait_time" -- Recommendation would be Perform Statistics before execution, second time execution

                                     can "REDUCE" its value. How to "REDUCE" planning wait time besides compute

                                     statistics. 

"oom" -- recommendation would be Increase MEM_LIMIT. 
"memory_accrual"
"memory_spilled" -- Increase the Memory. How to "REDUCE" memory spill to disk.

                               Where we have to make changes in query?


"file_formats" ---- If TEXT/NONE Make it Parquet, otherwise use "Parquet Query option"
"hdfs_bytes_read_local" --- IS there any way to increase it? Can we suggest to go for "partitioning"
"hdfs_bytes_read_short_circuit" -- How to make it ideal.

"client_fetch_wait_time" --------- How to "REDUCE" it?
"memory_per_node_peak" -- Is there any way to "REDUCE" this value?
"hdfs_bytes_read_remote" -- How to "REDUCE" this value
"thread_network_send_wait_time" --- How to "REDUCE" this value?

"thread_cpu_time" --- How to "REDUCE" this value?
"hdfs_bytes_read" ----------
},

"durationMillis" -- How to "REDUCE"/ "lower" this?


So far I've found query option to optimized the query performance but I've identified 3-4 query options which I can use to optimized the query.

Expert Contributor
Posts: 142
Registered: ‎07-17-2017

Re: Increase IMPALA query performance/optimization based on Impala Query Attribute.

Hi,
Optimization is not an easy task, please try to read this document and explore the reel Impala query optimization methods.
https://www.cloudera.com/documentation/enterprise/5-15-x/topics/impala_performance.html
Good luck.