One of our end user tried to run an impala query, which is actually a view with a long query statement. We have noticed that the query time out's and we have found below logs in query details in cloudera.
WARNING: The following tables are missing relevant table and/or column statistics.
What could be the reason for this, does the view need to be validated or does impala does not support such view. what is the other way to know the root cause.
Any suggestions, would be highly appreciated.
Impala can query views.
Computing table stats on tables accessed by Impala queries is necessary to get the best performance, particularly for complex queries.
That's probably not the cause of whatever your user saw, but you need to include a query profile or the query status error message at least for us to give any tips about.
Thanks for your quick reply.
I did ran the compute stats command to know the query performance however I got the error below:
compute stats <table name>;
ERROR: AnalysisException: COMPUTE STATS not supported for view:
I also got the details from the Query details after running the query. I had to cancel it as it takes a lot of resources and goes in executing state for longer time.
Query ID: 130b304cc42b5010:19ef656c00000000
Query Type: QUERY
Query State: EXCEPTION
Start Time: Dec 15, 2020 7:17:34 AM
End Time: Dec 15, 2020 7:18:59 AM
Duration: 1m, 24s
Rows Produced: 0
Admission Result: Admitted immediately
Admission Wait Time: 0ms
Aggregate Peak Memory Usage: 13.2 GiB
Bytes Streamed: 15.6 GiB
Client Fetch Wait Time: 0ms
Client Fetch Wait Time Percentage: 0
Connected User: hadmin@COMPS-NVIRGINIA.LOCAL
Estimated per Node Peak Memory: 12.8 GiB
File Formats: PARQUET/NONE,PARQUET/SNAPPY
HDFS Average Scan Range: 90.5 KiB
HDFS Bytes Read: 24.7 GiB
HDFS Bytes Read From Cache: 0 B
HDFS Bytes Read From Cache Percentage: 0
HDFS Local Bytes Read: 23.7 GiB
HDFS Local Bytes Read Percentage: 96
HDFS Remote Bytes Read: 1.1 GiB
HDFS Remote Bytes Read Percentage: 4
HDFS Scanner Average Read Throughput: 155.9 MiB/s
HDFS Short Circuit Bytes Read: 23.7 GiB
HDFS Short Circuit Bytes Read Percentage: 96
Impala Version: impalad version 2.11.0-cdh5.14.4 RELEASE
Memory Accrual: 158,669,819,348 byte seconds
Memory Spilled: 1.0 GiB
Network Address: 10.206.100.226:42238
Node with Peak Memory Usage: usnprod3.n-virginia.dc:22000
Out of Memory: false
Per Node Peak Memory Usage: 5.2 GiB
Planning Wait Time: 6.69s
Planning Wait Time Percentage: 8
Query Status: Cancelled
Session ID: 9b32167b6eef775e:293bgd8a4350f200
Session Type: BEESWAX
Statistics Corrupt: false
Statistics Missing: true
Threads: CPU Time: 2.9m
Threads: CPU Time Percentage: 2
Threads: Network Receive Wait Time: 11.4m
Threads: Network Receive Wait Time Percentage: 6
Threads: Network Send Wait Time: 45.4m
Threads: Network Send Wait Time Percentage: 24
Threads: Storage Wait Time: 2.2h
Threads: Storage Wait Time Percentage: 69
Threads: Total Time: 3.2h
Please suggest is it due to memory issue that the query is actually getting time out and there is no output from the impala shell, however it shows query executing in cloudera.
The Default Query Memory Limit in Impala = 6 GB
Max Memory = 270 GB
A quick reply would be highly appreciated.