Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Impala queries slow with wide vs narrow views

Impala queries slow with wide vs narrow views

Explorer

 

Hi

I have a 6 node cluster (

impalad version 2.8.0-cdh5.11.0 RELEASE (build e09660de6b503a15f07e84b99b63e8e745854c34)

 

I have two views today and today_reduced. they are the same query on the same tables, the only difference is that today view selects 1392 cols and today_reduced selets 11, however the today_reduced is twice as fast as the today view. Why is this? I understand the more columns selected the slower the query, but why are things slower when I am selecting the same number of columns through differing views in such an instance what is causing impala to slow down and is there anyway to just select the cols specified in the query and not all thoses specifcied in the view.

 

Looking at the query plan's both views seem to have simliar charecteristics, however the acutal exeuction is taking much longer, the planner takes around aoround 30 times longer when using the wider view.

 

is this realted to IMPALA-4242? What would be th upper limit of columns that would prevent the planner being so slow?

 

Narrow View - 0.87s

 

select count(process_id) from today_reduced
Operator          #Hosts  Avg Time  Max Time   #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail                         
---------------------------------------------------------------------------------------------------------------------------
09:AGGREGATE           1   0.000ns   0.000ns       1           1   32.00 KB        -1.00 B  FINALIZE                       
08:EXCHANGE            1   0.000ns   0.000ns       4           1          0        -1.00 B  UNPARTITIONED                  
04:AGGREGATE           4   0.000ns   0.000ns       4           1   24.00 KB       10.00 MB                                 
03:HASH JOIN           4   0.000ns   0.000ns  23.60K          -1    8.27 MB        2.00 GB  INNER JOIN, BROADCAST          
|--07:EXCHANGE         4   0.000ns   0.000ns  23.60K          -1          0              0  BROADCAST                      
|  00:SCAN HDFS        1  40.000ms  40.000ms  23.60K          -1  155.00 KB       96.00 MB  man.results_april_by_day r     
06:AGGREGATE           4   0.000ns   0.000ns      51          67   10.37 MB       10.00 MB  FINALIZE                       
05:EXCHANGE            4   0.000ns   0.000ns      51          67          0              0  HASH(process_id,process_typ... 
02:AGGREGATE           4   2.000ms   4.000ms      51          67    1.44 MB       10.00 MB  STREAMING                      
01:SCAN HDFS           4   0.000ns   0.000ns      51          67  143.00 KB       48.00 MB  man.processes p                
    Planner Timeline: 23.899ms
       - Analysis finished: 1.565ms (1.565ms)
       - Equivalence classes computed: 2.217ms (652.145us)
       - Single node plan created: 12.249ms (10.032ms)
       - Runtime filters computed: 12.388ms (138.837us)
       - Distributed plan created: 12.590ms (202.648us)
       - Planning finished: 23.899ms (11.308ms)
    Query Timeline: 872.000ms
       - Query submitted: 0.000ns (0.000ns)
       - Planning finished: 32.000ms (32.000ms)
       - Submit for admission: 32.000ms (0.000ns)
       - Completed admission: 32.000ms (0.000ns)
       - Ready to start 10 fragment instances: 32.000ms (0.000ns)
       - All 10 fragment instances started: 40.000ms (8.000ms)
       - First dynamic filter received: 636.000ms (596.000ms)
       - Rows available: 780.000ms (144.000ms)
       - First row fetched: 864.000ms (84.000ms)
       - Unregister query: 868.000ms (4.000ms)
     - ComputeScanRangeAssignmentTimer: 0.000ns
 

Wide View - 1.80s

 

select count(process_id) from today
Operator          #Hosts  Avg Time  Max Time   #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail                         
---------------------------------------------------------------------------------------------------------------------------
09:AGGREGATE           1   0.000ns   0.000ns       1           1   32.00 KB        -1.00 B  FINALIZE                       
08:EXCHANGE            1   0.000ns   0.000ns       4           1          0        -1.00 B  UNPARTITIONED                  
04:AGGREGATE           4   0.000ns   0.000ns       4           1   24.00 KB       10.00 MB                                 
03:HASH JOIN           4   0.000ns   0.000ns  23.60K          -1    8.27 MB        2.00 GB  INNER JOIN, BROADCAST          
|--07:EXCHANGE         4   1.000ms   4.000ms  23.60K          -1          0              0  BROADCAST                      
|  00:SCAN HDFS        1  48.000ms  48.000ms  23.60K          -1  950.00 KB       96.00 MB  man.results_april_by_day r     
06:AGGREGATE           4   0.000ns   0.000ns      51          67   10.37 MB       10.00 MB  FINALIZE                       
05:EXCHANGE            4   0.000ns   0.000ns      51          67          0              0  HASH(process_id,process_typ... 
02:AGGREGATE           4   0.000ns   0.000ns      51          67    1.44 MB       10.00 MB  STREAMING                      
01:SCAN HDFS           4   3.000ms   4.000ms      51          67  143.00 KB       48.00 MB  man.processes p                
    Planner Timeline: 774.618ms
       - Analysis finished: 67.336ms (67.336ms)
       - Equivalence classes computed: 663.511ms (596.174ms)
       - Single node plan created: 760.272ms (96.760ms)
       - Runtime filters computed: 763.392ms (3.120ms)
       - Distributed plan created: 765.175ms (1.783ms)
       - Planning finished: 774.618ms (9.442ms)
    Query Timeline: 1s808ms
      - Query submitted: 0.000ns (0.000ns)
       - Planning finished: 928.000ms (928.000ms)
       - Submit for admission: 936.000ms (8.000ms)
       - Completed admission: 936.000ms (0.000ns)
       - Ready to start 10 fragment instances: 936.000ms (0.000ns)
       - All 10 fragment instances started: 972.000ms (36.000ms)
       - First dynamic filter received: 1s552ms (580.000ms)
       - Rows available: 1s704ms (152.000ms)
       - First row fetched: 1s796ms (92.000ms)
       - Unregister query: 1s796ms (0.000ns)
     - ComputeScanRangeAssignmentTimer: 0.000n 
4 REPLIES 4

Re: Impala queries slow with wide vs narrow views

Master Collaborator

Thanks for your report.

 

I agree this looks like IMPALA-4242 because most of the extra time is spent in computing equivalence classes. I'm afraid there is no good way to work around this issue apart from reducing the number of columns in your view. The equivalence class computation is heavily optimized but fundamentally O(N^3) (transitive closure), so that can give you a guideline as to how planning time will change if you increase/decrease the number of columns.

Highlighted

Re: Impala queries slow with wide vs narrow views

Explorer

Hi Alex,

 

Thanks so much, when you say O(N^3) (transitive closure), I assume N is the number of columns, what about O and Transitive Closure are these just constants, so if I have twice the number of columns, we would exepect the planning time to increases by 8 times (2^3), or for four times as many columns planning time to increase by 64 times (4^3)?

 

From your comments I am just trying to understand the issue. Is it that a large amount of time is spent trying to work out which of the columns in the query exisin in the view and therefor what subset of the view columns should be selected from the view? 

Re: Impala queries slow with wide vs narrow views

Explorer

HI Alex,

 

I did some quick tests, it did not seem the planner was slow down in the cubic series as you suggested.

 

10 Colmns in view 

Operator          #Hosts  Avg Time  Max Time   #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail                         
---------------------------------------------------------------------------------------------------------------------------
09:AGGREGATE           1   0.000ns   0.000ns       1           1   32.00 KB        -1.00 B  FINALIZE                       
08:EXCHANGE            1   0.000ns   0.000ns       4           1          0        -1.00 B  UNPARTITIONED                  
04:AGGREGATE           4   0.000ns   0.000ns       4           1   24.00 KB       10.00 MB                                 
03:HASH JOIN           4   1.000ms   4.000ms  23.60K          -1    8.27 MB        2.00 GB  INNER JOIN, BROADCAST          
|--07:EXCHANGE         4   0.000ns   0.000ns  23.60K          -1          0              0  BROADCAST                      
|  00:SCAN HDFS        1  24.000ms  24.000ms  23.60K          -1  128.51 KB       96.00 MB  man.results_april_by_day r     
06:AGGREGATE           4   0.000ns   0.000ns      51          67   10.37 MB       10.00 MB  FINALIZE                       
05:EXCHANGE            4   0.000ns   0.000ns      51          67          0              0  HASH(process_id,process_typ... 
02:AGGREGATE           4   0.000ns   0.000ns      51          67    1.44 MB       10.00 MB  STREAMING                      
01:SCAN HDFS           4   1.000ms   4.000ms      51          67  143.00 KB       48.00 MB  man.processes p                
    Planner Timeline: 3.782ms
       - Analysis finished: 761.813us (761.813us)
       - Equivalence classes computed: 945.444us (183.631us)
       - Single node plan created: 1.837ms (891.621us)
       - Runtime filters computed: 1.954ms (117.700us)
       - Distributed plan created: 2.162ms (208.153us)
       - Planning finished: 3.782ms (1.619ms)
    Query Timeline: 844.000ms
       - Query submitted: 0.000ns (0.000ns)
       - Planning finished: 12.000ms (12.000ms)
       - Submit for admission: 12.000ms (0.000ns)
       - Completed admission: 12.000ms (0.000ns)
       - Ready to start 10 fragment instances: 12.000ms (0.000ns)
       - All 10 fragment instances started: 16.000ms (4.000ms)
       - First dynamic filter received: 604.000ms (588.000ms)
       - Rows available: 756.000ms (152.000ms)
       - First row fetched: 836.000ms (80.000ms)
       - Unregister query: 836.000ms (0.000ns)
     - ComputeScanRangeAssignmentTimer: 0.000ns
  ImpalaServer:
     - ClientFetchWaitTimer: 80.000ms
     - RowMaterializationTimer: 0.000ns
  Execution Profile d940f6940c2d5468:41eb120400000000:(Total: 744.000ms, non-child: 0.000ns, % non-child: 0.00%)

20 Colmns in view 

Operator          #Hosts  Avg Time  Max Time   #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail                         
---------------------------------------------------------------------------------------------------------------------------
09:AGGREGATE           1   0.000ns   0.000ns       1           1   32.00 KB        -1.00 B  FINALIZE                       
08:EXCHANGE            1   0.000ns   0.000ns       4           1          0        -1.00 B  UNPARTITIONED                  
04:AGGREGATE           4   0.000ns   0.000ns       4           1   24.00 KB       10.00 MB                                 
03:HASH JOIN           4   0.000ns   0.000ns  23.60K          -1    8.27 MB        2.00 GB  INNER JOIN, BROADCAST          
|--07:EXCHANGE         4   0.000ns   0.000ns  23.60K          -1          0              0  BROADCAST                      
|  00:SCAN HDFS        1  28.000ms  28.000ms  23.60K          -1  128.51 KB       96.00 MB  man.results_april_by_day r     
06:AGGREGATE           4   0.000ns   0.000ns      51          67   10.37 MB       10.00 MB  FINALIZE                       
05:EXCHANGE            4   0.000ns   0.000ns      51          67          0              0  HASH(process_id,process_typ... 
02:AGGREGATE           4   0.000ns   0.000ns      51          67    1.44 MB       10.00 MB  STREAMING                      
01:SCAN HDFS           4   1.000ms   4.000ms      51          67  143.00 KB       48.00 MB  man.processes p                
    Planner Timeline: 2.986ms
       - Analysis finished: 649.899us (649.899us)
       - Equivalence classes computed: 883.025us (233.126us)
       - Single node plan created: 1.632ms (749.289us)
       - Runtime filters computed: 1.728ms (95.735us)
       - Distributed plan created: 1.913ms (185.159us)
       - Planning finished: 2.986ms (1.072ms)
    Query Timeline: 844.000ms
       - Query submitted: 0.000ns (0.000ns)
       - Planning finished: 8.000ms (8.000ms)
       - Submit for admission: 12.000ms (4.000ms)
       - Completed admission: 12.000ms (0.000ns)
       - Ready to start 10 fragment instances: 12.000ms (0.000ns)
       - All 10 fragment instances started: 16.000ms (4.000ms)
       - First dynamic filter received: 600.000ms (584.000ms)
       - Rows available: 764.000ms (164.000ms)
       - First row fetched: 832.000ms (68.000ms)
       - Unregister query: 836.000ms (4.000ms)
     - ComputeScanRangeAssignmentTimer: 0.000ns
  ImpalaServer:
     - ClientFetchWaitTimer: 68.000ms
     - RowMaterializationTimer: 4.000ms
  Execution Profile fe4fcf9d6328e7a6:ab0ecade00000000:(Total: 752.000ms, non-child: 0.000ns, % non-child: 0.00%)

40 Columns in view

 

Operator          #Hosts  Avg Time  Max Time   #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail                         
---------------------------------------------------------------------------------------------------------------------------
09:AGGREGATE           1   0.000ns   0.000ns       1           1   32.00 KB        -1.00 B  FINALIZE                       
08:EXCHANGE            1   0.000ns   0.000ns       4           1          0        -1.00 B  UNPARTITIONED                  
04:AGGREGATE           4   0.000ns   0.000ns       4           1   24.00 KB       10.00 MB                                 
03:HASH JOIN           4   1.000ms   4.000ms  23.60K          -1    8.27 MB        2.00 GB  INNER JOIN, BROADCAST          
|--07:EXCHANGE         4   0.000ns   0.000ns  23.60K          -1          0              0  BROADCAST                      
|  00:SCAN HDFS        1  32.000ms  32.000ms  23.60K          -1  128.51 KB       96.00 MB  man.results_april_by_day r     
06:AGGREGATE           4   1.000ms   4.000ms      51          67   10.37 MB       10.00 MB  FINALIZE                       
05:EXCHANGE            4   0.000ns   0.000ns      51          67          0              0  HASH(process_id,process_typ... 
02:AGGREGATE           4   0.000ns   0.000ns      51          67    1.44 MB       10.00 MB  STREAMING                      
01:SCAN HDFS           4   1.000ms   4.000ms      51          67  255.00 KB       48.00 MB  man.processes p                
    Planner Timeline: 3.777ms
       - Analysis finished: 783.677us (783.677us)
       - Equivalence classes computed: 1.250ms (466.747us)
       - Single node plan created: 2.193ms (943.055us)
       - Runtime filters computed: 2.304ms (111.516us)
       - Distributed plan created: 2.503ms (198.159us)
       - Planning finished: 3.777ms (1.274ms)
    Query Timeline: 744.000ms
       - Query submitted: 0.000ns (0.000ns)
       - Planning finished: 12.000ms (12.000ms)
       - Submit for admission: 12.000ms (0.000ns)
       - Completed admission: 12.000ms (0.000ns)
       - Ready to start 10 fragment instances: 12.000ms (0.000ns)
       - All 10 fragment instances started: 16.000ms (4.000ms)
       - First dynamic filter received: 604.000ms (588.000ms)
       - Rows available: 720.000ms (116.000ms)
       - First row fetched: 736.000ms (16.000ms)
       - Unregister query: 736.000ms (0.000ns)
     - ComputeScanRangeAssignmentTimer: 0.000ns
  ImpalaServer:
     - ClientFetchWaitTimer: 16.000ms
     - RowMaterializationTimer: 0.000ns

Re: Impala queries slow with wide vs narrow views

Explorer

All queries planning seems to stay the same until we get over 640 columns, where when we double the columns the planning goes from 106 ms to 395 ms which seems a 4 fold performance degrgation, not a cubic degregation as expected.

 

640 Columns

Operator          #Hosts  Avg Time  Max Time   #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail                         
---------------------------------------------------------------------------------------------------------------------------
09:AGGREGATE           1   0.000ns   0.000ns       1           1   32.00 KB        -1.00 B  FINALIZE                       
08:EXCHANGE            1   0.000ns   0.000ns       4           1          0        -1.00 B  UNPARTITIONED                  
04:AGGREGATE           4   0.000ns   0.000ns       4           1   24.00 KB       10.00 MB                                 
03:HASH JOIN           4   0.000ns   0.000ns  23.60K          -1    8.27 MB        2.00 GB  INNER JOIN, BROADCAST          
|--07:EXCHANGE         4   0.000ns   0.000ns  23.60K          -1          0              0  BROADCAST                      
|  00:SCAN HDFS        1  28.000ms  28.000ms  23.60K          -1    1.39 MB       96.00 MB  results_april_by_day r     
06:AGGREGATE           4   0.000ns   0.000ns      51          67   10.37 MB       10.00 MB  FINALIZE                       
05:EXCHANGE            4   0.000ns   0.000ns      51          67          0              0  HASH(process_id,process_typ... 
02:AGGREGATE           4   0.000ns   0.000ns      51          67    1.44 MB       10.00 MB  STREAMING                      
01:SCAN HDFS           4   2.000ms   4.000ms      51          67  143.00 KB       48.00 MB  processes p                
    Planner Timeline: 105.795ms
       - Analysis finished: 11.279ms (11.279ms)
       - Equivalence classes computed: 94.268ms (82.988ms)
       - Single node plan created: 101.248ms (6.980ms)
       - Runtime filters computed: 101.521ms (272.618us)
       - Distributed plan created: 101.886ms (365.467us)
       - Planning finished: 105.795ms (3.908ms)
    Query Timeline: 960.000ms
       - Query submitted: 0.000ns (0.000ns)
       - Planning finished: 124.000ms (124.000ms)
       - Submit for admission: 128.000ms (4.000ms)
       - Completed admission: 128.000ms (0.000ns)
       - Ready to start 10 fragment instances: 128.000ms (0.000ns)
       - All 10 fragment instances started: 144.000ms (16.000ms)
       - First dynamic filter received: 724.000ms (580.000ms)
       - Rows available: 852.000ms (128.000ms)
       - First row fetched: 952.000ms (100.000ms)
       - Unregister query: 952.000ms (0.000ns)
     - ComputeScanRangeAssignmentTimer: 4.000ms
  ImpalaServer:
     - ClientFetchWaitTimer: 100.000ms
     - RowMaterializationTimer: 0.000ns

1280 Columns

Operator          #Hosts  Avg Time  Max Time   #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail                         
---------------------------------------------------------------------------------------------------------------------------
09:AGGREGATE           1   0.000ns   0.000ns       1           1   32.00 KB        -1.00 B  FINALIZE                       
08:EXCHANGE            1   0.000ns   0.000ns       4           1          0        -1.00 B  UNPARTITIONED                  
04:AGGREGATE           4   0.000ns   0.000ns       4           1   24.00 KB       10.00 MB                                 
03:HASH JOIN           4   2.000ms   8.000ms  23.60K          -1    8.27 MB        2.00 GB  INNER JOIN, BROADCAST          
|--07:EXCHANGE         4   0.000ns   0.000ns  23.60K          -1          0              0  BROADCAST                      
|  00:SCAN HDFS        1  36.000ms  36.000ms  23.60K          -1  681.39 KB       96.00 MB  results_april_by_day r     
06:AGGREGATE           4   2.000ms   4.000ms      51          67   10.37 MB       10.00 MB  FINALIZE                       
05:EXCHANGE            4   0.000ns   0.000ns      51          67          0              0  HASH(process_id,process_typ... 
02:AGGREGATE           4   0.000ns   0.000ns      51          67    1.44 MB       10.00 MB  STREAMING                      
01:SCAN HDFS           4   1.000ms   4.000ms      51          67  143.00 KB       48.00 MB  processes p                
    Planner Timeline: 395.410ms
       - Analysis finished: 40.266ms (40.266ms)
       - Equivalence classes computed: 366.901ms (326.634ms)
       - Single node plan created: 388.295ms (21.393ms)
       - Runtime filters computed: 388.555ms (260.548us)
       - Distributed plan created: 390.933ms (2.377ms)
       - Planning finished: 395.410ms (4.477ms)
    Query Timeline: 1s180ms
       - Query submitted: 0.000ns (0.000ns)
       - Planning finished: 444.000ms (444.000ms)
       - Submit for admission: 444.000ms (0.000ns)
       - Completed admission: 444.000ms (0.000ns)
       - Ready to start 10 fragment instances: 448.000ms (4.000ms)
       - All 10 fragment instances started: 460.000ms (12.000ms)
       - First dynamic filter received: 1s040ms (580.000ms)
       - Rows available: 1s148ms (108.000ms)
       - First row fetched: 1s172ms (24.000ms)
       - Unregister query: 1s172ms (0.000ns)
     - ComputeScanRangeAssignmentTimer: 0.000ns
  ImpalaServer:
     - ClientFetchWaitTimer: 24.000ms
     - RowMaterializationTimer: 0.000ns