Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Impala - performance degradation between 2.6 and 2.2

avatar

Hi,

 I have an issue after upgrade of cluster and migrate to a newer hardware. The same query "Select part_col, count(*) from table group by part_col" runs significantly slower on the new environment. In this case there are two clusters, one 4 nodes (CDH 5.4 Impala 2.2), one with 5 nodes (CDH5.8 Impala 2.6 kerberized). The worker nodes are newer and has more data disks, the processor is the same Xeon E5-2640 v3 vs E5-2650 v2.

The connectivity is 1GBit (old) vs 10Gbit(new).

 

So the old cluster has less nodes, less disks per node (5 vs 16), less RAM, but still manages to execute the query faster than the new cluster. I digged into the profile of the queries and found out that the step AGGREGATE takes much longer and consumes all the exec time. And in the new version it has a tag "STREAMING". The HDFS scan is in this case more or less the same, but on larger table much faster on the new cluster. So I assume this is not an IO issue.

 

I assume that during AGGREGATE the data is already in RAM and hashes are computed by CPU..

Does it mean that the new cluster has less powerfull CPU?

Or what other factors can affect the AGGREGATE (STREAMING) duration? 

Is there any other tools or papers describing how to tune Impala in these situations?

 

Thanks for any advice

 

OLD CLUSTER - Impala 2.2
Operator              #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail             
---------------------------------------------------------------------------------------------------------------------
05:MERGING-EXCHANGE        1  119.314us  119.314us        5          -1         0        -1.00 B  UNPARTITIONED      
02:SORT                    4   435.85us  588.641us        5          -1   8.02 MB              0                     
04:AGGREGATE               4  255.672ms  272.347ms        5          -1   3.15 MB      128.00 MB  FINALIZE           
03:EXCHANGE                4   62.142us    87.60us       20          -1         0              0  HASH(part_col) 
01:AGGREGATE               4    2s803ms    3s270ms       20          -1   3.17 MB      128.00 MB                     
00:SCAN HDFS               4   53.678ms   56.787ms  103.50M          -1   1.82 MB       88.00 MB  default.table    
  
NEW CLUSTER - Impala 2.6                     
Operator              #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail             
---------------------------------------------------------------------------------------------------------------------
05:MERGING-EXCHANGE        1   82.553us   82.553us        5          -1         0        -1.00 B  UNPARTITIONED      
02:SORT                    5  434.217us  491.025us        5          -1   8.02 MB              0                     
04:AGGREGATE               5  279.461ms  316.351ms        5          -1   2.30 MB      128.00 MB  FINALIZE           
03:EXCHANGE                5   28.378us   55.658us       25          -1         0              0  HASH(part_col) 
01:AGGREGATE               5    3s965ms    4s739ms       25          -1   1.59 MB      128.00 MB  STREAMING          
00:SCAN HDFS               5   83.393ms  118.345ms  103.50M          -1  12.04 MB       88.00 MB  default.table   

 

1 ACCEPTED SOLUTION

avatar

It's not entirely obvious to me what changed. Since it's grouping by timestamp, you're running into https://issues.cloudera.org/browse/IMPALA-3884, which is forcing it to execute the aggregation in an interpreted mode. I suspect that the performance of that interpreted code path may have changed a bit - we tend to focus most of our tuning on the codegened code path.

 

On releases with IMPALA-3884 fixed it should be a *lot* faster.

 

"          ExecOption: Codegen Disabled: HashTableCtx::CodegenEvalRow(): type TIMESTAMP NYI"

View solution in original post

5 REPLIES 5

avatar

Can you share the explain plan and the query? I'm interested to know what data types and aggregate functions you're using.


We're not aware of any performance regressions between those versions - we actually made a number of improvements, e.g. changing the hash table implementation and adding software prefetching. It's possible that your workload hit one of the edge cases where these changes had a detrimental effect.

avatar

If you want to experiment you could try setting the query options "PREFETCH_MODE=0" and "DISABLE_STREAMING_PREAGGREGATIONS=1". You could also try adjusting NUM_SCANNER_THREADS.

 

With the last option, we changed our recommendation recently to set it to the # of logical cores on the machine (it was 3x logical cores originally - we changed the default in a later release). You can set the default via the "default query options" impalad startup option.

avatar

I tried to disable prefetch and streaming, but did not helped.

 

Here is the profile of old Impala on old cluster:

 

 

+---------------------+----------+
| part_col            | count(*) |
+---------------------+----------+
| 2017-01-01 00:00:00 | 18428899 |
| 2017-01-02 00:00:00 | 20974112 |
| 2017-01-03 00:00:00 | 13178213 |
| 2017-01-04 00:00:00 | 26768773 |
| 2017-01-05 00:00:00 | 24148741 |
+---------------------+----------+
Query Runtime Profile:
Query (id=a7442ed1a43b8e8d:8ad38fb17f29baa0):
  Summary:
    Session ID: f94cc4d4c075f287:175ee4069895a6b0
    Session Type: BEESWAX
    Start Time: 2017-01-31 09:07:47.514559000
    End Time: 2017-01-31 09:07:51.159487000
    Query Type: QUERY
    Query State: FINISHED
    Query Status: OK
    Impala Version: impalad version 2.2.0-cdh5.4.8 RELEASE (build 137d99e9f751c454a0c79f3b00302938e4984f9c)
    User: hadoop
    Connected User: hadoop
    Delegated User: 
    Network Address: ::ffff:xx.xx.xx.xx:33929
    Default Db: default
    Sql Statement: select part_col, count(*) from perf_test group by part_col order by 1
    Coordinator: worker2.some.domain.net:22000
    Plan: 
----------------
Estimated Per-Host Requirements: Memory=256.00MB VCores=2
WARNING: The following tables are missing relevant table and/or column statistics.
default.perf_test

F02:PLAN FRAGMENT [UNPARTITIONED]
  05:MERGING-EXCHANGE [UNPARTITIONED]
     order by: part_col ASC
     hosts=4 per-host-mem=unavailable
     tuple-ids=2 row-size=24B cardinality=unavailable

F01:PLAN FRAGMENT [HASH(part_col)]
  DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=05, UNPARTITIONED]
  02:SORT
  |  order by: part_col ASC
  |  hosts=4 per-host-mem=0B
  |  tuple-ids=2 row-size=24B cardinality=unavailable
  |
  04:AGGREGATE [FINALIZE]
  |  output: count:merge(*)
  |  group by: part_col
  |  hosts=4 per-host-mem=128.00MB
  |  tuple-ids=1 row-size=24B cardinality=unavailable
  |
  03:EXCHANGE [HASH(part_col)]
     hosts=4 per-host-mem=0B
     tuple-ids=1 row-size=24B cardinality=unavailable

F00:PLAN FRAGMENT [RANDOM]
  DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=03, HASH(part_col)]
  01:AGGREGATE
  |  output: count(*)
  |  group by: part_col
  |  hosts=4 per-host-mem=128.00MB
  |  tuple-ids=1 row-size=24B cardinality=unavailable
  |
  00:SCAN HDFS [default.perf_test, RANDOM]
     partitions=1/1 files=31 size=6.89GB
     table stats: unavailable
     column stats: unavailable
     hosts=4 per-host-mem=88.00MB
     tuple-ids=0 row-size=16B cardinality=unavailable
----------------
    Estimated Per-Host Mem: 268435456
    Estimated Per-Host VCores: 2
    Tables Missing Stats: default.perf_test
    Admission result: Admitted immediately
    Request Pool: root.hadoop
    ExecSummary: 
Operator              #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail             
---------------------------------------------------------------------------------------------------------------------
05:MERGING-EXCHANGE        1  119.314us  119.314us        5          -1         0        -1.00 B  UNPARTITIONED      
02:SORT                    4   435.85us  588.641us        5          -1   8.02 MB              0                     
04:AGGREGATE               4  255.672ms  272.347ms        5          -1   3.15 MB      128.00 MB  FINALIZE           
03:EXCHANGE                4   62.142us    87.60us       20          -1         0              0  HASH(part_col) 
01:AGGREGATE               4    2s803ms    3s270ms       20          -1   3.17 MB      128.00 MB                     
00:SCAN HDFS               4   53.678ms   56.787ms  103.50M          -1   1.82 MB       88.00 MB  default.perf_test    
    Planner Timeline: 3.584ms
       - Analysis finished: 1.132ms (1.132ms)
       - Equivalence classes computed: 1.236ms (103.831us)
       - Single node plan created: 1.823ms (587.16us)
       - Distributed plan created: 2.300ms (477.253us)
       - Lineage info computed: 2.664ms (363.854us)
       - Planning finished: 3.584ms (919.713us)
    Query Timeline: 3s649ms
       - Start execution: 42.805us (42.805us)
       - Planning finished: 4.458ms (4.415ms)
       - Submit for admission: 4.775ms (317.243us)
       - Completed admission: 4.838ms (63.556us)
       - Ready to start remote fragments: 5.352ms (514.114us)
       - Remote fragments started: 511.206ms (505.853ms)
       - Rows available: 3s612ms (3s101ms)
       - First row fetched: 3s639ms (26.709ms)
       - Unregister query: 3s644ms (5.608ms)
  ImpalaServer:
     - ClientFetchWaitTimer: 30.950ms
     - RowMaterializationTimer: 72.117us
  Execution Profile a7442ed1a43b8e8d:8ad38fb17f29baa0:(Total: 3s608ms, non-child: 0ns, % non-child: 0.00%)
    Fragment start latencies: count: 8, last: 0.231475ns, min: 0.160659ns, max: 0.273263ns, mean: 0.221710ns, stddev: 0.041526ns
    Per Node Peak Memory Usage: worker4.some.domain.net:22000(8.41 MB) worker1.some.domain.net:22000(8.10 MB) worker3.some.domain.net:22000(8.41 MB) worker2.some.domain.net:22000(8.23 MB) 
     - FinalizationTimer: 0ns
    Coordinator Fragment F02:(Total: 3s100ms, non-child: 251.349us, % non-child: 0.01%)
      MemoryUsage(500.0ms): 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB
      ThreadUsage(500.0ms): 1
       - AverageThreadTokens: 1.00 
       - PeakMemoryUsage: 24.00 KB (24576)
       - PerHostPeakMemUsage: 0
       - PrepareTime: 49.521us
       - RowsProduced: 5 (5)
       - TotalCpuTime: 3s634ms
       - TotalNetworkReceiveTime: 0ns
       - TotalNetworkSendTime: 0ns
       - TotalStorageWaitTime: 0ns
      BlockMgr:
         - BlockWritesOutstanding: 0 (0)
         - BlocksCreated: 96 (96)
         - BlocksRecycled: 1 (1)
         - BufferedPins: 0 (0)
         - BytesWritten: 0
         - MaxBlockSize: 8.00 MB (8388608)
         - MemoryLimit: 38.40 GB (41231687680)
         - PeakMemoryUsage: 8.01 MB (8396800)
         - TotalBufferWaitTime: 0ns
         - TotalEncryptionTime: 0ns
         - TotalIntegrityCheckTime: 0ns
         - TotalReadBlockTime: 0ns
      EXCHANGE_NODE (id=5):(Total: 3s100ms, non-child: 119.314us, % non-child: 0.00%)
        BytesReceived(500.0ms): 0, 0, 0, 0, 0, 0, 0
         - BytesReceived: 134.00 B (134)
         - ConvertRowBatchTime: 0ns
         - DeserializeRowBatchTimer: 31.363us
         - FirstBatchArrivalWaitTime: 3s100ms
         - MergeGetNext: 51.896us
         - MergeGetNextBatch: 45.734us
         - PeakMemoryUsage: 0
         - RowsReturned: 5 (5)
         - RowsReturnedRate: 1.00 /sec
         - SendersBlockedTimer: 0ns
         - SendersBlockedTotalTimer(*): 0ns
    Averaged Fragment F01:(Total: 3s604ms, non-child: 0ns, % non-child: 0.00%)
      split sizes:  min: 0, max: 0, avg: 0, stddev: 0
      completion times: min:3s334ms  max:3s367ms  mean: 3s350ms  stddev:15.818906.818.905508ms
      execution rates: min:0.00 /sec  max:0.00 /sec  mean:0.00 /sec  stddev:0.00 /sec
      num instances: 4
       - AverageThreadTokens: 1.00 
       - PeakMemoryUsage: 8.26 MB (8665088)
       - PerHostPeakMemUsage: 8.29 MB (8689408)
       - PrepareTime: 255.347ms
       - RowsProduced: 1 (1)
       - TotalCpuTime: 257.307ms
       - TotalNetworkReceiveTime: 3s347ms
       - TotalNetworkSendTime: 314.714us
       - TotalStorageWaitTime: 0ns
      BlockMgr:
         - BlockWritesOutstanding: 0 (0)
         - BlocksCreated: 96 (96)
         - BlocksRecycled: 1 (1)
         - BufferedPins: 0 (0)
         - BytesWritten: 0
         - MaxBlockSize: 8.00 MB (8388608)
         - MemoryLimit: 38.40 GB (41231687680)
         - PeakMemoryUsage: 8.01 MB (8399530)
         - TotalBufferWaitTime: 0ns
         - TotalEncryptionTime: 0ns
         - TotalIntegrityCheckTime: 0ns
         - TotalReadBlockTime: 0ns
      CodeGen:(Total: 254.686ms, non-child: 254.686ms, % non-child: 100.00%)
         - CodegenTime: 478.430us
         - CompileTime: 323ns
         - LoadTime: 117.38us
         - ModuleFileSize: 2.23 MB (2333204)
         - OptimizationTime: 0ns
         - PrepareTime: 252.960ms
      DataStreamSender (dst_id=5):(Total: 40.576us, non-child: 40.576us, % non-child: 100.00%)
         - BytesSent: 33.00 B (33)
         - NetworkThroughput(*): 111.39 KB/sec
         - OverallThroughput: 713.32 KB/sec
         - PeakMemoryUsage: 4.00 KB (4096)
         - SerializeBatchTime: 14.713us
         - ThriftTransmitTime(*): 233.563us
         - UncompressedRowBatchSize: 48.00 B (48)
      SORT_NODE (id=2):(Total: 3s603ms, non-child: 435.86us, % non-child: 0.01%)
         - InMemorySortTime: 2.144us
         - InitialRunsCreated: 1 (1)
         - PeakMemoryUsage: 8.01 MB (8402944)
         - RowsReturned: 1 (1)
         - RowsReturnedRate: 0
         - SortDataSize: 40.00 B (40)
         - TotalMergesPerformed: 0 (0)
      AGGREGATION_NODE (id=4):(Total: 3s603ms, non-child: 255.672ms, % non-child: 7.10%)
         - BuildTime: 21.426us
         - GetNewBlockTime: 55.516us
         - GetResultsTime: 2.938us
         - HashBuckets: 1.28K (1280)
         - LargestPartitionPercent: 12 (12)
         - MaxPartitionLevel: 0 (0)
         - NumRepartitions: 0 (0)
         - PartitionsCreated: 16 (16)
         - PeakMemoryUsage: 3.14 MB (3297280)
         - PinTime: 0ns
         - RowsRepartitioned: 0 (0)
         - RowsReturned: 1 (1)
         - RowsReturnedRate: 0
         - SpilledPartitions: 0 (0)
         - UnpinTime: 870ns
      EXCHANGE_NODE (id=3):(Total: 3s347ms, non-child: 3s347ms, % non-child: 100.00%)
         - BytesReceived: 134.00 B (134)
         - ConvertRowBatchTime: 8.18us
         - DeserializeRowBatchTimer: 31.103us
         - FirstBatchArrivalWaitTime: 2s437ms
         - PeakMemoryUsage: 0
         - RowsReturned: 5 (5)
         - RowsReturnedRate: 1.00 /sec
         - SendersBlockedTimer: 0ns
         - SendersBlockedTotalTimer(*): 0ns
    Averaged Fragment F00:(Total: 2s857ms, non-child: 0ns, % non-child: 0.00%)
      split sizes:  min: 1.27 GB, max: 1.92 GB, avg: 1.72 GB, stddev: 272.90 MB
      completion times: min:1s886ms  max:3s146ms  mean: 2s675ms  stddev:477.663985.663.984856ms
      execution rates: min:624.97 MB/sec  max:690.60 MB/sec  mean:663.26 MB/sec  stddev:23.89 MB/sec
      num instances: 4
       - AverageThreadTokens: 7.44 
       - PeakMemoryUsage: 4.93 MB (5166592)
       - PerHostPeakMemUsage: 8.07 MB (8457728)
       - PrepareTime: 185.387ms
       - RowsProduced: 5 (5)
       - TotalCpuTime: 22s474ms
       - TotalNetworkReceiveTime: 0ns
       - TotalNetworkSendTime: 1.299ms
       - TotalStorageWaitTime: 87.376ms
      CodeGen:(Total: 184.807ms, non-child: 184.807ms, % non-child: 100.00%)
         - CodegenTime: 290.866us
         - CompileTime: 233ns
         - LoadTime: 45.102us
         - ModuleFileSize: 2.23 MB (2333204)
         - OptimizationTime: 0ns
         - PrepareTime: 183.547ms
      DataStreamSender (dst_id=3):(Total: 24.337us, non-child: 24.337us, % non-child: 100.00%)
         - BytesSent: 134.00 B (134)
         - NetworkThroughput(*): 120.91 KB/sec
         - OverallThroughput: 5.27 MB/sec
         - PeakMemoryUsage: 40.00 KB (40960)
         - SerializeBatchTime: 23.984us
         - ThriftTransmitTime(*): 1.114ms
         - UncompressedRowBatchSize: 192.00 B (192)
      AGGREGATION_NODE (id=1):(Total: 2s856ms, non-child: 2s803ms, % non-child: 98.12%)
         - BuildTime: 2s601ms
         - GetNewBlockTime: 44.889us
         - GetResultsTime: 4.697us
         - HashBuckets: 4.10K (4096)
         - LargestPartitionPercent: 0 (0)
         - MaxPartitionLevel: 0 (0)
         - NumRepartitions: 0 (0)
         - PartitionsCreated: 16 (16)
         - PeakMemoryUsage: 3.17 MB (3325952)
         - PinTime: 0ns
         - RowsRepartitioned: 0 (0)
         - RowsReturned: 5 (5)
         - RowsReturnedRate: 1.00 /sec
         - SpilledPartitions: 0 (0)
         - UnpinTime: 792ns
      HDFS_SCAN_NODE (id=0):(Total: 53.678ms, non-child: 53.678ms, % non-child: 100.00%)
         - AverageHdfsReadThreadConcurrency: 0.00 
         - AverageScannerThreadConcurrency: 6.65 
         - BytesRead: 777.91 KB (796579)
         - BytesReadDataNodeCache: 0
         - BytesReadLocal: 777.91 KB (796579)
         - BytesReadRemoteUnexpected: 0
         - BytesReadShortCircuit: 777.91 KB (796579)
         - DecompressionTime: 136.633us
         - MaxCompressedTextFileLength: 0
         - NumColumns: 1 (1)
         - NumDisksAccessed: 1 (1)
         - NumScannerThreadsStarted: 7 (7)
         - PeakMemoryUsage: 1.74 MB (1824256)
         - PerReadThreadRawHdfsThroughput: 748.79 MB/sec
         - RemoteScanRanges: 0 (0)
         - RowsRead: 25.87M (25874684)
         - RowsReturned: 25.87M (25874684)
         - RowsReturnedRate: 479.03 M/sec
         - ScanRangesComplete: 7 (7)
         - ScannerThreadsInvoluntaryContextSwitches: 10 (10)
         - ScannerThreadsTotalWallClockTime: 19s703ms
           - MaterializeTupleTime(*): 19s566ms
           - ScannerThreadsSysTime: 24.0ms
           - ScannerThreadsUserTime: 1s204ms
         - ScannerThreadsVoluntaryContextSwitches: 25.27K (25265)
         - TotalRawHdfsReadTime(*): 1.19ms
         - TotalReadThroughput: 274.84 KB/sec
    Fragment F01:
      Instance a7442ed1a43b8e8d:8ad38fb17f29baa5 (host=worker4.some.domain.net:22000):(Total: 3s605ms, non-child: 0ns, % non-child: 0.00%)
        MemoryUsage(500.0ms): 0, 3.14 MB, 3.14 MB, 3.14 MB, 3.14 MB, 3.15 MB, 3.15 MB
        ThreadUsage(500.0ms): 1, 1, 1, 1, 1, 1, 1
         - AverageThreadTokens: 1.00 
         - PeakMemoryUsage: 8.41 MB (8818688)
         - PerHostPeakMemUsage: 8.41 MB (8818688)
         - PrepareTime: 271.989ms
         - RowsProduced: 2 (2)
         - TotalCpuTime: 273.763ms
         - TotalNetworkReceiveTime: 3s332ms
         - TotalNetworkSendTime: 359.786us
         - TotalStorageWaitTime: 0ns
        BlockMgr:
           - BlockWritesOutstanding: 0 (0)
           - BlocksCreated: 96 (96)
           - BlocksRecycled: 1 (1)
           - BufferedPins: 0 (0)
           - BytesWritten: 0
           - MaxBlockSize: 8.00 MB (8388608)
           - MemoryLimit: 38.40 GB (41231687680)
           - PeakMemoryUsage: 8.02 MB (8404992)
           - TotalBufferWaitTime: 0ns
           - TotalEncryptionTime: 0ns
           - TotalIntegrityCheckTime: 0ns
           - TotalReadBlockTime: 0ns
        CodeGen:(Total: 271.331ms, non-child: 271.331ms, % non-child: 100.00%)
           - CodegenTime: 437.945us
           - CompileTime: 267ns
           - LoadTime: 132.3us
           - ModuleFileSize: 2.23 MB (2333204)
           - OptimizationTime: 0ns
           - PrepareTime: 269.916ms
        DataStreamSender (dst_id=5):(Total: 41.665us, non-child: 41.665us, % non-child: 100.00%)
           - BytesSent: 53.00 B (53)
           - NetworkThroughput(*): 214.21 KB/sec
           - OverallThroughput: 1.21 MB/sec
           - PeakMemoryUsage: 4.00 KB (4096)
           - SerializeBatchTime: 16.673us
           - ThriftTransmitTime(*): 241.624us
           - UncompressedRowBatchSize: 76.00 B (76)
        SORT_NODE (id=2):(Total: 3s604ms, non-child: 313.224us, % non-child: 0.01%)
           - InMemorySortTime: 2.746us
           - InitialRunsCreated: 1 (1)
           - PeakMemoryUsage: 8.02 MB (8404992)
           - RowsReturned: 2 (2)
           - RowsReturnedRate: 0
           - SortDataSize: 64.00 B (64)
           - TotalMergesPerformed: 0 (0)
        AGGREGATION_NODE (id=4):(Total: 3s604ms, non-child: 272.347ms, % non-child: 7.56%)
           - BuildTime: 28.572us
           - GetNewBlockTime: 86.550us
           - GetResultsTime: 5.612us
           - HashBuckets: 2.05K (2048)
           - LargestPartitionPercent: 12 (12)
           - MaxPartitionLevel: 0 (0)
           - NumRepartitions: 0 (0)
           - PartitionsCreated: 16 (16)
           - PeakMemoryUsage: 3.15 MB (3301376)
           - PinTime: 0ns
           - RowsRepartitioned: 0 (0)
           - RowsReturned: 2 (2)
           - RowsReturnedRate: 0
           - SpilledPartitions: 0 (0)
           - UnpinTime: 827ns
        EXCHANGE_NODE (id=3):(Total: 3s332ms, non-child: 69.672us, % non-child: 0.00%)
          BytesReceived(500.0ms): 0, 0, 0, 0, 52.00 B, 52.00 B
           - BytesReceived: 208.00 B (208)
           - ConvertRowBatchTime: 8.324us
           - DeserializeRowBatchTimer: 44.682us
           - FirstBatchArrivalWaitTime: 2s117ms
           - PeakMemoryUsage: 0
           - RowsReturned: 8 (8)
           - RowsReturnedRate: 2.00 /sec
           - SendersBlockedTimer: 0ns
           - SendersBlockedTotalTimer(*): 0ns
      Instance a7442ed1a43b8e8d:8ad38fb17f29baa3 (host=worker2.some.domain.net:22000):(Total: 3s604ms, non-child: 0ns, % non-child: 0.00%)
        MemoryUsage(500.0ms): 3.14 MB, 3.14 MB, 3.14 MB, 3.14 MB, 3.14 MB, 3.15 MB, 3.16 MB
        ThreadUsage(500.0ms): 1, 1, 1, 1, 1, 1, 1
         - AverageThreadTokens: 1.00 
         - PeakMemoryUsage: 8.21 MB (8613888)
         - PerHostPeakMemUsage: 8.23 MB (8626176)
         - PrepareTime: 238.304ms
         - RowsProduced: 1 (1)
         - TotalCpuTime: 240.469ms
         - TotalNetworkReceiveTime: 3s364ms
         - TotalNetworkSendTime: 332.998us
         - TotalStorageWaitTime: 0ns
        CodeGen:(Total: 237.499ms, non-child: 237.499ms, % non-child: 100.00%)
           - CodegenTime: 621.766us
           - CompileTime: 524ns
           - LoadTime: 61.381us
           - ModuleFileSize: 2.23 MB (2333204)
           - OptimizationTime: 0ns
           - PrepareTime: 234.905ms
        DataStreamSender (dst_id=5):(Total: 62.686us, non-child: 62.686us, % non-child: 100.00%)
           - BytesSent: 31.00 B (31)
           - NetworkThroughput(*): 121.00 KB/sec
           - OverallThroughput: 482.94 KB/sec
           - PeakMemoryUsage: 4.00 KB (4096)
           - SerializeBatchTime: 25.91us
           - ThriftTransmitTime(*): 250.198us
           - UncompressedRowBatchSize: 40.00 B (40)
        SORT_NODE (id=2):(Total: 3s604ms, non-child: 588.641us, % non-child: 0.02%)
           - InMemorySortTime: 1.479us
           - InitialRunsCreated: 1 (1)
           - PeakMemoryUsage: 8.02 MB (8404992)
           - RowsReturned: 1 (1)
           - RowsReturnedRate: 0
           - SortDataSize: 32.00 B (32)
           - TotalMergesPerformed: 0 (0)
        AGGREGATION_NODE (id=4):(Total: 3s603ms, non-child: 238.727ms, % non-child: 6.62%)
           - BuildTime: 30.821us
           - GetNewBlockTime: 57.183us
           - GetResultsTime: 3.80us
           - HashBuckets: 1.02K (1024)
           - LargestPartitionPercent: 25 (25)
           - MaxPartitionLevel: 0 (0)
           - NumRepartitions: 0 (0)
           - PartitionsCreated: 16 (16)
           - PeakMemoryUsage: 3.15 MB (3301376)
           - PinTime: 0ns
           - RowsRepartitioned: 0 (0)
           - RowsReturned: 1 (1)
           - RowsReturnedRate: 0
           - SpilledPartitions: 0 (0)
           - UnpinTime: 1.96us
        EXCHANGE_NODE (id=3):(Total: 3s364ms, non-child: 87.60us, % non-child: 0.00%)
          BytesReceived(500.0ms): 0, 0, 0, 0, 0, 30.00 B, 90.00 B
           - BytesReceived: 120.00 B (120)
           - ConvertRowBatchTime: 11.349us
           - DeserializeRowBatchTimer: 44.852us
           - FirstBatchArrivalWaitTime: 2s150ms
           - PeakMemoryUsage: 0
           - RowsReturned: 4 (4)
           - RowsReturnedRate: 1.00 /sec
           - SendersBlockedTimer: 0ns
           - SendersBlockedTotalTimer(*): 0ns
      Instance a7442ed1a43b8e8d:8ad38fb17f29baa4 (host=worker1.some.domain.net:22000):(Total: 3s604ms, non-child: 0ns, % non-child: 0.00%)
        MemoryUsage(500.0ms): 0, 3.14 MB, 3.14 MB, 3.14 MB, 3.14 MB, 3.14 MB, 3.14 MB, 3.14 MB
        ThreadUsage(500.0ms): 1, 1, 1, 1, 1, 1, 1, 1
         - AverageThreadTokens: 1.00 
         - PeakMemoryUsage: 8.02 MB (8409088)
         - PerHostPeakMemUsage: 8.10 MB (8494080)
         - PrepareTime: 240.672ms
         - RowsProduced: 0 (0)
         - TotalCpuTime: 242.522ms
         - TotalNetworkReceiveTime: 3s362ms
         - TotalNetworkSendTime: 230ns
         - TotalStorageWaitTime: 0ns
        BlockMgr:
           - BlockWritesOutstanding: 0 (0)
           - BlocksCreated: 96 (96)
           - BlocksRecycled: 1 (1)
           - BufferedPins: 0 (0)
           - BytesWritten: 0
           - MaxBlockSize: 8.00 MB (8388608)
           - MemoryLimit: 38.40 GB (41231687680)
           - PeakMemoryUsage: 8.00 MB (8388608)
           - TotalBufferWaitTime: 0ns
           - TotalEncryptionTime: 0ns
           - TotalIntegrityCheckTime: 0ns
           - TotalReadBlockTime: 0ns
        CodeGen:(Total: 240.75ms, non-child: 240.75ms, % non-child: 100.00%)
           - CodegenTime: 430.880us
           - CompileTime: 253ns
           - LoadTime: 153.349us
           - ModuleFileSize: 2.23 MB (2333204)
           - OptimizationTime: 0ns
           - PrepareTime: 238.587ms
        DataStreamSender (dst_id=5):(Total: 14.671us, non-child: 14.671us, % non-child: 100.00%)
           - BytesSent: 0
           - NetworkThroughput(*): 0.00 /sec
           - OverallThroughput: 0.00 /sec
           - PeakMemoryUsage: 4.00 KB (4096)
           - SerializeBatchTime: 0ns
           - ThriftTransmitTime(*): 0ns
           - UncompressedRowBatchSize: 0
        SORT_NODE (id=2):(Total: 3s603ms, non-child: 364.504us, % non-child: 0.01%)
           - InMemorySortTime: 1.118us
           - InitialRunsCreated: 1 (1)
           - PeakMemoryUsage: 8.01 MB (8396800)
           - RowsReturned: 0 (0)
           - RowsReturnedRate: 0
           - SortDataSize: 0
           - TotalMergesPerformed: 0 (0)
        AGGREGATION_NODE (id=4):(Total: 3s603ms, non-child: 240.841ms, % non-child: 6.68%)
           - BuildTime: 1.836us
           - GetNewBlockTime: 39.494us
           - GetResultsTime: 0ns
           - HashBuckets: 0 (0)
           - LargestPartitionPercent: 0 (0)
           - MaxPartitionLevel: 0 (0)
           - NumRepartitions: 0 (0)
           - PartitionsCreated: 16 (16)
           - PeakMemoryUsage: 3.13 MB (3284992)
           - PinTime: 0ns
           - RowsRepartitioned: 0 (0)
           - RowsReturned: 0 (0)
           - RowsReturnedRate: 0
           - SpilledPartitions: 0 (0)
           - UnpinTime: 686ns
        EXCHANGE_NODE (id=3):(Total: 3s362ms, non-child: 5.944us, % non-child: 0.00%)
          BytesReceived(500.0ms): 0, 0, 0, 0, 0, 0, 0
           - BytesReceived: 0
           - ConvertRowBatchTime: 817ns
           - DeserializeRowBatchTimer: 0ns
           - FirstBatchArrivalWaitTime: 3s362ms
           - PeakMemoryUsage: 0
           - RowsReturned: 0 (0)
           - RowsReturnedRate: 0
           - SendersBlockedTimer: 0ns
           - SendersBlockedTotalTimer(*): 0ns
      Instance a7442ed1a43b8e8d:8ad38fb17f29baa2 (host=worker3.some.domain.net:22000):(Total: 3s603ms, non-child: 0ns, % non-child: 0.00%)
        MemoryUsage(500.0ms): 0, 3.14 MB, 3.14 MB, 3.14 MB, 3.14 MB, 3.15 MB, 3.15 MB, 3.16 MB
        ThreadUsage(500.0ms): 1, 1, 1, 1, 1, 1, 1, 1
         - AverageThreadTokens: 1.00 
         - PeakMemoryUsage: 8.41 MB (8818688)
         - PerHostPeakMemUsage: 8.41 MB (8818688)
         - PrepareTime: 270.423ms
         - RowsProduced: 2 (2)
         - TotalCpuTime: 272.476ms
         - TotalNetworkReceiveTime: 3s331ms
         - TotalNetworkSendTime: 565.845us
         - TotalStorageWaitTime: 0ns
        BlockMgr:
           - BlockWritesOutstanding: 0 (0)
           - BlocksCreated: 96 (96)
           - BlocksRecycled: 1 (1)
           - BufferedPins: 0 (0)
           - BytesWritten: 0
           - MaxBlockSize: 8.00 MB (8388608)
           - MemoryLimit: 38.40 GB (41231687680)
           - PeakMemoryUsage: 8.02 MB (8404992)
           - TotalBufferWaitTime: 0ns
           - TotalEncryptionTime: 0ns
           - TotalIntegrityCheckTime: 0ns
           - TotalReadBlockTime: 0ns
        CodeGen:(Total: 269.840ms, non-child: 269.840ms, % non-child: 100.00%)
           - CodegenTime: 423.130us
           - CompileTime: 250ns
           - LoadTime: 121.419us
           - ModuleFileSize: 2.23 MB (2333204)
           - OptimizationTime: 0ns
           - PrepareTime: 268.433ms
        DataStreamSender (dst_id=5):(Total: 43.283us, non-child: 43.283us, % non-child: 100.00%)
           - BytesSent: 50.00 B (50)
           - NetworkThroughput(*): 110.36 KB/sec
           - OverallThroughput: 1.10 MB/sec
           - PeakMemoryUsage: 4.00 KB (4096)
           - SerializeBatchTime: 17.90us
           - ThriftTransmitTime(*): 442.430us
           - UncompressedRowBatchSize: 76.00 B (76)
        SORT_NODE (id=2):(Total: 3s602ms, non-child: 473.974us, % non-child: 0.01%)
           - InMemorySortTime: 3.235us
           - InitialRunsCreated: 1 (1)
           - PeakMemoryUsage: 8.02 MB (8404992)
           - RowsReturned: 2 (2)
           - RowsReturnedRate: 0
           - SortDataSize: 64.00 B (64)
           - TotalMergesPerformed: 0 (0)
        AGGREGATION_NODE (id=4):(Total: 3s602ms, non-child: 270.772ms, % non-child: 7.52%)
           - BuildTime: 24.478us
           - GetNewBlockTime: 38.840us
           - GetResultsTime: 3.61us
           - HashBuckets: 2.05K (2048)
           - LargestPartitionPercent: 12 (12)
           - MaxPartitionLevel: 0 (0)
           - NumRepartitions: 0 (0)
           - PartitionsCreated: 16 (16)
           - PeakMemoryUsage: 3.15 MB (3301376)
           - PinTime: 0ns
           - RowsRepartitioned: 0 (0)
           - RowsReturned: 2 (2)
           - RowsReturnedRate: 0
           - SpilledPartitions: 0 (0)
           - UnpinTime: 874ns
        EXCHANGE_NODE (id=3):(Total: 3s331ms, non-child: 85.894us, % non-child: 0.00%)
          BytesReceived(500.0ms): 0, 0, 0, 0, 52.00 B, 52.00 B, 156.00 B
           - BytesReceived: 208.00 B (208)
           - ConvertRowBatchTime: 11.584us
           - DeserializeRowBatchTimer: 34.881us
           - FirstBatchArrivalWaitTime: 2s117ms
           - PeakMemoryUsage: 0
           - RowsReturned: 8 (8)
           - RowsReturnedRate: 2.00 /sec
           - SendersBlockedTimer: 0ns
           - SendersBlockedTotalTimer(*): 0ns
    Fragment F00:
      Instance a7442ed1a43b8e8d:8ad38fb17f29baa8 (host=worker1.some.domain.net:22000):(Total: 3s327ms, non-child: 0ns, % non-child: 0.00%)
        Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:8/1.92 GB 
        MemoryUsage(500.0ms): 4.73 MB, 4.76 MB, 4.77 MB, 4.76 MB, 4.76 MB, 4.73 MB, 4.23 MB

 The new cluster:

 

+---------------------+----------+
| part_col            | count(*) |
+---------------------+----------+
| 2017-01-01 00:00:00 | 18428899 |
| 2017-01-02 00:00:00 | 20974112 |
| 2017-01-03 00:00:00 | 13178213 |
| 2017-01-04 00:00:00 | 26768773 |
| 2017-01-05 00:00:00 | 24148741 |
+---------------------+----------+
Query Runtime Profile:
Query (id=814dc21ba8355459:ac7e36724f454893):
  Summary:
    Session ID: 694ced478c51b5e5:321e569a7a0513b5
    Session Type: BEESWAX
    Start Time: 2017-01-31 09:08:07.217250000
    End Time: 2017-01-31 09:08:12.161790000
    Query Type: QUERY
    Query State: FINISHED
    Query Status: OK
    Impala Version: impalad version 2.6.0-cdh5.8.3 RELEASE (build c644f476b774db9db87a619628f7a6ecc5f843e0)
    User: test_user@SOME.DOMAIN.NET
    Connected User: test_user@SOME.DOMAIN.NET
    Delegated User: 
    Network Address: ::ffff:xx.xx.xx.xx:55766
    Default Db: default
    Sql Statement: select part_col, count(*) from perf_test group by part_col order by 1
    Coordinator: worker3.some.domain.net:22000
    Query Options (non default): 
    Plan: 
----------------
Estimated Per-Host Requirements: Memory=344.00MB VCores=2
WARNING: The following tables are missing relevant table and/or column statistics.
default.perf_test

05:MERGING-EXCHANGE [UNPARTITIONED]
|  order by: part_col ASC
|  hosts=5 per-host-mem=unavailable
|  tuple-ids=2 row-size=24B cardinality=unavailable
|
02:SORT
|  order by: part_col ASC
|  hosts=5 per-host-mem=0B
|  tuple-ids=2 row-size=24B cardinality=unavailable
|
04:AGGREGATE [FINALIZE]
|  output: count:merge(*)
|  group by: part_col
|  hosts=5 per-host-mem=128.00MB
|  tuple-ids=1 row-size=24B cardinality=unavailable
|
03:EXCHANGE [HASH(part_col)]
|  hosts=5 per-host-mem=0B
|  tuple-ids=1 row-size=24B cardinality=unavailable
|
01:AGGREGATE [STREAMING]
|  output: count(*)
|  group by: part_col
|  hosts=5 per-host-mem=128.00MB
|  tuple-ids=1 row-size=24B cardinality=unavailable
|
00:SCAN HDFS [default.perf_test, RANDOM]
   partitions=1/1 files=32 size=7.22GB
   table stats: unavailable
   column stats: unavailable
   hosts=5 per-host-mem=88.00MB
   tuple-ids=0 row-size=16B cardinality=unavailable
----------------
    Estimated Per-Host Mem: 360710144
    Estimated Per-Host VCores: 2
    Tables Missing Stats: default.perf_test
    Request Pool: root.test_user
    Admission result: Admitted immediately
    ExecSummary: 
Operator              #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail             
---------------------------------------------------------------------------------------------------------------------
05:MERGING-EXCHANGE        1   82.553us   82.553us        5          -1         0        -1.00 B  UNPARTITIONED      
02:SORT                    5  434.217us  491.025us        5          -1   8.02 MB              0                     
04:AGGREGATE               5  279.461ms  316.351ms        5          -1   2.30 MB      128.00 MB  FINALIZE           
03:EXCHANGE                5   28.378us   55.658us       25          -1         0              0  HASH(part_col) 
01:AGGREGATE               5    3s965ms    4s739ms       25          -1   1.59 MB      128.00 MB  STREAMING          
00:SCAN HDFS               5   83.393ms  118.345ms  103.50M          -1  12.04 MB       88.00 MB  default.perf_test    
    Planner Timeline: 23.087ms
       - Analysis finished: 10.147ms (10.147ms)
       - Equivalence classes computed: 10.654ms (506.930us)
       - Single node plan created: 14.745ms (4.091ms)
       - Runtime filters computed: 14.787ms (41.721us)
       - Distributed plan created: 17.268ms (2.481ms)
       - Lineage info computed: 18.728ms (1.460ms)
       - Planning finished: 23.087ms (4.358ms)
    Query Timeline: 4s949ms
       - Start execution: 56.064us (56.064us)
       - Planning finished: 26.811ms (26.755ms)
       - Submit for admission: 27.851ms (1.039ms)
       - Completed admission: 27.929ms (77.878us)
       - Ready to start 10 remote fragments: 28.364ms (434.946us)
       - All 10 remote fragments started: 29.638ms (1.273ms)
       - Rows available: 4s860ms (4s831ms)
       - First row fetched: 4s942ms (81.424ms)
       - Unregister query: 4s944ms (2.126ms)
  ImpalaServer:
     - ClientFetchWaitTimer: 82.476ms
     - RowMaterializationTimer: 51.064us
  Execution Profile 814dc21ba8355459:ac7e36724f454893:(Total: 4s833ms, non-child: 0.000ns, % non-child: 0.00%)
    Number of filters: 0
    Filter routing table: 
 ID  Src. Node  Tgt. Node(s)  Targets  Target type  Partition filter  Pending (Expected)  First arrived   Completed
-------------------------------------------------------------------------------------------------------------------

    Fragment start latencies: Count: 10, 25th %-ile: 1ms, 50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 1ms, 95th %-ile: 1ms, 99.9th %-ile: 1ms
    Per Node Peak Memory Usage: worker2.some.domain.net:22000(23.93 MB) worker5.some.domain.net:22000(23.32 MB) worker1.some.domain.net:22000(23.85 MB) worker4.some.domain.net:22000(23.94 MB) worker3.some.domain.net:22000(23.86 MB) 
     - FiltersReceived: 0 (0)
     - FinalizationTimer: 0.000ns
    Coordinator Fragment F02:(Total: 4s831ms, non-child: 255.854us, % non-child: 0.01%)
      MemoryUsage(500.000ms): 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB
       - AverageThreadTokens: 0.00 
       - BloomFilterBytes: 0
       - PeakMemoryUsage: 20.04 KB (20520)
       - PerHostPeakMemUsage: 0
       - PrepareTime: 45.494us
       - RowsProduced: 5 (5)
       - TotalCpuTime: 4s914ms
       - TotalNetworkReceiveTime: 0.000ns
       - TotalNetworkSendTime: 0.000ns
       - TotalStorageWaitTime: 0.000ns
      BlockMgr:
         - BlockWritesOutstanding: 0 (0)
         - BlocksCreated: 49 (49)
         - BlocksRecycled: 0 (0)
         - BufferedPins: 0 (0)
         - BytesWritten: 0
         - MaxBlockSize: 8.00 MB (8388608)
         - MemoryLimit: 61.98 GB (66552700928)
         - PeakMemoryUsage: 8.77 MB (9191424)
         - TotalBufferWaitTime: 0.000ns
         - TotalEncryptionTime: 0.000ns
         - TotalIntegrityCheckTime: 0.000ns
         - TotalReadBlockTime: 0.000ns
      EXCHANGE_NODE (id=5):(Total: 4s831ms, non-child: 82.553us, % non-child: 0.00%)
        BytesReceived(500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0
         - BytesReceived: 134.00 B (134)
         - ConvertRowBatchTime: 0.000ns
         - DeserializeRowBatchTimer: 31.107us
         - FirstBatchArrivalWaitTime: 4s830ms
         - MergeGetNext: 25.594us
         - MergeGetNextBatch: 15.618us
         - PeakMemoryUsage: 0
         - RowsReturned: 5 (5)
         - RowsReturnedRate: 1.00 /sec
         - SendersBlockedTimer: 0.000ns
         - SendersBlockedTotalTimer(*): 0.000ns
    Averaged Fragment F01:(Total: 4s721ms, non-child: 0.000ns, % non-child: 0.00%)
      split sizes:  min: 0, max: 0, avg: 0, stddev: 0
      completion times: min:4s831ms  max:4s832ms  mean: 4s832ms  stddev:302.637us
      execution rates: min:0.00 /sec  max:0.00 /sec  mean:0.00 /sec  stddev:0.00 /sec
      num instances: 5
       - AverageThreadTokens: 1.00 
       - BloomFilterBytes: 0
       - PeakMemoryUsage: 10.30 MB (10802320)
       - PerHostPeakMemUsage: 23.78 MB (24934251)
       - PrepareTime: 278.764ms
       - RowsProduced: 1 (1)
       - TotalCpuTime: 390.687ms
       - TotalNetworkReceiveTime: 4s440ms
       - TotalNetworkSendTime: 107.306us
       - TotalStorageWaitTime: 0.000ns
      BlockMgr:
         - BlockWritesOutstanding: 0 (0)
         - BlocksCreated: 49 (49)
         - BlocksRecycled: 0 (0)
         - BufferedPins: 0 (0)
         - BytesWritten: 0
         - MaxBlockSize: 8.00 MB (8388608)
         - MemoryLimit: 61.98 GB (66552700928)
         - PeakMemoryUsage: 8.77 MB (9191424)
         - TotalBufferWaitTime: 0.000ns
         - TotalEncryptionTime: 0.000ns
         - TotalIntegrityCheckTime: 0.000ns
         - TotalReadBlockTime: 0.000ns
      CodeGen:(Total: 381.581ms, non-child: 381.581ms, % non-child: 100.00%)
         - CodegenTime: 6.404ms
         - CompileTime: 7.476ms
         - LoadTime: 0.000ns
         - ModuleBitcodeSize: 2.15 MB (2254856)
         - NumFunctions: 29 (29)
         - NumInstructions: 550 (550)
         - OptimizationTime: 102.403ms
         - PrepareTime: 270.719ms
      DataStreamSender (dst_id=5):(Total: 68.002us, non-child: 68.002us, % non-child: 100.00%)
         - BytesSent: 26.00 B (26)
         - NetworkThroughput(*): 787.80 KB/sec
         - OverallThroughput: 295.48 KB/sec
         - PeakMemoryUsage: 4.00 KB (4096)
         - RowsReturned: 1 (1)
         - SerializeBatchTime: 19.405us
         - TransmitDataRPCTime: 21.326us
         - UncompressedRowBatchSize: 38.00 B (38)
      SORT_NODE (id=2):(Total: 4s720ms, non-child: 434.218us, % non-child: 0.01%)
         - InMemorySortTime: 1.484us
         - InitialRunsCreated: 1 (1)
         - PeakMemoryUsage: 8.01 MB (8401715)
         - RowsReturned: 1 (1)
         - RowsReturnedRate: 0
         - SortDataSize: 32.00 B (32)
         - SpilledRuns: 0 (0)
         - TotalMergesPerformed: 0 (0)
      AGGREGATION_NODE (id=4):(Total: 4s720ms, non-child: 279.461ms, % non-child: 5.92%)
         - BuildTime: 13.858us
         - GetNewBlockTime: 145.320us
         - GetResultsTime: 1.887us
         - HTResizeTime: 3.177us
         - HashBuckets: 1.02K (1024)
         - LargestPartitionPercent: 8 (8)
         - MaxPartitionLevel: 0 (0)
         - NumRepartitions: 0 (0)
         - PartitionsCreated: 16 (16)
         - PeakMemoryUsage: 2.29 MB (2401408)
         - PinTime: 0.000ns
         - RowsRepartitioned: 0 (0)
         - RowsReturned: 1 (1)
         - RowsReturnedRate: 0
         - SpilledPartitions: 0 (0)
         - UnpinTime: 1.046us
      EXCHANGE_NODE (id=3):(Total: 4s440ms, non-child: 4s440ms, % non-child: 100.00%)
         - BytesReceived: 133.00 B (133)
         - ConvertRowBatchTime: 4.099us
         - DeserializeRowBatchTimer: 27.477us
         - FirstBatchArrivalWaitTime: 2s625ms
         - PeakMemoryUsage: 0
         - RowsReturned: 5 (5)
         - RowsReturnedRate: 1.00 /sec
         - SendersBlockedTimer: 0.000ns
         - SendersBlockedTotalTimer(*): 0.000ns
    Averaged Fragment F00:(Total: 4s051ms, non-child: 0.000ns, % non-child: 0.00%)
      split sizes:  min: 1.33 GB, max: 1.54 GB, avg: 1.44 GB, stddev: 78.16 MB
      completion times: min:1s806ms  max:4s831ms  mean: 4s052ms  stddev:1s134ms
      execution rates: min:317.74 MB/sec  max:756.58 MB/sec  mean:412.43 MB/sec  stddev:172.21 MB/sec
      num instances: 5
       - AverageThreadTokens: 6.25 
       - BloomFilterBytes: 0
       - PeakMemoryUsage: 13.49 MB (14146688)
       - PerHostPeakMemUsage: 23.78 MB (24934251)
       - PrepareTime: 281.243ms
       - RowsProduced: 5 (5)
       - TotalCpuTime: 26s501ms
       - TotalNetworkReceiveTime: 0.000ns
       - TotalNetworkSendTime: 506.296us
       - TotalStorageWaitTime: 136.659ms
      CodeGen:(Total: 274.654ms, non-child: 274.654ms, % non-child: 100.00%)
         - CodegenTime: 6.016ms
         - CompileTime: 0.000ns
         - LoadTime: 0.000ns
         - ModuleBitcodeSize: 2.15 MB (2254856)
         - NumFunctions: 0 (0)
         - NumInstructions: 0 (0)
         - OptimizationTime: 0.000ns
         - PrepareTime: 273.688ms
      DataStreamSender (dst_id=3):(Total: 147.360us, non-child: 147.360us, % non-child: 100.00%)
         - BytesSent: 133.00 B (133)
         - NetworkThroughput(*): 1.84 MB/sec
         - OverallThroughput: 885.74 KB/sec
         - PeakMemoryUsage: 32.00 KB (32768)
         - RowsReturned: 5 (5)
         - SerializeBatchTime: 48.621us
         - TransmitDataRPCTime: 73.238us
         - UncompressedRowBatchSize: 192.00 B (192)
      AGGREGATION_NODE (id=1):(Total: 4s049ms, non-child: 3s965ms, % non-child: 97.94%)
         - GetNewBlockTime: 78.103us
         - GetResultsTime: 9.538us
         - HTResizeTime: 523.269us
         - HashBuckets: 8.19K (8192)
         - LargestPartitionPercent: 0 (0)
         - PartitionsCreated: 16 (16)
         - PeakMemoryUsage: 1.59 MB (1668224)
         - PinTime: 0.000ns
         - ReductionFactorEstimate: 0.00 
         - ReductionFactorThresholdToExpand: 0.00 
         - RowsPassedThrough: 0 (0)
         - RowsReturned: 5 (5)
         - RowsReturnedRate: 1.00 /sec
         - StreamingTime: 3s666ms
         - UnpinTime: 0.000ns
      HDFS_SCAN_NODE (id=0):(Total: 83.393ms, non-child: 83.393ms, % non-child: 100.00%)
         - AverageHdfsReadThreadConcurrency: 0.00 
         - AverageScannerThreadConcurrency: 5.63 
         - BytesRead: 650.78 KB (666403)
         - BytesReadDataNodeCache: 0
         - BytesReadLocal: 650.78 KB (666403)
         - BytesReadRemoteUnexpected: 0
         - BytesReadShortCircuit: 650.78 KB (666403)
         - DecompressionTime: 345.570us
         - MaxCompressedTextFileLength: 0
         - NumColumns: 1 (1)
         - NumDisksAccessed: 6 (6)
         - NumRowGroups: 6 (6)
         - NumScannerThreadsStarted: 6 (6)
         - PeakMemoryUsage: 11.89 MB (12465356)
         - PerReadThreadRawHdfsThroughput: 686.45 MB/sec
         - RemoteScanRanges: 0 (0)
         - RowsRead: 20.70M (20699747)
         - RowsReturned: 20.70M (20699747)
         - RowsReturnedRate: 282.05 M/sec
         - ScanRangesComplete: 6 (6)
         - ScannerThreadsInvoluntaryContextSwitches: 12 (12)
         - ScannerThreadsTotalWallClockTime: 22s586ms
           - MaterializeTupleTime(*): 22s440ms
           - ScannerThreadsSysTime: 32.800ms
           - ScannerThreadsUserTime: 666.440ms
         - ScannerThreadsVoluntaryContextSwitches: 20.07K (20074)
         - TotalRawHdfsReadTime(*): 1.368ms
         - TotalReadThroughput: 181.83 KB/sec
    Fragment F01:
      Instance 814dc21ba8355459:ac7e36724f454896 (host=worker1.some.domain.net:22000):(Total: 4s773ms, non-child: 0.000ns, % non-child: 0.00%)
        MemoryUsage(500.000ms): 10.28 MB, 10.28 MB, 10.28 MB, 10.28 MB, 10.29 MB, 10.29 MB, 10.29 MB, 10.29 MB, 10.29 MB, 10.31 MB
        ThreadUsage(500.000ms): 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
         - AverageThreadTokens: 1.00 
         - BloomFilterBytes: 0
         - PeakMemoryUsage: 10.31 MB (10810528)
         - PerHostPeakMemUsage: 23.85 MB (25013504)
         - PrepareTime: 140.565ms
         - RowsProduced: 2 (2)
         - TotalCpuTime: 198.707ms
         - TotalNetworkReceiveTime: 4s632ms
         - TotalNetworkSendTime: 144.042us
         - TotalStorageWaitTime: 0.000ns
        BlockMgr:
           - BlockWritesOutstanding: 0 (0)
           - BlocksCreated: 49 (49)
           - BlocksRecycled: 0 (0)
           - BufferedPins: 0 (0)
           - BytesWritten: 0
           - MaxBlockSize: 8.00 MB (8388608)
           - MemoryLimit: 61.98 GB (66552700928)
           - PeakMemoryUsage: 8.77 MB (9191424)
           - TotalBufferWaitTime: 0.000ns
           - TotalEncryptionTime: 0.000ns
           - TotalIntegrityCheckTime: 0.000ns
           - TotalReadBlockTime: 0.000ns
        CodeGen:(Total: 193.769ms, non-child: 193.769ms, % non-child: 100.00%)
           - CodegenTime: 3.318ms
           - CompileTime: 3.802ms
           - LoadTime: 0.000ns
           - ModuleBitcodeSize: 2.15 MB (2254856)
           - NumFunctions: 29 (29)
           - NumInstructions: 550 (550)
           - OptimizationTime: 53.125ms
           - PrepareTime: 136.382ms
        DataStreamSender (dst_id=5):(Total: 119.588us, non-child: 119.588us, % non-child: 100.00%)
           - BytesSent: 50.00 B (50)
           - NetworkThroughput(*): 1.84 MB/sec
           - OverallThroughput: 408.30 KB/sec
           - PeakMemoryUsage: 4.00 KB (4096)
           - RowsReturned: 2 (2)
           - SerializeBatchTime: 73.063us
           - TransmitDataRPCTime: 25.979us
           - UncompressedRowBatchSize: 76.00 B (76)
        SORT_NODE (id=2):(Total: 4s773ms, non-child: 271.839us, % non-child: 0.01%)
          ExecOption: Codegen Enabled
           - InMemorySortTime: 1.324us
           - InitialRunsCreated: 1 (1)
           - PeakMemoryUsage: 8.02 MB (8404992)
           - RowsReturned: 2 (2)
           - RowsReturnedRate: 0
           - SortDataSize: 64.00 B (64)
           - SpilledRuns: 0 (0)
           - TotalMergesPerformed: 0 (0)
        AGGREGATION_NODE (id=4):(Total: 4s773ms, non-child: 140.842ms, % non-child: 2.95%)
          ExecOption: Codegen Disabled: HashTableCtx::CodegenEvalRow(): type TIMESTAMP NYI

           - BuildTime: 12.344us
           - GetNewBlockTime: 87.985us
           - GetResultsTime: 1.797us
           - HTResizeTime: 1.646us
           - HashBuckets: 2.05K (2048)
           - LargestPartitionPercent: 10 (10)
           - MaxPartitionLevel: 0 (0)
           - NumRepartitions: 0 (0)
           - PartitionsCreated: 16 (16)
           - PeakMemoryUsage: 2.30 MB (2409600)
           - PinTime: 0.000ns
           - RowsRepartitioned: 0 (0)
           - RowsReturned: 2 (2)
           - RowsReturnedRate: 0
           - SpilledPartitions: 0 (0)
           - UnpinTime: 616.000ns
        EXCHANGE_NODE (id=3):(Total: 4s632ms, non-child: 28.353us, % non-child: 0.00%)
          BytesReceived(500.000ms): 0, 0, 0, 0, 52.00 B, 52.00 B, 52.00 B, 52.00 B, 52.00 B, 208.00 B
           - BytesReceived: 260.00 B (260)
           - ConvertRowBatchTime: 4.176us
           - DeserializeRowBatchTimer: 25.676us
           - FirstBatchArrivalWaitTime: 1s607ms
           - PeakMemoryUsage: 0
           - RowsReturned: 10 (10)
           - RowsReturnedRate: 2.00 /sec
           - SendersBlockedTimer: 0.000ns
           - SendersBlockedTotalTimer(*): 0.000ns
      Instance 814dc21ba8355459:ac7e36724f454897 (host=worker5.some.domain.net:22000):(Total: 4s709ms, non-child: 0.000ns, % non-child: 0.00%)
        MemoryUsage(500.000ms): 0, 10.28 MB, 10.28 MB, 10.28 MB, 10.29 MB, 10.29 MB, 10.29 MB, 10.29 MB, 10.29 MB, 10.30 MB
        ThreadUsage(500.000ms): 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
         - AverageThreadTokens: 1.00 
         - BloomFilterBytes: 0
         - PeakMemoryUsage: 10.31 MB (10810528)
         - PerHostPeakMemUsage: 23.32 MB (24449296)
         - PrepareTime: 311.018ms
         - RowsProduced: 2 (2)
         - TotalCpuTime: 435.142ms
         - TotalNetworkReceiveTime: 4s396ms
         - TotalNetworkSendTime: 187.827us
         - TotalStorageWaitTime: 0.000ns
        BlockMgr:
           - BlockWritesOutstanding: 0 (0)
           - BlocksCreated: 49 (49)
           - BlocksRecycled: 0 (0)
           - BufferedPins: 0 (0)
           - BytesWritten: 0
           - MaxBlockSize: 8.00 MB (8388608)
           - MemoryLimit: 61.98 GB (66552700928)
           - PeakMemoryUsage: 8.77 MB (9191424)
           - TotalBufferWaitTime: 0.000ns
           - TotalEncryptionTime: 0.000ns
           - TotalIntegrityCheckTime: 0.000ns
           - TotalReadBlockTime: 0.000ns
        CodeGen:(Total: 425.412ms, non-child: 425.412ms, % non-child: 100.00%)
           - CodegenTime: 6.872ms
           - CompileTime: 8.123ms
           - LoadTime: 0.000ns
           - ModuleBitcodeSize: 2.15 MB (2254856)
           - NumFunctions: 29 (29)
           - NumInstructions: 550 (550)
           - OptimizationTime: 113.870ms
           - PrepareTime: 302.324ms
        DataStreamSender (dst_id=5):(Total: 69.893us, non-child: 69.893us, % non-child: 100.00%)
           - BytesSent: 53.00 B (53)
           - NetworkThroughput(*): 1.31 MB/sec
           - OverallThroughput: 740.53 KB/sec
           - PeakMemoryUsage: 4.00 KB (4096)
           - RowsReturned: 2 (2)
           - SerializeBatchTime: 11.380us
           - TransmitDataRPCTime: 38.673us
           - UncompressedRowBatchSize: 76.00 B (76)
        SORT_NODE (id=2):(Total: 4s708ms, non-child: 491.025us, % non-child: 0.01%)
          ExecOption: Codegen Enabled
           - InMemorySortTime: 2.603us
           - InitialRunsCreated: 1 (1)
           - PeakMemoryUsage: 8.02 MB (8404992)
           - RowsReturned: 2 (2)
           - RowsReturnedRate: 0
           - SortDataSize: 64.00 B (64)
           - SpilledRuns: 0 (0)
           - TotalMergesPerformed: 0 (0)
        AGGREGATION_NODE (id=4):(Total: 4s708ms, non-child: 311.656ms, % non-child: 6.62%)
          ExecOption: Codegen Disabled: HashTableCtx::CodegenEvalRow(): type TIMESTAMP NYI

           - BuildTime: 21.607us
           - GetNewBlockTime: 144.739us
           - GetResultsTime: 5.404us
           - HTResizeTime: 3.560us
           - HashBuckets: 2.05K (2048)
           - LargestPartitionPercent: 10 (10)
           - MaxPartitionLevel: 0 (0)
           - NumRepartitions: 0 (0)
           - PartitionsCreated: 16 (16)
           - PeakMemoryUsage: 2.30 MB (2409600)
           - PinTime: 0.000ns
           - RowsRepartitioned: 0 (0)
           - RowsReturned: 2 (2)
           - RowsReturnedRate: 0
           - SpilledPartitions: 0 (0)
           - UnpinTime: 1.295us
        EXCHANGE_NODE (id=3):(Total: 4s396ms, non-child: 42.905us, % non-child: 0.00%)
          BytesReceived(500.000ms): 0, 0, 0, 51.00 B, 51.00 B, 51.00 B, 51.00 B, 51.00 B, 154.00 B
           - BytesReceived: 258.00 B (258)
           - ConvertRowBatchTime: 6.460us
           - DeserializeRowBatchTimer: 67.686us
           - FirstBatchArrivalWaitTime: 1s371ms
           - PeakMemoryUsage: 0
           - RowsReturned: 10 (10)
           - RowsReturnedRate: 2.00 /sec
           - SendersBlockedTimer: 0.000ns
           - SendersBlockedTotalTimer(*): 0.000ns

 

 

 

 

avatar

It's not entirely obvious to me what changed. Since it's grouping by timestamp, you're running into https://issues.cloudera.org/browse/IMPALA-3884, which is forcing it to execute the aggregation in an interpreted mode. I suspect that the performance of that interpreted code path may have changed a bit - we tend to focus most of our tuning on the codegened code path.

 

On releases with IMPALA-3884 fixed it should be a *lot* faster.

 

"          ExecOption: Codegen Disabled: HashTableCtx::CodegenEvalRow(): type TIMESTAMP NYI"

avatar

It looks like the problem is really in the timestamp field. Running a similar query on table without timestamp show much better results on the new environment. Thanks for the help