Support Questions

Find answers, ask questions, and share your expertise

On a 3 billions row table "SELECT * WHERE row not exist" took long time

avatar
New Contributor

Hello,

 

I've got a 4 data node cluster + 1 master node. I'm trying to understand the better way to retrieve all cols data from a query. The cluster run CDH 5.7.

 

For my tests, I've got a 3 billions row table that contain 210 columns. This table is stored as parquet.

 

1 - If I run the query that return no results, I use a condition in where clause.

 

 

SELECT * FROM 3_billion_row_table WHERE colA = "not exists" 

 

 

It take 402 seconds (that's very long)

 

2- If I execute the same query, but I specify only one row

 

SELECT colB FROM 3_billion_row_table WHERE colA = "not exists" 

It take 4 seconds (100x faster).

 

I run the profile after the query with the select *, you can find the content below :

Query Runtime Profile:
Query (id=f4157aa989f6fe7:6dfb77323ae25d99):
Summary:
Session ID: 114a2324e93193fe:f773302cc4071f89
Session Type: BEESWAX
Start Time: 2016-05-17 16:00:25.193344000
End Time: 2016-05-17 16:07:29.957456000
Query Type: QUERY
Query State: FINISHED
Query Status: OK
Impala Version: impalad version 2.5.0-cdh5.7.0 RELEASE (build ad3f5adabedf56fe6bd9eea39147c067cc552703)
User: cloudera
Connected User: cloudera
Delegated User:
Network Address: ::ffff:192.168.0.1:45628
Default Db: default
Sql Statement: select * FROM 3_billion_row_table WHERE colA = "not exists"
Coordinator: hd03:22000
Query Options (non default):
Plan:
----------------
Estimated Per-Host Requirements: Memory=7.50GB VCores=1

01:EXCHANGE [UNPARTITIONED]
| hosts=4 per-host-mem=unavailable
| tuple-ids=0 row-size=1.55KB cardinality=1528
|
00:SCAN HDFS [default.3_billion_row_table, RANDOM]
partitions=922/922 files=1444 size=223.80GB
predicates: colA = 'not exists'
table stats: 2835171843 rows total
column stats: all
hosts=4 per-host-mem=7.50GB
tuple-ids=0 row-size=1.55KB cardinality=1528
----------------
Estimated Per-Host Mem: 8053063680
Estimated Per-Host VCores: 1
Request Pool: root.cloudera
Admission result: Admitted immediately
ExecSummary:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
----------------------------------------------------------------------------------------------------------------------
01:EXCHANGE 1 10.269us 10.269us 0 1.53K 0 -1.00 B UNPARTITIONED
00:SCAN HDFS 4 6m27s 6m59s 0 1.53K 25.51 GB 7.50 GB default.3_billion_row_table
Planner Timeline: 25.504ms
- Analysis finished: 5.369ms (5.369ms)
- Equivalence classes computed: 6.977ms (1.608ms)
- Single node plan created: 12.044ms (5.066ms)
- Runtime filters computed: 12.127ms (82.970us)
- Distributed plan created: 14.942ms (2.815ms)
- Lineage info computed: 19.013ms (4.070ms)
- Planning finished: 25.504ms (6.491ms)
Query Timeline: 7m4s
- Start execution: 48.954us (48.954us)
- Planning finished: 71.823ms (71.774ms)
- Submit for admission: 80.518ms (8.695ms)
- Completed admission: 80.594ms (75.671us)
- Ready to start 4 remote fragments: 89.034ms (8.439ms)
- All 4 remote fragments started: 130.870ms (41.835ms)
- Rows available: 7m4s (7m3s)
- First row fetched: 7m4s (748.943ms)
- Unregister query: 7m4s (2.014ms)
ImpalaServer:
- ClientFetchWaitTimer: 750.415ms
- RowMaterializationTimer: 0.000ns
Execution Profile f4157aa989f6fe7:6dfb77323ae25d99:(Total: 7m3s, non-child: 0.000ns, % non-child: 0.00%)
Number of filters: 0
Filter routing table:
ID Src. Node Tgt. Node Targets Type Partition filter
-----------------------------------------------------------

Fragment start latencies: Count: 4, 25th %-ile: 16ms, 50th %-ile: 20ms, 75th %-ile: 20ms, 90th %-ile: 21ms, 95th %-ile: 21ms, 99.9th %-ile: 21ms
Per Node Peak Memory Usage: hd02:22000(12.73 GB) hd04:22000(25.00 GB) hd05:22000(25.51 GB) hd03:22000(12.79 GB)
- FiltersReceived: 0 (0)
- FinalizationTimer: 0.000ns
Coordinator Fragment F01:(Total: 7m3s, non-child: 1.278ms, % non-child: 0.00%)
MemoryUsage(8s000ms): 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB
- AverageThreadTokens: 0.00
- BloomFilterBytes: 0
- PeakMemoryUsage: 8.00 KB (8192)
- PerHostPeakMemUsage: 0
- PrepareTime: 28.951us
- RowsProduced: 0 (0)
- TotalCpuTime: 797.037ms
- TotalNetworkReceiveTime: 7m3s
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 0.000ns
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 6871947776.00 GB (7378697739434983424)
- PeakMemoryUsage: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalIntegrityCheckTime: 0.000ns
- TotalReadBlockTime: 0.000ns
EXCHANGE_NODE (id=1):(Total: 7m3s, non-child: 10.269us, % non-child: 0.00%)
BytesReceived(8s000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
- BytesReceived: 0
- ConvertRowBatchTime: 1.941us
- DeserializeRowBatchTimer: 0.000ns
- FirstBatchArrivalWaitTime: 7m3s
- PeakMemoryUsage: 0
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- SendersBlockedTimer: 0.000ns
- SendersBlockedTotalTimer(*): 0.000ns
Averaged Fragment F00:(Total: 6m27s, non-child: 0.000ns, % non-child: 0.00%)
split sizes: min: 55.73 GB, max: 56.05 GB, avg: 55.95 GB, stddev: 133.54 MB
completion times: min:6m12s max:7m3s mean: 6m34s stddev:20s771ms
execution rates: min:135.26 MB/sec max:153.22 MB/sec mean:145.68 MB/sec stddev:7.38 MB/sec
num instances: 4
- AverageThreadTokens: 67.38
- BloomFilterBytes: 0
- PeakMemoryUsage: 19.00 GB (20406256910)
- PerHostPeakMemUsage: 19.00 GB (20406258958)
- PrepareTime: 195.838ms
- RowsProduced: 0 (0)
- TotalCpuTime: 7h9m
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 234.000ns
- TotalStorageWaitTime: 11m45s
CodeGen:(Total: 421.436ms, non-child: 421.436ms, % non-child: 100.00%)
- CodegenTime: 0.000ns
- CompileTime: 1.582ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.85 MB (1940220)
- OptimizationTime: 227.128ms
- PrepareTime: 191.302ms
DataStreamSender (dst_id=1):(Total: 16.999us, non-child: 16.999us, % non-child: 100.00%)
- BytesSent: 0
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 88.00 B (88)
- SerializeBatchTime: 0.000ns
- ThriftTransmitTime(*): 0.000ns
- UncompressedRowBatchSize: 0
HDFS_SCAN_NODE (id=0):(Total: 6m27s, non-child: 6m27s, % non-child: 100.00%)
- AverageHdfsReadThreadConcurrency: 0.67
- AverageScannerThreadConcurrency: 66.46
- BytesRead: 55.97 GB (60099569226)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 55.97 GB (60099569226)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 55.97 GB (60099569226)
- DecompressionTime: 12m35s
- MaxCompressedTextFileLength: 0
- NumColumns: 206 (206)
- NumDisksAccessed: 11 (11)
- NumRowGroups: 361 (361)
- NumScannerThreadsStarted: 71 (71)
- PeakMemoryUsage: 19.00 GB (20406216998)
- PerReadThreadRawHdfsThroughput: 221.09 MB/sec
- RemoteScanRanges: 0 (0)
- RowsRead: 708.79M (708792960)
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- ScanRangesComplete: 361 (361)
- ScannerThreadsInvoluntaryContextSwitches: 1.75M (1753580)
- ScannerThreadsTotalWallClockTime: 7h9m
- MaterializeTupleTime(*): 6h43m
- ScannerThreadsSysTime: 12s457ms
- ScannerThreadsUserTime: 3h16m
- ScannerThreadsVoluntaryContextSwitches: 173.90K (173895)
- TotalRawHdfsReadTime(*): 4m19s
- TotalReadThroughput: 145.81 MB/sec
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 1145324714.67 GB (1229783048198466218)
- PeakMemoryUsage: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalIntegrityCheckTime: 0.000ns
- TotalReadBlockTime: 0.000ns
Fragment F00:
Instance f4157aa989f6fe7:6dfb77323ae25d9b (host=hd03:22000):(Total: 6m59s, non-child: 0.000ns, % non-child: 0.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 7:33/4.93 GB 9:31/3.70 GB 6:27/3.70 GB 10:29/4.56 GB 0:44/6.72 GB 11:30/4.67 GB 8:33/5.61 GB 5:30/4.70 GB 3:34/5.49 GB 1:27/3.71 GB 4:33/5.10 GB 2:24/3.10 GB
MemoryUsage(8s000ms): 7.18 GB, 10.65 GB, 11.07 GB, 11.45 GB, 11.63 GB, 11.63 GB, 11.38 GB, 11.58 GB, 11.07 GB, 11.29 GB, 11.66 GB, 11.69 GB, 11.55 GB, 11.57 GB, 11.47 GB, 11.42 GB, 11.36 GB, 11.34 GB, 11.69 GB, 12.23 GB, 12.54 GB, 12.53 GB, 12.33 GB, 12.10 GB, 12.02 GB, 12.01 GB, 12.06 GB, 12.36 GB, 12.14 GB, 12.19 GB, 11.86 GB, 11.59 GB, 11.24 GB, 11.02 GB, 10.77 GB, 11.29 GB, 11.50 GB, 11.63 GB, 12.26 GB, 12.51 GB, 12.43 GB, 12.54 GB, 12.56 GB, 12.43 GB, 12.01 GB, 11.43 GB, 11.28 GB, 11.34 GB, 10.11 GB, 8.49 GB, 6.40 GB, 3.45 GB, 1.03 GB
ThreadUsage(8s000ms): 45, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 47, 42, 34, 26, 13, 5
- AverageThreadTokens: 45.72
- BloomFilterBytes: 0
- PeakMemoryUsage: 12.79 GB (13733103160)
- PerHostPeakMemUsage: 12.79 GB (13733111352)
- PrepareTime: 194.554ms
- RowsProduced: 0 (0)
- TotalCpuTime: 5h16m
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 236.000ns
- TotalStorageWaitTime: 7m8s
CodeGen:(Total: 422.045ms, non-child: 422.045ms, % non-child: 100.00%)
- CodegenTime: 0.000ns
- CompileTime: 1.593ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.85 MB (1940220)
- OptimizationTime: 229.287ms
- PrepareTime: 189.731ms
DataStreamSender (dst_id=1):(Total: 15.240us, non-child: 15.240us, % non-child: 100.00%)
- BytesSent: 0
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 88.00 B (88)
- SerializeBatchTime: 0.000ns
- ThriftTransmitTime(*): 0.000ns
- UncompressedRowBatchSize: 0
HDFS_SCAN_NODE (id=0):(Total: 6m59s, non-child: 6m59s, % non-child: 100.00%)
ExecOption: Expr Evaluation Codegen Enabled, Codegen enabled: 0 out of 375
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 7:33/4.93 GB 9:31/3.70 GB 6:27/3.70 GB 10:29/4.56 GB 0:44/6.72 GB 11:30/4.67 GB 8:33/5.61 GB 5:30/4.70 GB 3:34/5.49 GB 1:27/3.71 GB 4:33/5.10 GB 2:24/3.10 GB
Hdfs Read Thread Concurrency Bucket: 0:59.15% 1:29.28% 2:9.091% 3:1.181% 4:0.4723% 5:0.2361% 6:0.1181% 7:0.3542% 8:0.1181% 9:0% 10:0% 11:0% 12:0% 13:0% 14:0% 15:0%
File Formats: PARQUET/SNAPPY:77250
BytesRead(8s000ms): 5.02 GB, 7.56 GB, 8.08 GB, 8.64 GB, 9.45 GB, 10.23 GB, 11.06 GB, 12.66 GB, 13.88 GB, 15.79 GB, 17.06 GB, 17.81 GB, 18.40 GB, 19.21 GB, 20.22 GB, 21.35 GB, 22.75 GB, 24.13 GB, 25.84 GB, 27.02 GB, 28.18 GB, 28.95 GB, 29.51 GB, 30.18 GB, 31.28 GB, 32.52 GB, 33.89 GB, 35.13 GB, 36.29 GB, 37.55 GB, 38.13 GB, 38.67 GB, 39.23 GB, 40.22 GB, 41.26 GB, 43.25 GB, 44.60 GB, 45.95 GB, 47.55 GB, 48.29 GB, 48.82 GB, 49.50 GB, 50.65 GB, 51.38 GB, 51.90 GB, 52.78 GB, 54.37 GB, 55.55 GB, 56.02 GB, 56.02 GB, 56.02 GB, 56.02 GB, 56.02 GB
- AverageHdfsReadThreadConcurrency: 0.58
- AverageScannerThreadConcurrency: 44.77
- BytesRead: 56.02 GB (60146067269)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 56.02 GB (60146067269)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 56.02 GB (60146067269)
- DecompressionTime: 7m37s
- MaxCompressedTextFileLength: 0
- NumColumns: 206 (206)
- NumDisksAccessed: 12 (12)
- NumRowGroups: 375 (375)
- NumScannerThreadsStarted: 47 (47)
- PeakMemoryUsage: 12.79 GB (13733094880)
- PerReadThreadRawHdfsThroughput: 240.70 MB/sec
- RemoteScanRanges: 0 (0)
- RowsRead: 709.39M (709394344)
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- ScanRangesComplete: 375 (375)
- ScannerThreadsInvoluntaryContextSwitches: 1.61M (1607562)
- ScannerThreadsTotalWallClockTime: 5h16m
- MaterializeTupleTime(*): 5h
- ScannerThreadsSysTime: 12s272ms
- ScannerThreadsUserTime: 3h30m
- ScannerThreadsVoluntaryContextSwitches: 161.30K (161303)
- TotalRawHdfsReadTime(*): 3m58s
- TotalReadThroughput: 135.28 MB/sec
Instance f4157aa989f6fe7:6dfb77323ae25d9e (host=hd02:22000):(Total: 6m35s, non-child: 0.000ns, % non-child: 0.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 5:36/5.29 GB 4:27/4.66 GB 2:30/4.97 GB 11:28/4.61 GB 3:24/3.41 GB 8:35/5.59 GB 9:28/3.97 GB 1:38/5.96 GB 7:34/5.68 GB 6:27/3.92 GB 10:28/4.33 GB 0:23/3.64 GB
MemoryUsage(8s000ms): 6.80 GB, 11.23 GB, 11.61 GB, 12.13 GB, 12.26 GB, 12.33 GB, 12.14 GB, 12.12 GB, 11.58 GB, 11.65 GB, 11.71 GB, 11.42 GB, 11.11 GB, 11.06 GB, 11.00 GB, 11.36 GB, 11.77 GB, 11.77 GB, 11.69 GB, 12.10 GB, 12.06 GB, 11.71 GB, 11.74 GB, 12.10 GB, 12.10 GB, 11.50 GB, 11.49 GB, 11.76 GB, 11.59 GB, 11.79 GB, 11.89 GB, 11.96 GB, 12.34 GB, 12.22 GB, 12.23 GB, 12.18 GB, 12.40 GB, 12.25 GB, 12.32 GB, 12.62 GB, 12.33 GB, 12.25 GB, 12.22 GB, 12.14 GB, 11.63 GB, 11.31 GB, 10.47 GB, 7.54 GB, 3.85 GB, 417.47 MB
ThreadUsage(8s000ms): 45, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 47, 44, 31, 17, 2
- AverageThreadTokens: 45.63
- BloomFilterBytes: 0
- PeakMemoryUsage: 12.73 GB (13664226720)
- PerHostPeakMemUsage: 12.73 GB (13664226720)
- PrepareTime: 192.939ms
- RowsProduced: 0 (0)
- TotalCpuTime: 5h
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 285.000ns
- TotalStorageWaitTime: 7m51s
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 6871947776.00 GB (7378697739434983424)
- PeakMemoryUsage: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalIntegrityCheckTime: 0.000ns
- TotalReadBlockTime: 0.000ns
CodeGen:(Total: 417.663ms, non-child: 417.663ms, % non-child: 100.00%)
- CodegenTime: 0.000ns
- CompileTime: 1.751ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.85 MB (1940220)
- OptimizationTime: 226.082ms
- PrepareTime: 188.376ms
DataStreamSender (dst_id=1):(Total: 18.442us, non-child: 18.442us, % non-child: 100.00%)
- BytesSent: 0
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 88.00 B (88)
- SerializeBatchTime: 0.000ns
- ThriftTransmitTime(*): 0.000ns
- UncompressedRowBatchSize: 0
HDFS_SCAN_NODE (id=0):(Total: 6m35s, non-child: 6m35s, % non-child: 100.00%)
ExecOption: Expr Evaluation Codegen Enabled, Codegen enabled: 0 out of 358
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 5:36/5.29 GB 4:27/4.66 GB 2:30/4.97 GB 11:28/4.61 GB 3:24/3.41 GB 8:35/5.59 GB 9:28/3.97 GB 1:38/5.96 GB 7:34/5.68 GB 6:27/3.92 GB 10:28/4.33 GB 0:23/3.64 GB
Hdfs Read Thread Concurrency Bucket: 0:54.59% 1:33.13% 2:8.437% 3:2.481% 4:0.4963% 5:0% 6:0.1241% 7:0.1241% 8:0.2481% 9:0.2481% 10:0.1241% 11:0% 12:0% 13:0% 14:0% 15:0%
File Formats: PARQUET/SNAPPY:73748
BytesRead(8s000ms): 5.31 GB, 8.07 GB, 8.54 GB, 9.34 GB, 9.89 GB, 10.56 GB, 11.59 GB, 12.72 GB, 14.71 GB, 16.58 GB, 17.49 GB, 17.89 GB, 18.73 GB, 19.76 GB, 21.45 GB, 22.92 GB, 24.57 GB, 25.76 GB, 26.91 GB, 28.00 GB, 28.62 GB, 29.46 GB, 30.88 GB, 32.48 GB, 33.75 GB, 34.67 GB, 35.80 GB, 36.83 GB, 37.70 GB, 38.89 GB, 40.09 GB, 41.70 GB, 43.09 GB, 44.09 GB, 44.89 GB, 46.07 GB, 47.50 GB, 47.86 GB, 49.10 GB, 50.54 GB, 51.35 GB, 52.87 GB, 53.94 GB, 54.91 GB, 55.28 GB, 55.85 GB, 56.07 GB, 56.07 GB, 56.07 GB, 56.07 GB
- AverageHdfsReadThreadConcurrency: 0.67
- AverageScannerThreadConcurrency: 44.68
- BytesRead: 56.07 GB (60206246224)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 56.07 GB (60206246224)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 56.07 GB (60206246224)
- DecompressionTime: 8m37s
- MaxCompressedTextFileLength: 0
- NumColumns: 206 (206)
- NumDisksAccessed: 12 (12)
- NumRowGroups: 358 (358)
- NumScannerThreadsStarted: 47 (47)
- PeakMemoryUsage: 12.73 GB (13664149264)
- PerReadThreadRawHdfsThroughput: 212.80 MB/sec
- RemoteScanRanges: 0 (0)
- RowsRead: 710.42M (710422365)
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- ScanRangesComplete: 358 (358)
- ScannerThreadsInvoluntaryContextSwitches: 1.54M (1539856)
- ScannerThreadsTotalWallClockTime: 5h
- MaterializeTupleTime(*): 4h42m
- ScannerThreadsSysTime: 11s436ms
- ScannerThreadsUserTime: 3h18m
- ScannerThreadsVoluntaryContextSwitches: 163.86K (163861)
- TotalRawHdfsReadTime(*): 4m29s
- TotalReadThroughput: 142.47 MB/sec
Instance f4157aa989f6fe7:6dfb77323ae25d9c (host=hd05:22000):(Total: 6m9s, non-child: 0.000ns, % non-child: 0.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 4:22/3.36 GB 8:29/5.00 GB 11:26/4.44 GB 6:27/3.87 GB 9:26/4.30 GB 3:35/5.84 GB 2:26/4.44 GB 7:23/3.63 GB 1:31/4.93 GB 5:30/5.07 GB 10:33/5.03 GB 0:39/6.12 GB
MemoryUsage(8s000ms): 7.94 GB, 18.73 GB, 22.92 GB, 23.62 GB, 23.90 GB, 24.04 GB, 24.57 GB, 24.78 GB, 24.79 GB, 24.60 GB, 24.89 GB, 24.52 GB, 24.20 GB, 24.17 GB, 24.06 GB, 23.98 GB, 23.71 GB, 23.17 GB, 23.62 GB, 23.98 GB, 24.51 GB, 25.02 GB, 24.90 GB, 24.69 GB, 24.69 GB, 25.05 GB, 25.04 GB, 24.74 GB, 24.85 GB, 24.95 GB, 24.66 GB, 24.57 GB, 24.47 GB, 24.38 GB, 24.12 GB, 24.49 GB, 24.41 GB, 24.08 GB, 24.10 GB, 22.88 GB, 20.92 GB, 19.25 GB, 17.20 GB, 15.48 GB, 12.23 GB, 9.18 GB, 3.59 GB
ThreadUsage(8s000ms): 90, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 94, 88, 80, 74, 66, 59, 47, 36, 15
- AverageThreadTokens: 89.13
- BloomFilterBytes: 0
- PeakMemoryUsage: 25.51 GB (27387815048)
- PerHostPeakMemUsage: 25.51 GB (27387815048)
- PrepareTime: 198.061ms
- RowsProduced: 0 (0)
- TotalCpuTime: 9h14m
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 172.000ns
- TotalStorageWaitTime: 15m25s
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 6871947776.00 GB (7378697739434983424)
- PeakMemoryUsage: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalIntegrityCheckTime: 0.000ns
- TotalReadBlockTime: 0.000ns
CodeGen:(Total: 424.743ms, non-child: 424.743ms, % non-child: 100.00%)
- CodegenTime: 0.000ns
- CompileTime: 1.547ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.85 MB (1940220)
- OptimizationTime: 227.956ms
- PrepareTime: 193.826ms
DataStreamSender (dst_id=1):(Total: 17.690us, non-child: 17.690us, % non-child: 100.00%)
- BytesSent: 0
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 88.00 B (88)
- SerializeBatchTime: 0.000ns
- ThriftTransmitTime(*): 0.000ns
- UncompressedRowBatchSize: 0
HDFS_SCAN_NODE (id=0):(Total: 6m9s, non-child: 6m9s, % non-child: 100.00%)
ExecOption: Expr Evaluation Codegen Enabled, Codegen enabled: 0 out of 347
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 4:22/3.36 GB 8:29/5.00 GB 11:26/4.44 GB 6:27/3.87 GB 9:26/4.30 GB 3:35/5.84 GB 2:26/4.44 GB 7:23/3.63 GB 1:31/4.93 GB 5:30/5.07 GB 10:33/5.03 GB 0:39/6.12 GB
Hdfs Read Thread Concurrency Bucket: 0:61.01% 1:21.62% 2:10.34% 3:2.918% 4:1.326% 5:1.194% 6:0.3979% 7:0.6631% 8:0.1326% 9:0.1326% 10:0% 11:0.1326% 12:0.1326% 13:0% 14:0% 15:0%
File Formats: PARQUET/SNAPPY:71482
BytesRead(8s000ms): 6.18 GB, 13.58 GB, 16.40 GB, 17.04 GB, 17.36 GB, 17.70 GB, 18.54 GB, 19.05 GB, 19.50 GB, 20.33 GB, 21.85 GB, 22.45 GB, 23.66 GB, 25.01 GB, 25.81 GB, 27.49 GB, 28.94 GB, 31.95 GB, 34.06 GB, 35.09 GB, 36.21 GB, 37.39 GB, 37.71 GB, 38.40 GB, 39.41 GB, 40.82 GB, 41.73 GB, 42.82 GB, 44.23 GB, 45.19 GB, 45.98 GB, 46.99 GB, 48.22 GB, 49.54 GB, 51.46 GB, 53.18 GB, 53.96 GB, 54.78 GB, 55.94 GB, 56.05 GB, 56.05 GB, 56.05 GB, 56.05 GB, 56.05 GB, 56.05 GB, 56.05 GB, 56.05 GB
- AverageHdfsReadThreadConcurrency: 0.75
- AverageScannerThreadConcurrency: 88.25
- BytesRead: 56.05 GB (60186033047)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 56.05 GB (60186033047)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 56.05 GB (60186033047)
- DecompressionTime: 17m9s
- MaxCompressedTextFileLength: 0
- NumColumns: 206 (206)
- NumDisksAccessed: 12 (12)
- NumRowGroups: 347 (347)
- NumScannerThreadsStarted: 95 (95)
- PeakMemoryUsage: 25.51 GB (27387749416)
- PerReadThreadRawHdfsThroughput: 210.49 MB/sec
- RemoteScanRanges: 0 (0)
- RowsRead: 709.64M (709642407)
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- ScanRangesComplete: 347 (347)
- ScannerThreadsInvoluntaryContextSwitches: 1.92M (1921121)
- ScannerThreadsTotalWallClockTime: 9h14m
- MaterializeTupleTime(*): 8h40m
- ScannerThreadsSysTime: 12s460ms
- ScannerThreadsUserTime: 3h10m
- ScannerThreadsVoluntaryContextSwitches: 181.30K (181300)
- TotalRawHdfsReadTime(*): 4m32s
- TotalReadThroughput: 152.25 MB/sec
Instance f4157aa989f6fe7:6dfb77323ae25d9d (host=hd04:22000):(Total: 6m3s, non-child: 0.000ns, % non-child: 0.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 1:33/5.45 GB 5:32/5.33 GB 4:32/4.63 GB 3:31/5.40 GB 10:31/5.09 GB 11:36/5.28 GB 6:30/4.15 GB 0:32/4.37 GB 9:40/5.97 GB 8:31/4.73 GB 2:36/5.33 GB
MemoryUsage(8s000ms): 7.68 GB, 18.07 GB, 22.41 GB, 23.38 GB, 24.03 GB, 24.28 GB, 24.47 GB, 24.79 GB, 24.82 GB, 24.70 GB, 24.75 GB, 24.74 GB, 24.09 GB, 24.09 GB, 23.83 GB, 23.47 GB, 23.78 GB, 24.22 GB, 24.16 GB, 24.06 GB, 24.08 GB, 24.50 GB, 24.65 GB, 24.80 GB, 24.71 GB, 24.80 GB, 24.76 GB, 24.65 GB, 24.36 GB, 24.34 GB, 24.02 GB, 24.17 GB, 23.77 GB, 24.22 GB, 24.57 GB, 24.28 GB, 24.43 GB, 24.39 GB, 23.96 GB, 22.36 GB, 19.68 GB, 17.92 GB, 15.37 GB, 12.84 GB, 10.41 GB, 5.30 GB
ThreadUsage(8s000ms): 90, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 95, 87, 75, 69, 59, 49, 41, 21
- AverageThreadTokens: 89.03
- BloomFilterBytes: 0
- PeakMemoryUsage: 25.00 GB (26839882712)
- PerHostPeakMemUsage: 25.00 GB (26839882712)
- PrepareTime: 197.797ms
- RowsProduced: 0 (0)
- TotalCpuTime: 9h6m
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 244.000ns
- TotalStorageWaitTime: 16m37s
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 6871947776.00 GB (7378697739434983424)
- PeakMemoryUsage: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalIntegrityCheckTime: 0.000ns
- TotalReadBlockTime: 0.000ns
CodeGen:(Total: 421.294ms, non-child: 421.294ms, % non-child: 100.00%)
- CodegenTime: 0.000ns
- CompileTime: 1.435ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.85 MB (1940220)
- OptimizationTime: 225.189ms
- PrepareTime: 193.277ms
DataStreamSender (dst_id=1):(Total: 16.626us, non-child: 16.626us, % non-child: 100.00%)
- BytesSent: 0
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 88.00 B (88)
- SerializeBatchTime: 0.000ns
- ThriftTransmitTime(*): 0.000ns
- UncompressedRowBatchSize: 0
HDFS_SCAN_NODE (id=0):(Total: 6m3s, non-child: 6m3s, % non-child: 100.00%)
ExecOption: Expr Evaluation Codegen Enabled, Codegen enabled: 0 out of 364
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 1:33/5.45 GB 5:32/5.33 GB 4:32/4.63 GB 3:31/5.40 GB 10:31/5.09 GB 11:36/5.28 GB 6:30/4.15 GB 0:32/4.37 GB 9:40/5.97 GB 8:31/4.73 GB 2:36/5.33 GB
Hdfs Read Thread Concurrency Bucket: 0:61.16% 1:22.85% 2:9.812% 3:3.36% 4:0.8065% 5:0.5376% 6:0.4032% 7:0.5376% 8:0.2688% 9:0.2688% 10:0% 11:0% 12:0% 13:0% 14:0% 15:0%
File Formats: PARQUET/SNAPPY:74984
BytesRead(8s000ms): 5.38 GB, 12.80 GB, 15.92 GB, 16.87 GB, 17.50 GB, 17.90 GB, 18.19 GB, 19.07 GB, 19.85 GB, 20.73 GB, 21.36 GB, 22.27 GB, 23.00 GB, 24.30 GB, 25.28 GB, 26.95 GB, 30.28 GB, 32.90 GB, 34.25 GB, 34.39 GB, 35.41 GB, 36.99 GB, 37.85 GB, 39.04 GB, 39.69 GB, 40.81 GB, 41.74 GB, 42.40 GB, 43.74 GB, 45.00 GB, 45.83 GB, 47.11 GB, 47.89 GB, 50.54 GB, 52.39 GB, 53.31 GB, 54.31 GB, 55.06 GB, 55.39 GB, 55.75 GB, 55.75 GB, 55.75 GB, 55.75 GB, 55.75 GB, 55.75 GB, 55.75 GB
- AverageHdfsReadThreadConcurrency: 0.69
- AverageScannerThreadConcurrency: 88.15
- BytesRead: 55.75 GB (59859930366)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 55.75 GB (59859930366)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 55.75 GB (59859930366)
- DecompressionTime: 16m57s
- MaxCompressedTextFileLength: 0
- NumColumns: 206 (206)
- NumDisksAccessed: 11 (11)
- NumRowGroups: 364 (364)
- NumScannerThreadsStarted: 95 (95)
- PeakMemoryUsage: 25.00 GB (26839874432)
- PerReadThreadRawHdfsThroughput: 220.37 MB/sec
- RemoteScanRanges: 0 (0)
- RowsRead: 705.71M (705712727)
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- ScanRangesComplete: 364 (364)
- ScannerThreadsInvoluntaryContextSwitches: 1.95M (1945782)
- ScannerThreadsTotalWallClockTime: 9h6m
- MaterializeTupleTime(*): 8h31m
- ScannerThreadsSysTime: 13s660ms
- ScannerThreadsUserTime: 3h6m
- ScannerThreadsVoluntaryContextSwitches: 189.12K (189117)
- TotalRawHdfsReadTime(*): 4m19s
- TotalReadThroughput: 153.25 MB/sec

 

Could somebody help me to understand, why there is a such big difference in time response, while my both query retrieve no data ?

 

1 ACCEPTED SOLUTION

avatar

Impala's Parquet scaner materializes all columns required by the query in memory, and then evaluates predicates to discard non-matching rows. If you have many columns, then this materialization is expensive.

 

This issue is described in https://issues.cloudera.org/browse/IMPALA-2017 and is active area of work to improve performance.

 

Alex

View solution in original post

4 REPLIES 4

avatar

Hi!

 

the reason for the speed difference between your two queries is not how many rows are returned, but how many columns are accessed by the query. Since Parquet is a columnar format, the number of accessed columns makes a huge difference to performance. Your first query does "select *", but your second query only accesses "colA" and "colB".

 

I suspect that when you change your queries to access the same number of columns, then you won't see a big speed difference the two.

 

Alex

avatar
New Contributor

Hello,

 

I knew that a query is slower if there is lot of cols to retrieve. It's due to parquet storage.

 

What I don't understand, is why the "select *" query is slow while I have no results to retrieve (no row). What data Impala is parsing, because there no results ?

avatar

Impala's Parquet scaner materializes all columns required by the query in memory, and then evaluates predicates to discard non-matching rows. If you have many columns, then this materialization is expensive.

 

This issue is described in https://issues.cloudera.org/browse/IMPALA-2017 and is active area of work to improve performance.

 

Alex

avatar
New Contributor

Great thanks for your answer.