Member since
04-02-2017
10
Posts
0
Kudos Received
0
Solutions
04-12-2017
09:00 PM
Column a.t_date is a string field, not a timestamp field. The two tables are Parquet file format. By adding more nodes into the cluster, more Impala Daemon are running, can we aspect the performance for such query will be improve?
... View more
04-12-2017
01:34 AM
@EricL Ya, it make big difference. The results started to come out in just a few minutes.
... View more
04-11-2017
12:25 AM
@EricL Any idea based on the information i posted in the previous post? Appreciate if you could provide me little input. Thanks a lot.
... View more
04-04-2017
01:54 AM
Hi EricL, I had performed the query and grab some information from Impalad Web UI. Some how the query will stop at a point and it will never returned the results. Referring to the below picture from Impalad Web UI, the query will stop at the below count for long long time. But if I performed the same query using count instead of select statement as the following, the result will return within 2 minutes++. select count(*) from default.sales a inner join default.product b on a.product_id = b.product_id The picture below was taken from Impalad Web UI for the above query: Below is the explain statement that I tried to issue before running the select query: [lnx63.test.com:21000] > explain select a.t_date, b.p_date from default.sales a inner join default.product b on a.product_id = b.product_id order by a.t_date desc;
Query: explain select a.t_date, b.p_date from default.sales a inner join default.product b on a.product_id = b.product_id order by a.t_date desc
+-----------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=992.03MB VCores=2 |
| |
| PLAN-ROOT SINK |
| | |
| 05:MERGING-EXCHANGE [UNPARTITIONED] |
| | order by: a.t_date DESC |
| | |
| 03:SORT |
| | order by: a.t_date DESC |
| | |
| 02:HASH JOIN [INNER JOIN, BROADCAST] |
| | hash predicates: a.product_id = b.product_id |
| | runtime filters: RF000 <- b.product_id |
| | |
| |--04:EXCHANGE [BROADCAST] |
| | | |
| | 01:SCAN HDFS [default.product b] |
| | partitions=1/1 files=4629 size=23.33MB |
| | |
| 00:SCAN HDFS [default.sales a] |
| partitions=1/1 files=4 size=1.46GB |
| runtime filters: RF000 -> a.product_id |
+-----------------------------------------------------------+
Fetched 22 row(s) in 0.03s Below is the Text Plan that I grab from Impalad Web UI for the select statement: Estimated Per-Host Requirements: Memory=992.03MB VCores=2
PLAN-ROOT SINK
|
05:MERGING-EXCHANGE [UNPARTITIONED]
| order by: wnh_date DESC
| hosts=1 per-host-mem=unavailable
| tuple-ids=2 row-size=66B cardinality=635583007
|
03:SORT
| order by: wnh_date DESC
| hosts=1 per-host-mem=800.00MB
| tuple-ids=2 row-size=66B cardinality=635583007
|
02:HASH JOIN [INNER JOIN, BROADCAST]
| hash predicates: a.lot_number = b.wnh_lot_name
| runtime filters: RF000 <- b.wnh_lot_name
| hosts=1 per-host-mem=16.03MB
| tuple-ids=0,1 row-size=115B cardinality=635583007
|
|--04:EXCHANGE [BROADCAST]
| | hosts=1 per-host-mem=0B
| | tuple-ids=1 row-size=62B cardinality=247199
| |
| 01:SCAN HDFS [emos.w_nmf_headers b, RANDOM]
| partitions=1/1 files=4629 size=23.33MB
| table stats: 247199 rows total
| column stats: all
| hosts=1 per-host-mem=32.00MB
| tuple-ids=1 row-size=62B cardinality=247199
|
00:SCAN HDFS [mc300.test_measurement a, RANDOM]
partitions=1/1 files=4 size=1.46GB
runtime filters: RF000 -> a.lot_number
table stats: 635583007 rows total
column stats: all
hosts=1 per-host-mem=176.00MB
tuple-ids=0 row-size=53B cardinality=635583007 Below is the Profile that I grab from Impalad Web UI for the select statement: Query (id=6e4732e641a91047:abdeb0b200000000):
Summary:
Session ID: 9845648296bcdcad:d3654b3b7ad815a6
Session Type: BEESWAX
Start Time: 2017-04-04 15:46:26.954248000
End Time:
Query Type: QUERY
Query State: CREATED
Query Status: OK
Impala Version: impalad version 2.7.0-cdh5.10.0 RELEASE (build 785a073cd07e2540d521ecebb8b38161ccbd2aa2)
User: adm-bigdata
Connected User: adm-bigdata
Delegated User:
Network Address: 172.16.11.11:39644
Default Db: default
Sql Statement: select a.t_date, b.p_date from default.sales a inner join default.product b on a.product_id = b.product_id order by a.t_date desc
Coordinator: lnx63.test.com:22000
Query Options (non default): MEM_LIMIT=85899345920
Plan:
----------------
Estimated Per-Host Requirements: Memory=992.03MB VCores=2
PLAN-ROOT SINK
|
05:MERGING-EXCHANGE [UNPARTITIONED]
| order by: t_date DESC
| hosts=1 per-host-mem=unavailable
| tuple-ids=2 row-size=66B cardinality=635583007
|
03:SORT
| order by: t_date DESC
| hosts=1 per-host-mem=800.00MB
| tuple-ids=2 row-size=66B cardinality=635583007
|
02:HASH JOIN [INNER JOIN, BROADCAST]
| hash predicates: a.product_id = b.product_id
| runtime filters: RF000 <- b.product_id
| hosts=1 per-host-mem=16.03MB
| tuple-ids=0,1 row-size=115B cardinality=635583007
|
|--04:EXCHANGE [BROADCAST]
| | hosts=1 per-host-mem=0B
| | tuple-ids=1 row-size=62B cardinality=247199
| |
| 01:SCAN HDFS [default.product b, RANDOM]
| partitions=1/1 files=4629 size=23.33MB
| table stats: 247199 rows total
| column stats: all
| hosts=1 per-host-mem=32.00MB
| tuple-ids=1 row-size=62B cardinality=247199
|
00:SCAN HDFS [default.sales a, RANDOM]
partitions=1/1 files=4 size=1.46GB
runtime filters: RF000 -> a.product_id
table stats: 635583007 rows total
column stats: all
hosts=1 per-host-mem=176.00MB
tuple-ids=0 row-size=53B cardinality=635583007
----------------
Estimated Per-Host Mem: 1040223818
Estimated Per-Host VCores: 2
Request Pool: root.adm-bigdata
Admission result: Admitted immediately
Planner Timeline: 5.990ms
- Analysis finished: 1.702ms (1.702ms)
- Equivalence classes computed: 1.860ms (158.279us)
- Single node plan created: 3.637ms (1.776ms)
- Runtime filters computed: 3.791ms (153.342us)
- Distributed plan created: 3.941ms (150.032us)
- Lineage info computed: 4.094ms (153.627us)
- Planning finished: 5.990ms (1.896ms)
Query Timeline: 2m2s
- Query submitted: 58.209us (58.209us)
- Planning finished: 16.575ms (16.516ms)
- Submit for admission: 24.233ms (7.658ms)
- Completed admission: 24.326ms (92.557us)
- Ready to start 3 fragment instances: 25.347ms (1.021ms)
- All 3 fragment instances started: 34.168ms (8.821ms)
- ComputeScanRangeAssignmentTimer: 4.926ms
ImpalaServer:
- ClientFetchWaitTimer: 0.000ns
- RowMaterializationTimer: 0.000ns
Execution Profile 6e4732e641a91047:abdeb0b200000000:(Total: 21.253ms, non-child: 0.000ns, % non-child: 0.00%)
Number of filters: 1
Filter routing table:
ID Src. Node Tgt. Node(s) Targets Target type Partition filter Pending (Expected) First arrived Completed Enabled
----------------------------------------------------------------------------------------------------------------------------
0 2 0 1 LOCAL false 0 (1) N/A N/A true
Fragment instance start latencies: Count: 3, 25th %-ile: 0, 50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 7ms, 95th %-ile: 7ms, 99.9th %-ile: 7ms
- FiltersReceived: 0 (0)
- FinalizationTimer: 0.000ns
Coordinator Fragment F02:
Instance 6e4732e641a91047:abdeb0b200000000 (host=lnx63.test.com:22000):(Total: 20.346ms, non-child: 0.000ns, % non-child: 0.00%)
MemoryUsage(2s000ms): 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB, 8.26 KB
- AverageThreadTokens: 0.00
- BloomFilterBytes: 0
- PeakMemoryUsage: 280.50 KB (287232)
- PerHostPeakMemUsage: 48.02 GB (51555936528)
- RowsProduced: 0 (0)
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 0.000ns
- TotalThreadsInvoluntaryContextSwitches: 0 (0)
- TotalThreadsTotalWallClockTime: 0.000ns
- TotalThreadsSysTime: 0.000ns
- TotalThreadsUserTime: 0.000ns
- TotalThreadsVoluntaryContextSwitches: 0 (0)
Fragment Instance Lifecycle Timings:
- OpenTime: 0.000ns
- ExecTreeOpenTime: 0.000ns
- PrepareTime: 20.326ms
- ExecTreePrepareTime: 29.178us
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 6.21K (6207)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 48.00 GB (51539607552)
- PeakMemoryUsage: 47.99 GB (51531218944)
- ScratchFileUsedBytes: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalReadBlockTime: 0.000ns
PLAN_ROOT_SINK:
- PeakMemoryUsage: 0
CodeGen:(Total: 69.554ms, non-child: 69.554ms, % non-child: 100.00%)
- CodegenTime: 967.462us
- CompileTime: 6.179ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.91 MB (1997624)
- NumFunctions: 28 (28)
- NumInstructions: 545 (545)
- OptimizationTime: 42.490ms
- PeakMemoryUsage: 272.50 KB (279040)
- PrepareTime: 20.314ms
EXCHANGE_NODE (id=5):
ExecOption: Codegen Enabled
BytesReceived(2s000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
- BytesReceived: 0
- ConvertRowBatchTime: 0.000ns
- DeserializeRowBatchTimer: 0.000ns
- FirstBatchArrivalWaitTime: 0.000ns
- MergeGetNext: 0.000ns
- MergeGetNextBatch: 0.000ns
- PeakMemoryUsage: 0
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- SendersBlockedTimer: 0.000ns
- SendersBlockedTotalTimer(*): 0.000ns
Averaged Fragment F00:(Total: 26.292ms, non-child: 0.000ns, % non-child: 0.00%)
split sizes: min: 1.46 GB, max: 1.46 GB, avg: 1.46 GB, stddev: 0
- AverageThreadTokens: 10.72
- BloomFilterBytes: 1.00 MB (1048576)
- PeakMemoryUsage: 48.02 GB (51555928070)
- PerHostPeakMemUsage: 48.02 GB (51555936528)
- RowsProduced: 0 (0)
- TotalNetworkReceiveTime: 7s772ms
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 177.383ms
- TotalThreadsInvoluntaryContextSwitches: 80 (80)
- TotalThreadsTotalWallClockTime: 50s836ms
- TotalThreadsSysTime: 1s847ms
- TotalThreadsUserTime: 7s221ms
- TotalThreadsVoluntaryContextSwitches: 5.23K (5228)
Fragment Instance Lifecycle Timings:
- OpenTime: 0.000ns
- ExecTreeOpenTime: 0.000ns
- PrepareTime: 26.275ms
- ExecTreePrepareTime: 251.675us
DataStreamSender (dst_id=5):(Total: 9.562us, non-child: 9.562us, % non-child: 100.00%)
- BytesSent: 0
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 3.88 KB (3968)
- RowsReturned: 0 (0)
- SerializeBatchTime: 0.000ns
- TransmitDataRPCTime: 0.000ns
- UncompressedRowBatchSize: 0
CodeGen:(Total: 448.477ms, non-child: 448.477ms, % non-child: 100.00%)
- CodegenTime: 1.985ms
- CompileTime: 132.083ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.91 MB (1997624)
- NumFunctions: 105 (105)
- NumInstructions: 3.39K (3389)
- OptimizationTime: 293.861ms
- PeakMemoryUsage: 1.65 MB (1735168)
- PrepareTime: 22.051ms
SORT_NODE (id=3):(Total: 251.329us, non-child: 0.000ns, % non-child: 0.00%)
- InMemorySortTime: 0.000ns
- InitialRunsCreated: 1 (1)
- PeakMemoryUsage: 47.86 GB (51388662784)
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- SortDataSize: 0
- SpilledRuns: 0 (0)
- TotalMergesPerformed: 0 (0)
HASH_JOIN_NODE (id=2):(Total: 12s856ms, non-child: 4s859ms, % non-child: 37.80%)
- BuildRows: 247.20K (247199)
- BuildTime: 66.740ms
- PeakMemoryUsage: 162.11 MB (169987328)
- ProbeRows: 15.09M (15089094)
- ProbeRowsPartitioned: 0 (0)
- ProbeTime: 4s623ms
- RowsReturned: 771.02M (771017323)
- RowsReturnedRate: 59.97 M/sec
Hash Join Builder (join_node_id=2):
- BuildRowsPartitionTime: 38.467ms
- BuildRowsPartitioned: 247.20K (247199)
- GetNewBlockTime: 583.596us
- HashBuckets: 524.29K (524288)
- HashCollisions: 0 (0)
- HashTablesBuildTime: 27.465ms
- LargestPartitionPercent: 6 (6)
- MaxPartitionLevel: 0 (0)
- NumRepartitions: 0 (0)
- PartitionsCreated: 16 (16)
- PeakMemoryUsage: 154.03 MB (161510528)
- PinTime: 0.000ns
- RepartitionTime: 0.000ns
- SpilledPartitions: 0 (0)
- UnpinTime: 0.000ns
EXCHANGE_NODE (id=4):(Total: 7s786ms, non-child: 7s786ms, % non-child: 100.00%)
- BytesReceived: 7.33 MB (7682004)
- ConvertRowBatchTime: 8.378ms
- DeserializeRowBatchTimer: 44.472ms
- FirstBatchArrivalWaitTime: 0.000ns
- PeakMemoryUsage: 0
- RowsReturned: 247.20K (247199)
- RowsReturnedRate: 31.75 K/sec
- SendersBlockedTimer: 0.000ns
- SendersBlockedTotalTimer(*): 0.000ns
HDFS_SCAN_NODE (id=0):(Total: 209.997ms, non-child: 209.997ms, % non-child: 100.00%)
- AverageHdfsReadThreadConcurrency: 0.00
- AverageScannerThreadConcurrency: 9.69
- BytesRead: 2.40 MB (2520631)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 0
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 0
- DecompressionTime: 12.751ms
- MaxCompressedTextFileLength: 0
- NumColumns: 2 (2)
- NumDisksAccessed: 0 (0)
- NumRowGroups: 13 (13)
- NumScannerThreadsStarted: 13 (13)
- PeakMemoryUsage: 185.27 MB (194266667)
- PerReadThreadRawHdfsThroughput: 155.03 MB/sec
- RemoteScanRanges: 0 (0)
- RowBatchQueueGetWaitTime: 0.000ns
- RowBatchQueuePutWaitTime: 0.000ns
- RowsRead: 164.13M (164125046)
- RowsReturned: 15.09M (15089094)
- RowsReturnedRate: 71.85 M/sec
- ScanRangesComplete: 3 (3)
- ScannerThreadsInvoluntaryContextSwitches: 80 (80)
- ScannerThreadsTotalWallClockTime: 42s981ms
- MaterializeTupleTime(*): 8s877ms
- ScannerThreadsSysTime: 1s801ms
- ScannerThreadsUserTime: 7s131ms
- ScannerThreadsVoluntaryContextSwitches: 620 (620)
- TotalRawHdfsReadTime(*): 15.506ms
- TotalReadThroughput: 20.34 KB/sec
Filter 0 (1.00 MB):
- Rows processed: 145.72M (145719670)
- Rows rejected: 145.71M (145713902)
- Rows total: 145.72M (145719670)
Fragment F00:
Instance 6e4732e641a91047:2 (host=lnx63.test.com:22000):(Total: 26.292ms, non-child: 0.000ns, % non-child: 0.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:13/1.46 GB
Filter 0 arrival: 8s282ms
MemoryUsage(2s000ms): 32.18 MB, 34.45 MB, 108.41 MB, 162.46 MB, 625.97 MB, 1.67 GB, 2.75 GB, 3.95 GB, 5.26 GB, 6.50 GB, 7.60 GB, 8.75 GB, 9.86 GB, 11.07 GB, 12.33 GB, 13.54 GB, 14.78 GB, 16.04 GB, 17.26 GB, 18.48 GB, 19.73 GB, 20.96 GB, 22.27 GB, 23.67 GB, 25.07 GB, 26.44 GB, 27.83 GB, 29.09 GB, 30.32 GB, 31.53 GB, 32.74 GB, 33.95 GB, 35.16 GB, 36.37 GB, 37.61 GB, 38.97 GB, 40.30 GB, 41.56 GB, 42.87 GB, 44.11 GB, 45.31 GB, 46.55 GB, 47.70 GB, 48.02 GB, 48.02 GB, 48.02 GB, 48.02 GB, 48.02 GB, 48.02 GB, 48.02 GB, 48.02 GB, 48.02 GB, 48.02 GB, 48.02 GB, 48.02 GB, 48.02 GB, 48.02 GB, 48.02 GB, 48.02 GB, 48.02 GB
ThreadUsage(2s000ms): 1, 2, 2, 2, 11, 14, 14, 14, 14, 14, 13, 13, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11
- AverageThreadTokens: 10.72
- BloomFilterBytes: 1.00 MB (1048576)
- PeakMemoryUsage: 48.02 GB (51555928070)
- PerHostPeakMemUsage: 48.02 GB (51555936528)
- RowsProduced: 0 (0)
- TotalNetworkReceiveTime: 7s772ms
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 177.383ms
- TotalThreadsInvoluntaryContextSwitches: 80 (80)
- TotalThreadsTotalWallClockTime: 50s836ms
- TotalThreadsSysTime: 1s847ms
- TotalThreadsUserTime: 7s221ms
- TotalThreadsVoluntaryContextSwitches: 5.23K (5228)
Fragment Instance Lifecycle Timings:
- OpenTime: 0.000ns
- ExecTreeOpenTime: 0.000ns
- PrepareTime: 26.275ms
- ExecTreePrepareTime: 251.675us
DataStreamSender (dst_id=5):(Total: 9.562us, non-child: 9.562us, % non-child: 100.00%)
- BytesSent: 0
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 3.88 KB (3968)
- RowsReturned: 0 (0)
- SerializeBatchTime: 0.000ns
- TransmitDataRPCTime: 0.000ns
- UncompressedRowBatchSize: 0
CodeGen:(Total: 448.477ms, non-child: 448.477ms, % non-child: 100.00%)
- CodegenTime: 1.985ms
- CompileTime: 132.083ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.91 MB (1997624)
- NumFunctions: 105 (105)
- NumInstructions: 3.39K (3389)
- OptimizationTime: 293.861ms
- PeakMemoryUsage: 1.65 MB (1735168)
- PrepareTime: 22.051ms
SORT_NODE (id=3):(Total: 251.329us, non-child: 0.000ns, % non-child: 0.00%)
ExecOption: Codegen Enabled
- InMemorySortTime: 0.000ns
- InitialRunsCreated: 1 (1)
- PeakMemoryUsage: 47.86 GB (51388662784)
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- SortDataSize: 0
- SpilledRuns: 0 (0)
- TotalMergesPerformed: 0 (0)
HASH_JOIN_NODE (id=2):(Total: 12s856ms, non-child: 4s859ms, % non-child: 37.80%)
ExecOption: Probe Side Codegen Enabled, Join Build-Side Prepared Asynchronously
- BuildRows: 247.20K (247199)
- BuildTime: 66.740ms
- PeakMemoryUsage: 162.11 MB (169987328)
- ProbeRows: 15.09M (15089094)
- ProbeRowsPartitioned: 0 (0)
- ProbeTime: 4s623ms
- RowsReturned: 771.02M (771017323)
- RowsReturnedRate: 59.97 M/sec
Hash Join Builder (join_node_id=2):
ExecOption: Build Side Codegen Enabled, Hash Table Construction Codegen Enabled
Runtime filters: 1 of 1 Runtime Filter Published
- BuildRowsPartitionTime: 38.467ms
- BuildRowsPartitioned: 247.20K (247199)
- GetNewBlockTime: 583.596us
- HashBuckets: 524.29K (524288)
- HashCollisions: 0 (0)
- HashTablesBuildTime: 27.465ms
- LargestPartitionPercent: 6 (6)
- MaxPartitionLevel: 0 (0)
- NumRepartitions: 0 (0)
- PartitionsCreated: 16 (16)
- PeakMemoryUsage: 154.03 MB (161510528)
- PinTime: 0.000ns
- RepartitionTime: 0.000ns
- SpilledPartitions: 0 (0)
- UnpinTime: 0.000ns
EXCHANGE_NODE (id=4):(Total: 7s786ms, non-child: 38.988ms, % non-child: 0.50%)
BytesReceived(2s000ms): 869.59 KB, 3.18 MB, 4.67 MB, 6.03 MB, 7.32 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB, 7.33 MB
- BytesReceived: 7.33 MB (7682004)
- ConvertRowBatchTime: 8.378ms
- DeserializeRowBatchTimer: 44.472ms
- FirstBatchArrivalWaitTime: 0.000ns
- PeakMemoryUsage: 0
- RowsReturned: 247.20K (247199)
- RowsReturnedRate: 31.75 K/sec
- SendersBlockedTimer: 0.000ns
- SendersBlockedTotalTimer(*): 0.000ns
HDFS_SCAN_NODE (id=0):(Total: 209.997ms, non-child: 209.997ms, % non-child: 100.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:13/1.46 GB
ExecOption: PARQUET Codegen Enabled
Runtime filters: All filters arrived. Waited 0
BytesRead(2s000ms): 0, 0, 0, 0, 1.80 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB, 2.40 MB
- FooterProcessingTime: (Avg: 8.834ms ; Min: 2.568ms ; Max: 13.906ms ; Number of samples: 13)
- AverageHdfsReadThreadConcurrency: 0.00
- AverageScannerThreadConcurrency: 9.69
- BytesRead: 2.40 MB (2520631)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 0
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 0
- DecompressionTime: 12.751ms
- MaxCompressedTextFileLength: 0
- NumColumns: 2 (2)
- NumDisksAccessed: 0 (0)
- NumRowGroups: 13 (13)
- NumScannerThreadsStarted: 13 (13)
- PeakMemoryUsage: 185.27 MB (194266667)
- PerReadThreadRawHdfsThroughput: 155.03 MB/sec
- RemoteScanRanges: 0 (0)
- RowBatchQueueGetWaitTime: 0.000ns
- RowBatchQueuePutWaitTime: 0.000ns
- RowsRead: 164.13M (164125046)
- RowsReturned: 15.09M (15089094)
- RowsReturnedRate: 71.85 M/sec
- ScanRangesComplete: 3 (3)
- ScannerThreadsInvoluntaryContextSwitches: 80 (80)
- ScannerThreadsTotalWallClockTime: 42s981ms
- MaterializeTupleTime(*): 8s877ms
- ScannerThreadsSysTime: 1s801ms
- ScannerThreadsUserTime: 7s131ms
- ScannerThreadsVoluntaryContextSwitches: 620 (620)
- TotalRawHdfsReadTime(*): 15.506ms
- TotalReadThroughput: 20.34 KB/sec
Filter 0 (1.00 MB):
- Rows processed: 145.72M (145719670)
- Rows rejected: 145.71M (145713902)
- Rows total: 145.72M (145719670)
Averaged Fragment F01:(Total: 8s274ms, non-child: 0.000ns, % non-child: 0.00%)
split sizes: min: 23.33 MB, max: 23.33 MB, avg: 23.33 MB, stddev: 0
- AverageThreadTokens: 16.00
- BloomFilterBytes: 0
- PeakMemoryUsage: 3.83 MB (4019676)
- PerHostPeakMemUsage: 164.82 MB (172828949)
- RowsProduced: 247.20K (247199)
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 31.027ms
- TotalStorageWaitTime: 2m2s
- TotalThreadsInvoluntaryContextSwitches: 3 (3)
- TotalThreadsTotalWallClockTime: 2m11s
- TotalThreadsSysTime: 165.338ms
- TotalThreadsUserTime: 1s105ms
- TotalThreadsVoluntaryContextSwitches: 30.04K (30039)
Fragment Instance Lifecycle Timings:
- ExecTime: 8s226ms
- ExecTreeExecTime: 8s068ms
- OpenTime: 14.126ms
- ExecTreeOpenTime: 414.542us
- PrepareTime: 34.061ms
- ExecTreePrepareTime: 13.813ms
DataStreamSender (dst_id=4):(Total: 194.833ms, non-child: 194.833ms, % non-child: 100.00%)
- BytesSent: 7.33 MB (7682004)
- NetworkThroughput(*): 62.93 MB/sec
- OverallThroughput: 37.60 MB/sec
- PeakMemoryUsage: 3.88 KB (3968)
- RowsReturned: 247.20K (247199)
- SerializeBatchTime: 77.531ms
- TransmitDataRPCTime: 116.409ms
- UncompressedRowBatchSize: 15.77 MB (16540517)
CodeGen:(Total: 32.856ms, non-child: 32.856ms, % non-child: 100.00%)
- CodegenTime: 525.351us
- CompileTime: 5.299ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.91 MB (1997624)
- NumFunctions: 5 (5)
- NumInstructions: 98 (98)
- OptimizationTime: 7.499ms
- PeakMemoryUsage: 49.00 KB (50176)
- PrepareTime: 19.489ms
HDFS_SCAN_NODE (id=1):(Total: 8s089ms, non-child: 8s089ms, % non-child: 100.00%)
- AverageHdfsReadThreadConcurrency: 0.00
- AverageScannerThreadConcurrency: 15.00
- BytesRead: 21.85 MB (22909765)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 21.85 MB (22909765)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 21.85 MB (22909765)
- DecompressionTime: 24.636ms
- MaxCompressedTextFileLength: 0
- NumColumns: 2 (2)
- NumDisksAccessed: 1 (1)
- NumRowGroups: 4.63K (4629)
- NumScannerThreadsStarted: 15 (15)
- PeakMemoryUsage: 3.79 MB (3973474)
- PerReadThreadRawHdfsThroughput: 302.95 MB/sec
- RemoteScanRanges: 0 (0)
- RowBatchQueueGetWaitTime: 8s014ms
- RowBatchQueuePutWaitTime: 0.000ns
- RowsRead: 247.20K (247199)
- RowsReturned: 247.20K (247199)
- RowsReturnedRate: 30.56 K/sec
- ScanRangesComplete: 4.63K (4629)
- ScannerThreadsInvoluntaryContextSwitches: 2 (2)
- ScannerThreadsTotalWallClockTime: 2m3s
- MaterializeTupleTime(*): 91.327ms
- ScannerThreadsSysTime: 122.410ms
- ScannerThreadsUserTime: 921.159ms
- ScannerThreadsVoluntaryContextSwitches: 25.21K (25206)
- TotalRawHdfsReadTime(*): 72.119ms
- TotalReadThroughput: 2.53 MB/sec
Fragment F01:
Instance 6e4732e641a91047:1 (host=lnx63.test.com:22000):(Total: 8s274ms, non-child: 0.000ns, % non-child: 0.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:4629/23.33 MB
MemoryUsage(500.000ms): 515.12 KB, 427.12 KB, 2.17 MB, 461.12 KB, 556.12 KB, 584.12 KB, 625.12 KB, 420.12 KB, 353.12 KB, 515.12 KB, 423.12 KB, 468.12 KB, 543.12 KB, 341.12 KB, 584.12 KB, 433.12 KB, 379.12 KB
ThreadUsage(500.000ms): 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
- AverageThreadTokens: 16.00
- BloomFilterBytes: 0
- PeakMemoryUsage: 3.83 MB (4019676)
- PerHostPeakMemUsage: 164.82 MB (172828949)
- RowsProduced: 247.20K (247199)
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 31.027ms
- TotalStorageWaitTime: 2m2s
- TotalThreadsInvoluntaryContextSwitches: 3 (3)
- TotalThreadsTotalWallClockTime: 2m11s
- TotalThreadsSysTime: 165.338ms
- TotalThreadsUserTime: 1s105ms
- TotalThreadsVoluntaryContextSwitches: 30.04K (30039)
Fragment Instance Lifecycle Timings:
- ExecTime: 8s226ms
- ExecTreeExecTime: 8s068ms
- OpenTime: 14.126ms
- ExecTreeOpenTime: 414.542us
- PrepareTime: 34.061ms
- ExecTreePrepareTime: 13.813ms
DataStreamSender (dst_id=4):(Total: 194.833ms, non-child: 194.833ms, % non-child: 100.00%)
- BytesSent: 7.33 MB (7682004)
- NetworkThroughput(*): 62.93 MB/sec
- OverallThroughput: 37.60 MB/sec
- PeakMemoryUsage: 3.88 KB (3968)
- RowsReturned: 247.20K (247199)
- SerializeBatchTime: 77.531ms
- TransmitDataRPCTime: 116.409ms
- UncompressedRowBatchSize: 15.77 MB (16540517)
CodeGen:(Total: 32.856ms, non-child: 32.856ms, % non-child: 100.00%)
- CodegenTime: 525.351us
- CompileTime: 5.299ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.91 MB (1997624)
- NumFunctions: 5 (5)
- NumInstructions: 98 (98)
- OptimizationTime: 7.499ms
- PeakMemoryUsage: 49.00 KB (50176)
- PrepareTime: 19.489ms
HDFS_SCAN_NODE (id=1):(Total: 8s089ms, non-child: 8s089ms, % non-child: 100.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:4629/23.33 MB
ExecOption: PARQUET Codegen Enabled, Codegen enabled: 4629 out of 4629
Hdfs Read Thread Concurrency Bucket: 0:100% 1:0% 2:0% 3:0% 4:0% 5:0% 6:0%
File Formats: PARQUET/SNAPPY:9258
BytesRead(500.000ms): 120.38 KB, 1.66 MB, 3.36 MB, 4.57 MB, 6.11 MB, 7.28 MB, 8.45 MB, 9.63 MB, 10.80 MB, 12.44 MB, 13.98 MB, 15.19 MB, 16.32 MB, 17.51 MB, 18.59 MB, 20.43 MB, 21.54 MB
- FooterProcessingTime: (Avg: 13.074ms ; Min: 2.570ms ; Max: 26.071ms ; Number of samples: 4629)
- AverageHdfsReadThreadConcurrency: 0.00
- AverageScannerThreadConcurrency: 15.00
- BytesRead: 21.85 MB (22909765)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 21.85 MB (22909765)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 21.85 MB (22909765)
- DecompressionTime: 24.636ms
- MaxCompressedTextFileLength: 0
- NumColumns: 2 (2)
- NumDisksAccessed: 1 (1)
- NumRowGroups: 4.63K (4629)
- NumScannerThreadsStarted: 15 (15)
- PeakMemoryUsage: 3.79 MB (3973474)
- PerReadThreadRawHdfsThroughput: 302.95 MB/sec
- RemoteScanRanges: 0 (0)
- RowBatchQueueGetWaitTime: 8s014ms
- RowBatchQueuePutWaitTime: 0.000ns
- RowsRead: 247.20K (247199)
- RowsReturned: 247.20K (247199)
- RowsReturnedRate: 30.56 K/sec
- ScanRangesComplete: 4.63K (4629)
- ScannerThreadsInvoluntaryContextSwitches: 2 (2)
- ScannerThreadsTotalWallClockTime: 2m3s
- MaterializeTupleTime(*): 91.327ms
- ScannerThreadsSysTime: 122.410ms
- ScannerThreadsUserTime: 921.159ms
- ScannerThreadsVoluntaryContextSwitches: 25.21K (25206)
- TotalRawHdfsReadTime(*): 72.119ms
- TotalReadThroughput: 2.53 MB/sec Below is the Memory information that I grab from Impalad Web UI for the select statement: Query(6e4732e641a91047:abdeb0b200000000): Limit=80.00 GB Total=48.02 GB Peak=48.02 GB
Fragment 6e4732e641a91047:abdeb0b200000000: Total=8.26 KB Peak=280.50 KB
EXCHANGE_NODE (id=5): Total=0 Peak=0
DataStreamRecvr: Total=0 Peak=0
PLAN_ROOT_SINK: Total=0 Peak=0
CodeGen: Total=266.00 B Peak=272.50 KB
Block Manager: Limit=48.00 GB Total=47.99 GB Peak=47.99 GB
Fragment 6e4732e641a91047:2: Total=48.02 GB Peak=48.02 GB
Runtime Filter Bank: Total=1.00 MB Peak=1.00 MB
SORT_NODE (id=3): Total=47.86 GB Peak=47.86 GB
HASH_JOIN_NODE (id=2): Total=154.09 MB Peak=162.11 MB
Hash Join Builder (join_node_id=2): Total=154.02 MB Peak=154.03 MB
HDFS_SCAN_NODE (id=0): Total=4.41 MB Peak=185.27 MB
EXCHANGE_NODE (id=4): Total=0 Peak=0
DataStreamRecvr: Total=0 Peak=2.25 MB
DataStreamSender (dst_id=5): Total=3.88 KB Peak=3.88 KB
CodeGen: Total=9.88 KB Peak=1.65 MB Below is the Summary information that I grab from Impalad Web UI for the select statement: Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
----------------------------------------------------------------------------------------------------------------------------
05:MERGING-EXCHANGE 1 0.000ns 0.000ns 0 635.58M 0 -1.00 B UNPARTITIONED
03:SORT 1 0.000ns 0.000ns 0 635.58M 47.86 GB 800.00 MB
02:HASH JOIN 1 4s859ms 4s859ms 771.02M 635.58M 162.11 MB 16.03 MB INNER JOIN, BROADCAST
|--04:EXCHANGE 1 38.988ms 38.988ms 247.20K 247.20K 0 0 BROADCAST
| 01:SCAN HDFS 1 8s089ms 8s089ms 247.20K 247.20K 3.79 MB 32.00 MB product b
00:SCAN HDFS 1 209.997ms 209.997ms 15.09M 635.58M 185.27 MB 176.00 MB sales a Could you advice me what is happening that caused the slow join query? Anyway to tune it beside adding more data nodes with Impala daemon install? Thanks a lot.
... View more
04-02-2017
09:54 PM
Hi All, I am running on a POC environment where there are only one name node and one data node running. Impala daemon is running on data node. Both of the nodes have 128GB memory each. I had set the mem_limit to 60GB. I had two big tables in Impala. First table (sales) has around 635 million records while second table (product) is around 250000 records. I inner join this 2 tables using a common parameter. The example SQL statement is as the following: select a.t_date, b.p_date from sales a inner join product b on a.product_id=b.product_id order by a.t_date desc The above query ran for a long time and after few hours, it still didn't return any result. When i use EXPLAIN, it showed Estimated Per-Host Requirements: Memory=992.03MB VCores=2. I am wondering why it took so long. Is this related to mem_limit settings? How can i tune such query?
... View more
Labels:
- Labels:
-
Apache Impala