Member since
01-25-2017
11
Posts
0
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
3883 | 02-13-2017 07:27 AM |
05-26-2017
10:48 AM
Thank you sir for helping me walk through the profile. This is very informative.
... View more
05-26-2017
07:54 AM
I have this query: SELECT *
FROM ft_test, ft_wafer
WHERE ft_test_parquet.id = ft_wafer_parquet.id
and month = 1
and day = 8
and param = 2913; It took about 3.5s. I want this to be faster. I have ready used techniques like Parquet, partition, compute stats Here is the profile. Could someone look through this and find the bottleneck? Thanks. Query Runtime Profile:
Query (id=dc4c084615fbf9bb:4261466f00000000):
Summary:
Session ID: 5d4edbf63653cdf6:1a59ff5354c9d4bd
Session Type: BEESWAX
Start Time: 2017-05-25 16:31:25.121391000
End Time: 2017-05-25 16:31:28.584404000
Query Type: QUERY
Query State: FINISHED
Query Status: OK
Impala Version: impalad version 2.7.0 RELEASE (build a535b583202c4a81080098a10f952d377af1949d)
User: root
Connected User: root
Delegated User:
Network Address: ::ffff:127.0.0.1:58546
Default Db: inspex
Sql Statement: select *
FROM ft_test_partition, ft_wafer_parquet
WHERE ft_test_partition.file = ft_wafer_parquet.file
and month = 1
and day = 8
and param = 2913 limit 100
Coordinator: mapr1:22000
Query Options (non default):
Plan:
----------------
Estimated Per-Host Requirements: Memory=704.67MB VCores=2
04:EXCHANGE [UNPARTITIONED]
| limit: 100
| hosts=1 per-host-mem=unavailable
| tuple-ids=1,0 row-size=800B cardinality=1
|
02:HASH JOIN [INNER JOIN, BROADCAST]
| hash predicates: ft_wafer_parquet.file = ft_test_partition.file
| runtime filters: RF000 <- ft_test_partition.file
| limit: 100
| hosts=1 per-host-mem=690.00KB
| tuple-ids=1,0 row-size=800B cardinality=1
|
|--03:EXCHANGE [BROADCAST]
| | hosts=1 per-host-mem=0B
| | tuple-ids=0 row-size=78B cardinality=8235
| |
| 00:SCAN HDFS [inspex.ft_test_partition, RANDOM]
| partitions=1/29 files=1 size=171.69MB
| predicates: param = 2913
| table stats: 813365826 rows total
| column stats: all
| hosts=1 per-host-mem=704.00MB
| tuple-ids=0 row-size=78B cardinality=8235
|
01:SCAN HDFS [inspex.ft_wafer_parquet, RANDOM]
partitions=1/1 files=1 size=66.83KB
runtime filters: RF000 -> ft_wafer_parquet.file
table stats: 1500 rows total
column stats: all
hosts=1 per-host-mem=192.00MB
tuple-ids=1 row-size=722B cardinality=1500
----------------
Estimated Per-Host Mem: 738904067
Estimated Per-Host VCores: 2
Request Pool: default-pool
Admission result: Admitted immediately
ExecSummary:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
----------------------------------------------------------------------------------------------------------------------
04:EXCHANGE 1 0.000ns 0.000ns 100 1 0 -1.00 B UNPARTITIONED
02:HASH JOIN 1 42.999ms 42.999ms 0 1 3.29 MB 690.00 KB INNER JOIN, BROADCAST
|--03:EXCHANGE 1 999.990us 999.990us 9.15K 8.23K 0 0 BROADCAST
| 00:SCAN HDFS 1 2s973ms 2s973ms 9.15K 8.23K 106.05 MB 704.00 MB inspex.ft_test_partition
01:SCAN HDFS 1 16.999ms 16.999ms 1.02K 1.50K 1.78 MB 192.00 MB inspex.ft_wafer_parquet
Planner Timeline: 15.315ms
- Analysis finished: 5.081ms (5.081ms)
- Equivalence classes computed: 5.601ms (519.374us)
- Single node plan created: 9.054ms (3.453ms)
- Runtime filters computed: 9.409ms (354.377us)
- Distributed plan created: 11.507ms (2.098ms)
- Planning finished: 15.315ms (3.808ms)
Query Timeline: 3s463ms
- Start execution: 0.000ns (0.000ns)
- Planning finished: 17.999ms (17.999ms)
- Submit for admission: 17.999ms (0.000ns)
- Completed admission: 17.999ms (0.000ns)
- Ready to start 2 remote fragments: 18.999ms (999.990us)
- All 2 remote fragments started: 19.999ms (999.990us)
- Rows available: 3s246ms (3s226ms)
- First row fetched: 3s346ms (99.999ms)
- Unregister query: 3s462ms (115.998ms)
- ComputeScanRangeAssignmentTimer: 0.000ns
ImpalaServer:
- ClientFetchWaitTimer: 214.997ms
- RowMaterializationTimer: 999.990us
Execution Profile dc4c084615fbf9bb:4261466f00000000:(Total: 3s228ms, non-child: 0.000ns, % non-child: 0.00%)
Number of filters: 1
Filter routing table:
ID Src. Node Tgt. Node(s) Targets Target type Partition filter Pending (Expected) First arrived Completed Enabled
----------------------------------------------------------------------------------------------------------------------------
0 2 1 1 LOCAL false 0 (1) N/A N/A true
Fragment start latencies: Count: 2, 25th %-ile: 0, 50th %-ile: 0, 75th %-ile: 1ms, 90th %-ile: 1ms, 95th %-ile: 1ms, 99.9th %-ile: 1ms
Final filter table:
ID Src. Node Tgt. Node(s) Targets Target type Partition filter Pending (Expected) First arrived Completed Enabled
----------------------------------------------------------------------------------------------------------------------------
0 2 1 1 LOCAL false 0 (1) N/A N/A true
Per Node Peak Memory Usage: mapr1:22000(108.65 MB)
- FiltersReceived: 0 (0)
- FinalizationTimer: 0.000ns
Coordinator Fragment F02:(Total: 3s226ms, non-child: 0.000ns, % non-child: 0.00%)
MemoryUsage(500.000ms): 16.00 KB, 16.00 KB, 16.00 KB, 16.00 KB, 16.00 KB, 16.00 KB, 16.00 KB
- AverageThreadTokens: 0.00
- BloomFilterBytes: 0
- PeakMemoryUsage: 209.83 KB (214864)
- PerHostPeakMemUsage: 0
- PrepareTime: 0.000ns
- RowsProduced: 0 (0)
- TotalCpuTime: 101.999ms
- TotalNetworkReceiveTime: 3s226ms
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 0.000ns
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 48 (48)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 12.21 GB (13111148544)
- PeakMemoryUsage: 256.00 KB (262144)
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalIntegrityCheckTime: 0.000ns
- TotalReadBlockTime: 0.000ns
EXCHANGE_NODE (id=4):(Total: 3s226ms, non-child: 0.000ns, % non-child: 0.00%)
BytesReceived(500.000ms): 0, 0, 0, 0, 0, 0, 0
- BytesReceived: 61.05 KB (62513)
- ConvertRowBatchTime: 0.000ns
- DeserializeRowBatchTimer: 0.000ns
- FirstBatchArrivalWaitTime: 3s226ms
- PeakMemoryUsage: 0
- RowsReturned: 100 (100)
- RowsReturnedRate: 30.00 /sec
- SendersBlockedTimer: 0.000ns
- SendersBlockedTotalTimer(*): 0.000ns
Averaged Fragment F00:(Total: 3s001ms, non-child: 0.000ns, % non-child: 0.00%)
split sizes: min: 66.83 KB, max: 66.83 KB, avg: 66.83 KB, stddev: 0
completion times: min:3s227ms max:3s227ms mean: 3s227ms stddev:0.000ns
execution rates: min:20.70 KB/sec max:20.70 KB/sec mean:20.70 KB/sec stddev:0.00 /sec
num instances: 1
- AverageThreadTokens: 1.86
- BloomFilterBytes: 1.00 MB (1048576)
- PeakMemoryUsage: 5.07 MB (5320864)
- PerHostPeakMemUsage: 108.65 MB (113924736)
- PrepareTime: 38.999ms
- RowsProduced: 1.02K (1024)
- TotalCpuTime: 3s232ms
- TotalNetworkReceiveTime: 2s940ms
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 13.999ms
CodeGen:(Total: 262.997ms, non-child: 262.997ms, % non-child: 100.00%)
- CodegenTime: 999.990us
- CompileTime: 73.999ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.86 MB (1953028)
- NumFunctions: 85 (85)
- NumInstructions: 2.86K (2857)
- OptimizationTime: 151.998ms
- PrepareTime: 36.999ms
DataStreamSender (dst_id=4):(Total: 999.990us, non-child: 999.990us, % non-child: 100.00%)
- BytesSent: 61.05 KB (62513)
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 59.62 MB/sec
- RowsReturned: 1.02K (1024)
- SerializeBatchTime: 999.990us
- TransmitDataRPCTime: 0.000ns
- UncompressedRowBatchSize: 185.83 KB (190290)
HASH_JOIN_NODE (id=2):(Total: 3s001ms, non-child: 42.999ms, % non-child: 1.43%)
- BuildPartitionTime: 1.999ms
- BuildRows: 9.15K (9153)
- BuildRowsPartitioned: 9.15K (9153)
- BuildTime: 0.000ns
- GetNewBlockTime: 0.000ns
- HashBuckets: 16.38K (16384)
- HashCollisions: 0 (0)
- LargestPartitionPercent: 6 (6)
- MaxPartitionLevel: 0 (0)
- NumRepartitions: 0 (0)
- PartitionsCreated: 16 (16)
- PeakMemoryUsage: 3.29 MB (3445888)
- PinTime: 0.000ns
- ProbeRows: 1.02K (1024)
- ProbeRowsPartitioned: 0 (0)
- ProbeTime: 0.000ns
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- SpilledPartitions: 0 (0)
- UnpinTime: 0.000ns
EXCHANGE_NODE (id=3):(Total: 2s941ms, non-child: 2s941ms, % non-child: 100.00%)
- BytesReceived: 314.85 KB (322407)
- ConvertRowBatchTime: 0.000ns
- DeserializeRowBatchTimer: 0.000ns
- FirstBatchArrivalWaitTime: 0.000ns
- PeakMemoryUsage: 0
- RowsReturned: 9.15K (9153)
- RowsReturnedRate: 3.11 K/sec
- SendersBlockedTimer: 0.000ns
- SendersBlockedTotalTimer(*): 0.000ns
HDFS_SCAN_NODE (id=1):(Total: 16.999ms, non-child: 16.999ms, % non-child: 100.00%)
- AverageHdfsReadThreadConcurrency: 0.00
- AverageScannerThreadConcurrency: 0.00
- BytesRead: 128.51 KB (131593)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 0
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 0
- DecompressionTime: 0.000ns
- MaxCompressedTextFileLength: 0
- NumColumns: 49 (49)
- NumDisksAccessed: 0 (0)
- NumRowGroups: 1 (1)
- NumScannerThreadsStarted: 1 (1)
- PeakMemoryUsage: 1.78 MB (1866400)
- PerReadThreadRawHdfsThroughput: 4.18 MB/sec
- RemoteScanRanges: 0 (0)
- RowsRead: 1.50K (1500)
- RowsReturned: 1.02K (1024)
- RowsReturnedRate: 60.23 K/sec
- ScanRangesComplete: 1 (1)
- ScannerThreadsInvoluntaryContextSwitches: 0 (0)
- ScannerThreadsTotalWallClockTime: 14.999ms
- MaterializeTupleTime(*): 999.990us
- ScannerThreadsSysTime: 0.000ns
- ScannerThreadsUserTime: 2.216ms
- ScannerThreadsVoluntaryContextSwitches: 27 (27)
- TotalRawHdfsReadTime(*): 29.999ms
- TotalReadThroughput: 0.00 /sec
Filter 0 (1.00 MB):
- Rows processed: 1.50K (1500)
- Rows rejected: 3 (3)
- Rows total: 1.50K (1500)
Averaged Fragment F01:(Total: 3s191ms, non-child: 160.998ms, % non-child: 5.04%)
split sizes: min: 171.69 MB, max: 171.69 MB, avg: 171.69 MB, stddev: 0
completion times: min:3s210ms max:3s210ms mean: 3s210ms stddev:0.000ns
execution rates: min:53.47 MB/sec max:53.47 MB/sec mean:53.47 MB/sec stddev:0.00 /sec
num instances: 1
- AverageThreadTokens: 1.86
- BloomFilterBytes: 0
- PeakMemoryUsage: 106.05 MB (111206232)
- PerHostPeakMemUsage: 108.65 MB (113924736)
- PrepareTime: 33.999ms
- RowsProduced: 9.15K (9153)
- TotalCpuTime: 6s330ms
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 36.999ms
CodeGen:(Total: 51.999ms, non-child: 51.999ms, % non-child: 100.00%)
- CodegenTime: 999.990us
- CompileTime: 5.999ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.86 MB (1953028)
- NumFunctions: 13 (13)
- NumInstructions: 228 (228)
- OptimizationTime: 11.999ms
- PrepareTime: 33.999ms
DataStreamSender (dst_id=3):(Total: 4.999ms, non-child: 4.999ms, % non-child: 100.00%)
- BytesSent: 314.85 KB (322407)
- NetworkThroughput(*): 153.74 MB/sec
- OverallThroughput: 61.49 MB/sec
- RowsReturned: 9.15K (9153)
- SerializeBatchTime: 2.999ms
- TransmitDataRPCTime: 1.999ms
- UncompressedRowBatchSize: 769.68 KB (788150)
HDFS_SCAN_NODE (id=0):(Total: 2s973ms, non-child: 2s973ms, % non-child: 100.00%)
- AverageHdfsReadThreadConcurrency: 0.00
- AverageScannerThreadConcurrency: 1.00
- BytesRead: 171.79 MB (180132958)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 0
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 0
- DecompressionTime: 130.998ms
- MaxCompressedTextFileLength: 0
- NumColumns: 8 (8)
- NumDisksAccessed: 1 (1)
- NumRowGroups: 1 (1)
- NumScannerThreadsStarted: 1 (1)
- PeakMemoryUsage: 106.05 MB (111196408)
- PerReadThreadRawHdfsThroughput: 434.91 MB/sec
- RemoteScanRanges: 0 (0)
- RowsRead: 28.05M (28047320)
- RowsReturned: 9.15K (9153)
- RowsReturnedRate: 3.08 K/sec
- ScanRangesComplete: 1 (1)
- ScannerThreadsInvoluntaryContextSwitches: 112 (112)
- ScannerThreadsTotalWallClockTime: 3s157ms
- MaterializeTupleTime(*): 2s977ms
- ScannerThreadsSysTime: 566.243ms
- ScannerThreadsUserTime: 2s525ms
- ScannerThreadsVoluntaryContextSwitches: 100 (100)
- TotalRawHdfsReadTime(*): 394.996ms
- TotalReadThroughput: 57.11 MB/sec
Fragment F00:
Instance dc4c084615fbf9bb:4261466f00000001 (host=mapr1:22000):(Total: 3s001ms, non-child: 0.000ns, % non-child: 0.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/66.83 KB
Filter 0 arrival: 3s210ms
MemoryUsage(500.000ms): 0, 3.05 MB, 3.14 MB, 3.13 MB, 3.09 MB, 3.12 MB, 3.19 MB
ThreadUsage(500.000ms): 1, 2, 2, 2, 2, 2, 2
- AverageThreadTokens: 1.86
- BloomFilterBytes: 1.00 MB (1048576)
- PeakMemoryUsage: 5.07 MB (5320864)
- PerHostPeakMemUsage: 108.65 MB (113924736)
- PrepareTime: 38.999ms
- RowsProduced: 1.02K (1024)
- TotalCpuTime: 3s232ms
- TotalNetworkReceiveTime: 2s940ms
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 13.999ms
CodeGen:(Total: 262.997ms, non-child: 262.997ms, % non-child: 100.00%)
- CodegenTime: 999.990us
- CompileTime: 73.999ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.86 MB (1953028)
- NumFunctions: 85 (85)
- NumInstructions: 2.86K (2857)
- OptimizationTime: 151.998ms
- PrepareTime: 36.999ms
DataStreamSender (dst_id=4):(Total: 999.990us, non-child: 999.990us, % non-child: 100.00%)
- BytesSent: 61.05 KB (62513)
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 59.62 MB/sec
- RowsReturned: 1.02K (1024)
- SerializeBatchTime: 999.990us
- TransmitDataRPCTime: 0.000ns
- UncompressedRowBatchSize: 185.83 KB (190290)
HASH_JOIN_NODE (id=2):(Total: 3s001ms, non-child: 42.999ms, % non-child: 1.43%)
ExecOption: Build Side Codegen Enabled, Probe Side Codegen Enabled, Hash Table Construction Codegen Enabled, Join Build-Side Prepared Asynchronously, 1 of 1 Runtime Filter Published
- BuildPartitionTime: 1.999ms
- BuildRows: 9.15K (9153)
- BuildRowsPartitioned: 9.15K (9153)
- BuildTime: 0.000ns
- GetNewBlockTime: 0.000ns
- HashBuckets: 16.38K (16384)
- HashCollisions: 0 (0)
- LargestPartitionPercent: 6 (6)
- MaxPartitionLevel: 0 (0)
- NumRepartitions: 0 (0)
- PartitionsCreated: 16 (16)
- PeakMemoryUsage: 3.29 MB (3445888)
- PinTime: 0.000ns
- ProbeRows: 1.02K (1024)
- ProbeRowsPartitioned: 0 (0)
- ProbeTime: 0.000ns
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- SpilledPartitions: 0 (0)
- UnpinTime: 0.000ns
EXCHANGE_NODE (id=3):(Total: 2s941ms, non-child: 999.990us, % non-child: 0.03%)
BytesReceived(500.000ms): 70.08 KB, 127.46 KB, 162.66 KB, 230.08 KB, 301.42 KB, 312.19 KB
- BytesReceived: 314.85 KB (322407)
- ConvertRowBatchTime: 0.000ns
- DeserializeRowBatchTimer: 0.000ns
- FirstBatchArrivalWaitTime: 0.000ns
- PeakMemoryUsage: 0
- RowsReturned: 9.15K (9153)
- RowsReturnedRate: 3.11 K/sec
- SendersBlockedTimer: 0.000ns
- SendersBlockedTotalTimer(*): 0.000ns
HDFS_SCAN_NODE (id=1):(Total: 16.999ms, non-child: 16.999ms, % non-child: 100.00%)
ExecOption: Expr Evaluation Codegen Disabled, PARQUET Codegen Enabled
Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/66.83 KB
Runtime filters: All filters arrived. Waited 0
BytesRead(500.000ms): 0, 0, 0, 0, 0, 0
- AverageHdfsReadThreadConcurrency: 0.00
- AverageScannerThreadConcurrency: 0.00
- BytesRead: 128.51 KB (131593)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 0
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 0
- DecompressionTime: 0.000ns
- MaxCompressedTextFileLength: 0
- NumColumns: 49 (49)
- NumDisksAccessed: 0 (0)
- NumRowGroups: 1 (1)
- NumScannerThreadsStarted: 1 (1)
- PeakMemoryUsage: 1.78 MB (1866400)
- PerReadThreadRawHdfsThroughput: 4.18 MB/sec
- RemoteScanRanges: 0 (0)
- RowsRead: 1.50K (1500)
- RowsReturned: 1.02K (1024)
- RowsReturnedRate: 60.23 K/sec
- ScanRangesComplete: 1 (1)
- ScannerThreadsInvoluntaryContextSwitches: 0 (0)
- ScannerThreadsTotalWallClockTime: 14.999ms
- MaterializeTupleTime(*): 999.990us
- ScannerThreadsSysTime: 0.000ns
- ScannerThreadsUserTime: 2.216ms
- ScannerThreadsVoluntaryContextSwitches: 27 (27)
- TotalRawHdfsReadTime(*): 29.999ms
- TotalReadThroughput: 0.00 /sec
Filter 0 (1.00 MB):
- Rows processed: 1.50K (1500)
- Rows rejected: 3 (3)
- Rows total: 1.50K (1500)
Fragment F01:
Instance dc4c084615fbf9bb:4261466f00000002 (host=mapr1:22000):(Total: 3s191ms, non-child: 160.998ms, % non-child: 5.04%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/171.69 MB
MemoryUsage(500.000ms): 0, 91.50 MB, 82.91 MB, 86.63 MB, 67.68 MB, 66.67 MB, 53.51 MB
ThreadUsage(500.000ms): 1, 2, 2, 2, 2, 2, 2
- AverageThreadTokens: 1.86
- BloomFilterBytes: 0
- PeakMemoryUsage: 106.05 MB (111206232)
- PerHostPeakMemUsage: 108.65 MB (113924736)
- PrepareTime: 33.999ms
- RowsProduced: 9.15K (9153)
- TotalCpuTime: 6s330ms
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 36.999ms
CodeGen:(Total: 51.999ms, non-child: 51.999ms, % non-child: 100.00%)
- CodegenTime: 999.990us
- CompileTime: 5.999ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.86 MB (1953028)
- NumFunctions: 13 (13)
- NumInstructions: 228 (228)
- OptimizationTime: 11.999ms
- PrepareTime: 33.999ms
DataStreamSender (dst_id=3):(Total: 4.999ms, non-child: 4.999ms, % non-child: 100.00%)
- BytesSent: 314.85 KB (322407)
- NetworkThroughput(*): 153.74 MB/sec
- OverallThroughput: 61.49 MB/sec
- RowsReturned: 9.15K (9153)
- SerializeBatchTime: 2.999ms
- TransmitDataRPCTime: 1.999ms
- UncompressedRowBatchSize: 769.68 KB (788150)
HDFS_SCAN_NODE (id=0):(Total: 2s973ms, non-child: 2s973ms, % non-child: 100.00%)
ExecOption: Expr Evaluation Codegen Enabled, PARQUET Codegen Enabled, Codegen enabled: 1 out of 1
Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/171.69 MB
Hdfs Read Thread Concurrency Bucket: 0:100% 1:0% 2:0% 3:0% 4:0% 5:0%
File Formats: PARQUET/SNAPPY:8
BytesRead(500.000ms): 88.45 MB, 112.45 MB, 136.45 MB, 152.66 MB, 168.66 MB, 171.79 MB
- AverageHdfsReadThreadConcurrency: 0.00
- AverageScannerThreadConcurrency: 1.00
- BytesRead: 171.79 MB (180132958)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 0
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 0
- DecompressionTime: 130.998ms
- MaxCompressedTextFileLength: 0
- NumColumns: 8 (8)
- NumDisksAccessed: 1 (1)
- NumRowGroups: 1 (1)
- NumScannerThreadsStarted: 1 (1)
- PeakMemoryUsage: 106.05 MB (111196408)
- PerReadThreadRawHdfsThroughput: 434.91 MB/sec
- RemoteScanRanges: 0 (0)
- RowsRead: 28.05M (28047320)
- RowsReturned: 9.15K (9153)
- RowsReturnedRate: 3.08 K/sec
- ScanRangesComplete: 1 (1)
- ScannerThreadsInvoluntaryContextSwitches: 112 (112)
- ScannerThreadsTotalWallClockTime: 3s157ms
- MaterializeTupleTime(*): 2s977ms
- ScannerThreadsSysTime: 566.243ms
- ScannerThreadsUserTime: 2s525ms
- ScannerThreadsVoluntaryContextSwitches: 100 (100)
- TotalRawHdfsReadTime(*): 394.996ms
- TotalReadThroughput: 57.11 MB/sec
... View more
Labels:
- Labels:
-
Apache Hive
-
Apache Impala
-
HDFS
02-17-2017
12:45 PM
Hi Anis Since Impala can use Hive's metadata. I think it's ok to use the command to import data from RDBMS to Hive and use impala to query it.
... View more
02-07-2017
09:25 AM
So I was trying to partition my Impala table with the column 'file' which has 1500 distinct records. That means 1500 partitions. I first run a query like this to return the partition queries: SELECT DISTINCT
concat('insert into partitioned_table partition (year=',
cast(year as string),', month=',cast(month as string),
') select c1, c2, c3 from raw_data where year=',
cast(year as string),' and month=',cast(month as string),';') AS command
FROM raw_data; Then I got 1500 queries to run. Now there's one problem: Since each query might take 3 minutes to finish. 1500 queries could take several days. Which is a really long time. To save the time, I have already done some tuning: using COMPUTE STATS to get statics, convert table to Parquet. MY question is, Is there a way that can speed up this process? Like max up the executors just like Hive can do? My Cluster: 3 nodes, each with 6 core cpu, 32GB RAM and 1.8T hard disk Any thoughts would be helpful, Thank you!
... View more
Labels:
- Labels:
-
Apache Impala
-
HDFS
01-25-2017
11:12 AM
Hi guys, So I was trying to switch Hive engine from MR to Spark. I follow the like here . But I didn't see any Spark service name in the Spark On YARN Service under Configuration tab. Notice that there's already a gateway on running HiveServer2. Another thing worth to notice is that Spark 2 is manually installed. Here are some snapshots. Any input would be helpful. Thanks!
... View more
Labels:
- Labels:
-
Apache Spark
-
Cloudera Manager