Member since
07-29-2015
535
Posts
141
Kudos Received
103
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6181 | 12-18-2020 01:46 PM |
| | 4030 | 12-16-2020 12:11 PM |
| | 2869 | 12-07-2020 01:47 PM |
| | 2026 | 12-07-2020 09:21 AM |
| | 1301 | 10-14-2020 11:15 AM |
04-17-2019
06:00 PM
1 Kudo
In its default configuration, metadata is cached until an "INVALIDATE METADATA" command evicts the table from the cache, or until the catalog is restarted. In 5.16 and 6.1+ there are some non-default options that will evict metadata after a particular timeout. At some point these will become the defaults. Table stats are collected and stored in the Hive metastore when you run a "COMPUTE STATS" command. They are then just part of the table metadata.
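As a rough illustration of the two commands mentioned above, here is a minimal Python sketch that only builds the statement text. The table name `my_db.my_table` and both helper functions are hypothetical; the sketch does not connect to a cluster, and you would pass the resulting strings to whatever client you already use (impala-shell, JDBC, and so on).

```python
def invalidate_metadata_sql(table: str) -> str:
    """Build the statement that evicts one table's cached metadata."""
    return f"INVALIDATE METADATA {table}"


def compute_stats_sql(table: str) -> str:
    """Build the statement that collects table and column stats."""
    return f"COMPUTE STATS {table}"


# Hypothetical table name, used purely for illustration.
table_name = "my_db.my_table"
statements = [
    invalidate_metadata_sql(table_name),
    compute_stats_sql(table_name),
]

for statement in statements:
    print(statement + ";")
```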
04-10-2019
05:02 AM
Thank you very much, Tim, for providing this insight. I had assumed that the MEM_LIMIT option asks for that amount of space for the query.
04-08-2019
12:22 PM
If you want exact precision to a number of decimal digits, I'd recommend using the DECIMAL data type. Floating point can't exactly represent most decimal fractions. If you're returning a floating point type from a query, then you don't have any real control over the display because it's basically a client-side formatting decision. E.g. your Java code that uses the JDBC driver could take the value and format it however it wants. You *might* be able to get the desired behaviour in impala-shell by using the round() function, but that depends on some undocumented behaviour. I'd recommend looking at DECIMAL.
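The same distinction shows up in any client language. The sketch below uses Python's standard `decimal` module as a stand-in for a DECIMAL column, which is an analogy and not Impala's implementation; the helper names `format_for_display` and `quantize_to` are made up for this example.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point cannot store 0.1, 0.2 or 0.3 exactly,
# so the sum does not compare equal to 0.3.
float_is_exact = (0.1 + 0.2) == 0.3

# Decimal arithmetic keeps the decimal digits exactly.
decimal_is_exact = (Decimal("0.1") + Decimal("0.2")) == Decimal("0.3")


def format_for_display(value: float, digits: int = 2) -> str:
    """Client-side formatting: the client decides how a float is shown."""
    return f"{value:.{digits}f}"


def quantize_to(value: Decimal, digits: int = 2) -> Decimal:
    """Round a Decimal to a fixed number of decimal digits, half up."""
    return value.quantize(Decimal(10) ** -digits, rounding=ROUND_HALF_UP)


print(float_is_exact, decimal_is_exact)
print(format_for_display(2.675))       # formats the nearest binary float
print(quantize_to(Decimal("2.675")))   # rounds the exact decimal value
```

The two final lines can disagree in the last digit, because the float 2.675 is stored as a value slightly below 2.675 while the Decimal holds exactly 2.675.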
04-05-2019
08:54 AM
If you have more than a handful of users it becomes difficult to manage the large number of pools. Resource limits are also of limited use - you can limit the total consumption per user, but you can't guarantee that any group of users gets memory.
03-26-2019
10:19 AM
Impala expects your UDF code and dependencies to be in a single .so, so you'd have to statically link any libraries you depend on.
03-14-2019
11:56 AM
Hi @Tim Armstrong,
This is the output of SHOW FILES on the specific partition the query failed on:
hdfs://HadoopCluster/user/database/table_name/partition_value=KS5021/part-m-00000.snappy 2.74GB partition_value=KS5021
hdfs://HadoopCluster/user/database/table_name/partition_value=KS5021/part-m-00001.snappy 3.20GB partition_value=KS5021
hdfs://HadoopCluster/user/database/table_name/partition_value=KS5021/part-m-00002.snappy 3.55GB partition_value=KS5021
hdfs://HadoopCluster/user/database/table_name/partition_value=KS5021/part-m-00003.snappy 3.19GB partition_value=KS5021
This is the version:
impalad version 2.12.0-cdh5.15.1 RELEASE (build 64f4e19bf59fab8664ebff7e80fc70570dcd8cb8) Built on Thu Aug 9 09:21:02 PDT 2018
Thanks
03-14-2019
11:01 AM
Hi Tim, I am using the Parquet format for the table. I also tried "set PARQUET_FALLBACK_SCHEMA_RESOLUTION=name" before running the query, but got the same result. Please see the profile below:
Query (id=e54f7da15a77d3d0:342167b500000000):
Summary:
Session ID: 734733b810bda5d6:61715265a5d564b8
Session Type: BEESWAX
Start Time: 2019-03-14 13:37:44.723524000
End Time: 2019-03-14 13:37:45.723116000
Query Type: QUERY
Query State: FINISHED
Query Status: OK
Impala Version: impalad version 2.7.0-cdh5.9.0 RELEASE (build 4b4cf1936bd6cdf34fda5e2f32827e7d60c07a9c)
User: usr_impala
Connected User: usr_impala
Delegated User:
Network Address: 153.40.73.237:47596
Default Db: default
Sql Statement: select * from comm_status where serverdate='2019-03-02' and report_reference_number='CITIXLME549300DV4DLC540WV917GB00D3CMMY07020190301'
Coordinator: cloud_machine_1:22000
Query Options (non default):
Plan:
----------------
Estimated Per-Host Requirements: Memory=48.00MB VCores=1
01:EXCHANGE [UNPARTITIONED]
| hosts=4 per-host-mem=unavailable
| tuple-ids=0 row-size=177B cardinality=2
|
00:SCAN HDFS [default.comm_status, RANDOM]
partitions=6/16 files=6 size=6.96MB
predicates: report_reference_number = 'CITIXLME549300DV4DLC540WV917GB00D3CMMY07020190301'
table stats: 166968 rows total
column stats: all
hosts=4 per-host-mem=48.00MB
tuple-ids=0 row-size=177B cardinality=2
----------------
Estimated Per-Host Mem: 50331648
Estimated Per-Host VCores: 1
Request Pool: default-pool
Admission result: Admitted immediately
ExecSummary:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. 
Peak Mem Detail ------------------------------------------------------------------------------------------------------------------- 01:EXCHANGE 1 64.697us 64.697us 6 2 0 -1.00 B UNPARTITIONED 00:SCAN HDFS 4 594.658ms 960.960ms 6 2 6.27 MB 48.00 MB default.comm_status Planner Timeline: 2.019ms - Analysis finished: 521.731us (521.731us) - Equivalence classes computed: 614.107us (92.376us) - Single node plan created: 1.340ms (726.439us) - Runtime filters computed: 1.357ms (17.135us) - Distributed plan created: 1.637ms (279.515us) - Planning finished: 2.019ms (382.286us) Query Timeline: 1s001ms - Start execution: 46.572us (46.572us) - Planning finished: 2.951ms (2.904ms) - Submit for admission: 3.103ms (152.472us) - Completed admission: 3.211ms (107.528us) - Ready to start 4 remote fragments: 3.562ms (351.656us) - All 4 remote fragments started: 8.496ms (4.933ms) - Rows available: 272.403ms (263.906ms) - First row fetched: 310.582ms (38.179ms) - Unregister query: 999.597ms (689.015ms) - ComputeScanRangeAssignmentTimer: 33.622us ImpalaServer: - ClientFetchWaitTimer: 40.960ms - RowMaterializationTimer: 16.160us Execution Profile e54f7da15a77d3d0:342167b500000000:(Total: 955.209ms, non-child: 0.000ns, % non-child: 0.00%) Number of filters: 0 Filter routing table: ID Src. Node Tgt. 
Node(s) Targets Target type Partition filter Pending (Expected) First arrived Completed Enabled ---------------------------------------------------------------------------------------------------------------------------- Fragment start latencies: Count: 4, 25th %-ile: 1ms, 50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 4ms, 95th %-ile: 4ms, 99.9th %-ile: 4ms Per Node Peak Memory Usage: cloud_machine_2:22000(6.28 MB) cloud_machine_3:22000(4.17 MB) cloud_machine_4:22000(4.30 MB) cloud_machine_1:22000(4.20 MB) - FiltersReceived: 0 (0) - FinalizationTimer: 0.000ns Coordinator Fragment F01:(Total: 949.221ms, non-child: 241.396us, % non-child: 0.03%) MemoryUsage(500.000ms): 8.00 KB, 24.01 KB - AverageThreadTokens: 0.00 - BloomFilterBytes: 0 - PeakMemoryUsage: 32.02 KB (32784) - PerHostPeakMemUsage: 0 - PrepareTime: 26.430us - RowsProduced: 0 (0) - TotalCpuTime: 43.717ms - TotalNetworkReceiveTime: 948.955ms - TotalNetworkSendTime: 0.000ns - TotalStorageWaitTime: 0.000ns BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns EXCHANGE_NODE (id=1):(Total: 948.980ms, non-child: 64.697us, % non-child: 0.01%) BytesReceived(500.000ms): 0, 716.00 B - BytesReceived: 1.05 KB (1072) - ConvertRowBatchTime: 9.082us - DeserializeRowBatchTimer: 42.542us - FirstBatchArrivalWaitTime: 263.623ms - PeakMemoryUsage: 0 - RowsReturned: 6 (6) - RowsReturnedRate: 6.00 /sec - SendersBlockedTimer: 0.000ns - SendersBlockedTotalTimer(*): 0.000ns Averaged Fragment F00:(Total: 595.899ms, non-child: 0.000ns, % non-child: 0.00%) split sizes: min: 1.16 MB, max: 2.32 MB, avg: 1.74 MB, stddev: 594.15 KB completion times: min:328.847ms max:992.298ms mean: 627.859ms stddev:300.544ms execution rates: min:1.17 MB/sec 
max:6.96 MB/sec mean:3.59 MB/sec stddev:2.12 MB/sec num instances: 4 - AverageThreadTokens: 1.25 - BloomFilterBytes: 0 - PeakMemoryUsage: 4.73 MB (4959952) - PerHostPeakMemUsage: 4.74 MB (4967122) - PrepareTime: 37.204ms - RowsProduced: 1 (1) - TotalCpuTime: 704.988ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 170.458us - TotalStorageWaitTime: 665.723ms BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns CodeGen:(Total: 67.096ms, non-child: 67.096ms, % non-child: 100.00%) - CodegenTime: 1.170ms - CompileTime: 6.799ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - OptimizationTime: 23.416ms - PrepareTime: 36.447ms DataStreamSender (dst_id=1):(Total: 241.040us, non-child: 241.040us, % non-child: 100.00%) - BytesSent: 268.00 B (268) - NetworkThroughput(*): 3.94 MB/sec - OverallThroughput: 2.30 MB/sec - RowsReturned: 1 (1) - SerializeBatchTime: 18.028us - TransmitDataRPCTime: 205.339us - UncompressedRowBatchSize: 331.00 B (331) HDFS_SCAN_NODE (id=0):(Total: 594.658ms, non-child: 594.658ms, % non-child: 100.00%) - AverageHdfsReadThreadConcurrency: 0.75 - AverageScannerThreadConcurrency: 0.75 - BytesRead: 1.89 MB (1977967) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 1 (1) - NumScannerThreadsStarted: 1 (1) - PeakMemoryUsage: 4.72 MB (4950272) - PerReadThreadRawHdfsThroughput: 4.56 MB/sec - RemoteScanRanges: 6 (6) - RowsRead: 18.81K (18807) - RowsReturned: 1 (1) - RowsReturnedRate: 3.00 /sec - ScanRangesComplete: 1 
(1) - ScannerThreadsInvoluntaryContextSwitches: 2 (2) - ScannerThreadsTotalWallClockTime: 668.590ms - MaterializeTupleTime(*): 2.253ms - ScannerThreadsSysTime: 499.500us - ScannerThreadsUserTime: 1.748ms - ScannerThreadsVoluntaryContextSwitches: 9 (9) - TotalRawHdfsReadTime(*): 601.438ms - TotalReadThroughput: 963.39 KB/sec Fragment F00: Instance e54f7da15a77d3d0:342167b500000002 (host=cloud_machine_1:22000):(Total: 961.567ms, non-child: 0.000ns, % non-child: 0.00%) Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/1.16 MB MemoryUsage(500.000ms): 4.00 KB, 2.12 MB ThreadUsage(500.000ms): 1, 2 - AverageThreadTokens: 1.50 - BloomFilterBytes: 0 - PeakMemoryUsage: 4.17 MB (4372944) - PerHostPeakMemUsage: 4.20 MB (4401624) - PrepareTime: 35.478ms - RowsProduced: 1 (1) - TotalCpuTime: 993.973ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 29.018us - TotalStorageWaitTime: 923.149ms CodeGen:(Total: 64.442ms, non-child: 64.442ms, % non-child: 100.00%) - CodegenTime: 832.912us - CompileTime: 6.441ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - OptimizationTime: 22.658ms - PrepareTime: 34.912ms DataStreamSender (dst_id=1):(Total: 65.548us, non-child: 65.548us, % non-child: 100.00%) - BytesSent: 176.00 B (176) - NetworkThroughput(*): 4.71 MB/sec - OverallThroughput: 2.56 MB/sec - RowsReturned: 1 (1) - SerializeBatchTime: 11.148us - TransmitDataRPCTime: 35.622us - UncompressedRowBatchSize: 219.00 B (219) HDFS_SCAN_NODE (id=0):(Total: 960.960ms, non-child: 960.960ms, % non-child: 100.00%) ExecOption: Expr Evaluation Codegen Disabled, PARQUET Codegen Enabled, Codegen enabled: 1 out of 1 Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/1.16 MB Hdfs Read Thread Concurrency Bucket: 0:0% 1:100% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% File Formats: PARQUET/NONE:3 BytesRead(500.000ms): 0, 1.26 MB - AverageHdfsReadThreadConcurrency: 1.00 - AverageScannerThreadConcurrency: 1.00 - 
BytesRead: 1.26 MB (1318689) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 1 (1) - NumScannerThreadsStarted: 1 (1) - PeakMemoryUsage: 4.16 MB (4363264) - PerReadThreadRawHdfsThroughput: 1.31 MB/sec - RemoteScanRanges: 4 (4) - RowsRead: 12.54K (12538) - RowsReturned: 1 (1) - RowsReturnedRate: 1.00 /sec - ScanRangesComplete: 1 (1) - ScannerThreadsInvoluntaryContextSwitches: 6 (6) - ScannerThreadsTotalWallClockTime: 925.257ms - MaterializeTupleTime(*): 1.738ms - ScannerThreadsSysTime: 999.000us - ScannerThreadsUserTime: 999.000us - ScannerThreadsVoluntaryContextSwitches: 9 (9) - TotalRawHdfsReadTime(*): 963.534ms - TotalReadThroughput: 1.26 MB/sec Instance e54f7da15a77d3d0:342167b500000001 (host=cloud_machine_4:22000):(Total: 826.232ms, non-child: 0.000ns, % non-child: 0.00%) Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:2/2.32 MB MemoryUsage(500.000ms): 4.00 KB, 149.45 KB ThreadUsage(500.000ms): 1, 2 - AverageThreadTokens: 1.50 - BloomFilterBytes: 0 - PeakMemoryUsage: 4.30 MB (4512208) - PerHostPeakMemUsage: 4.30 MB (4512208) - PrepareTime: 35.627ms - RowsProduced: 2 (2) - TotalCpuTime: 990.391ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 119.366us - TotalStorageWaitTime: 986.716ms BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns CodeGen:(Total: 64.577ms, non-child: 64.577ms, % non-child: 100.00%) - CodegenTime: 931.308us - CompileTime: 6.338ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - 
OptimizationTime: 22.764ms - PrepareTime: 35.037ms DataStreamSender (dst_id=1):(Total: 97.596us, non-child: 97.596us, % non-child: 100.00%) - BytesSent: 360.00 B (360) - NetworkThroughput(*): 5.76 MB/sec - OverallThroughput: 3.52 MB/sec - RowsReturned: 2 (2) - SerializeBatchTime: 24.588us - TransmitDataRPCTime: 59.584us - UncompressedRowBatchSize: 444.00 B (444) HDFS_SCAN_NODE (id=0):(Total: 825.190ms, non-child: 825.190ms, % non-child: 100.00%) ExecOption: Expr Evaluation Codegen Disabled, PARQUET Codegen Enabled, Codegen enabled: 2 out of 2 Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:2/2.32 MB Hdfs Read Thread Concurrency Bucket: 0:0% 1:100% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% File Formats: PARQUET/NONE:6 BytesRead(500.000ms): 0, 1.26 MB - AverageHdfsReadThreadConcurrency: 1.00 - AverageScannerThreadConcurrency: 1.00 - BytesRead: 2.52 MB (2637269) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 2 (2) - NumScannerThreadsStarted: 2 (2) - PeakMemoryUsage: 4.29 MB (4502528) - PerReadThreadRawHdfsThroughput: 2.74 MB/sec - RemoteScanRanges: 8 (8) - RowsRead: 25.08K (25076) - RowsReturned: 2 (2) - RowsReturnedRate: 2.00 /sec - ScanRangesComplete: 2 (2) - ScannerThreadsInvoluntaryContextSwitches: 4 (4) - ScannerThreadsTotalWallClockTime: 990.396ms - MaterializeTupleTime(*): 2.738ms - ScannerThreadsSysTime: 999.000us - ScannerThreadsUserTime: 1.998ms - ScannerThreadsVoluntaryContextSwitches: 11 (11) - TotalRawHdfsReadTime(*): 917.567ms - TotalReadThroughput: 1.26 MB/sec Instance e54f7da15a77d3d0:342167b500000004 (host=cloud_machine_2:22000):(Total: 300.651ms, non-child: 0.000ns, % non-child: 0.00%) Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:2/2.32 MB - AverageThreadTokens: 0.00 - BloomFilterBytes: 0 - PeakMemoryUsage: 6.28 MB (6581712) - PerHostPeakMemUsage: 6.28 
MB (6581712) - PrepareTime: 37.604ms - RowsProduced: 2 (2) - TotalCpuTime: 505.989ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 183.110us - TotalStorageWaitTime: 502.133ms BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns CodeGen:(Total: 68.324ms, non-child: 68.324ms, % non-child: 100.00%) - CodegenTime: 960.040us - CompileTime: 6.946ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - OptimizationTime: 23.985ms - PrepareTime: 36.961ms DataStreamSender (dst_id=1):(Total: 117.568us, non-child: 117.568us, % non-child: 100.00%) - BytesSent: 356.00 B (356) - NetworkThroughput(*): 5.02 MB/sec - OverallThroughput: 2.89 MB/sec - RowsReturned: 2 (2) - SerializeBatchTime: 24.648us - TransmitDataRPCTime: 67.574us - UncompressedRowBatchSize: 441.00 B (441) HDFS_SCAN_NODE (id=0):(Total: 299.442ms, non-child: 299.442ms, % non-child: 100.00%) ExecOption: Expr Evaluation Codegen Disabled, PARQUET Codegen Enabled, Codegen enabled: 2 out of 2 Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:2/2.32 MB Hdfs Read Thread Concurrency Bucket: 0:0% 1:0% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% File Formats: PARQUET/NONE:6 - AverageHdfsReadThreadConcurrency: 0.00 - AverageScannerThreadConcurrency: 0.00 - BytesRead: 2.52 MB (2637306) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 2 (2) - NumScannerThreadsStarted: 2 (2) - PeakMemoryUsage: 6.27 MB (6572032) - PerReadThreadRawHdfsThroughput: 7.41 MB/sec - RemoteScanRanges: 8 (8) - RowsRead: 
25.08K (25076) - RowsReturned: 2 (2) - RowsReturnedRate: 6.00 /sec - ScanRangesComplete: 2 (2) - ScannerThreadsInvoluntaryContextSwitches: 0 (0) - ScannerThreadsTotalWallClockTime: 505.994ms - MaterializeTupleTime(*): 3.052ms - ScannerThreadsSysTime: 0.000ns - ScannerThreadsUserTime: 2.998ms - ScannerThreadsVoluntaryContextSwitches: 12 (12) - TotalRawHdfsReadTime(*): 339.372ms - TotalReadThroughput: 0.00 /sec Instance e54f7da15a77d3d0:342167b500000003 (host=cloud_machine_3:22000):(Total: 295.146ms, non-child: 0.000ns, % non-child: 0.00%) Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/1.16 MB MemoryUsage(500.000ms): 2.12 MB ThreadUsage(500.000ms): 2 - AverageThreadTokens: 2.00 - BloomFilterBytes: 0 - PeakMemoryUsage: 4.17 MB (4372944) - PerHostPeakMemUsage: 4.17 MB (4372944) - PrepareTime: 40.105ms - RowsProduced: 1 (1) - TotalCpuTime: 329.600ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 350.338us - TotalStorageWaitTime: 250.893ms BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns CodeGen:(Total: 71.042ms, non-child: 71.042ms, % non-child: 100.00%) - CodegenTime: 1.957ms - CompileTime: 7.470ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - OptimizationTime: 24.257ms - PrepareTime: 38.877ms DataStreamSender (dst_id=1):(Total: 683.450us, non-child: 683.450us, % non-child: 100.00%) - BytesSent: 180.00 B (180) - NetworkThroughput(*): 266.91 KB/sec - OverallThroughput: 257.20 KB/sec - RowsReturned: 1 (1) - SerializeBatchTime: 11.730us - TransmitDataRPCTime: 658.576us - UncompressedRowBatchSize: 222.00 B (222) HDFS_SCAN_NODE (id=0):(Total: 293.038ms, non-child: 293.038ms, % 
non-child: 100.00%) ExecOption: Expr Evaluation Codegen Disabled, PARQUET Codegen Enabled, Codegen enabled: 1 out of 1 Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/1.16 MB Hdfs Read Thread Concurrency Bucket: 0:0% 1:100% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% File Formats: PARQUET/NONE:3 BytesRead(500.000ms): 639.23 KB - AverageHdfsReadThreadConcurrency: 1.00 - AverageScannerThreadConcurrency: 1.00 - BytesRead: 1.26 MB (1318607) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 1 (1) - NumScannerThreadsStarted: 1 (1) - PeakMemoryUsage: 4.16 MB (4363264) - PerReadThreadRawHdfsThroughput: 6.79 MB/sec - RemoteScanRanges: 4 (4) - RowsRead: 12.54K (12538) - RowsReturned: 1 (1) - RowsReturnedRate: 3.00 /sec - ScanRangesComplete: 1 (1) - ScannerThreadsInvoluntaryContextSwitches: 0 (0) - ScannerThreadsTotalWallClockTime: 252.714ms - MaterializeTupleTime(*): 1.482ms - ScannerThreadsSysTime: 0.000ns - ScannerThreadsUserTime: 999.000us - ScannerThreadsVoluntaryContextSwitches: 7 (7) - TotalRawHdfsReadTime(*): 185.279ms - TotalReadThroughput: 1.25 MB/sec
03-07-2019
09:14 AM
Yeah, we need to make some changes in Impala to better optimise this case (large SELECT result sets). We have some of that work in Impala. If you're doing large extracts of data, it's often better to do a "CREATE TABLE AS SELECT" into a text table and download those files directly from the filesystem, if that's possible.