Member since 09-29-2016
Posts: 60 · Kudos Received: 0 · Solutions: 0
09-18-2019
08:53 PM
Not able to open this link: http://ingest.tips/2015/01/31/parquet-row-group-size/. Can you please check and repost it?
03-14-2019
11:01 AM
Hi Tim, I am using the Parquet format for the table. I also tried "set PARQUET_FALLBACK_SCHEMA_RESOLUTION=name" before running the query, but got the same result. Please see the profile below:
Query (id=e54f7da15a77d3d0:342167b500000000): Summary: Session ID: 734733b810bda5d6:61715265a5d564b8 Session Type: BEESWAX Start Time: 2019-03-14 13:37:44.723524000 End Time: 2019-03-14 13:37:45.723116000 Query Type: QUERY Query State: FINISHED Query Status: OK Impala Version: impalad version 2.7.0-cdh5.9.0 RELEASE (build 4b4cf1936bd6cdf34fda5e2f32827e7d60c07a9c) User: usr_impala Connected User: usr_impala Delegated User: Network Address: 153.40.73.237:47596 Default Db: default Sql Statement: select * from comm_status where serverdate='2019-03-02' and report_reference_number='CITIXLME549300DV4DLC540WV917GB00D3CMMY07020190301' Coordinator: cloud_machine_1:22000 Query Options (non default): Plan: ---------------- Estimated Per-Host Requirements: Memory=48.00MB VCores=1 01:EXCHANGE [UNPARTITIONED] | hosts=4 per-host-mem=unavailable | tuple-ids=0 row-size=177B cardinality=2 | 00:SCAN HDFS [default.comm_status, RANDOM] partitions=6/16 files=6 size=6.96MB predicates: report_reference_number = 'CITIXLME549300DV4DLC540WV917GB00D3CMMY07020190301' table stats: 166968 rows total column stats: all hosts=4 per-host-mem=48.00MB tuple-ids=0 row-size=177B cardinality=2 ---------------- Estimated Per-Host Mem: 50331648 Estimated Per-Host VCores: 1 Request Pool: default-pool Admission result: Admitted immediately ExecSummary: Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est.
Peak Mem Detail ------------------------------------------------------------------------------------------------------------------- 01:EXCHANGE 1 64.697us 64.697us 6 2 0 -1.00 B UNPARTITIONED 00:SCAN HDFS 4 594.658ms 960.960ms 6 2 6.27 MB 48.00 MB default.comm_status Planner Timeline: 2.019ms - Analysis finished: 521.731us (521.731us) - Equivalence classes computed: 614.107us (92.376us) - Single node plan created: 1.340ms (726.439us) - Runtime filters computed: 1.357ms (17.135us) - Distributed plan created: 1.637ms (279.515us) - Planning finished: 2.019ms (382.286us) Query Timeline: 1s001ms - Start execution: 46.572us (46.572us) - Planning finished: 2.951ms (2.904ms) - Submit for admission: 3.103ms (152.472us) - Completed admission: 3.211ms (107.528us) - Ready to start 4 remote fragments: 3.562ms (351.656us) - All 4 remote fragments started: 8.496ms (4.933ms) - Rows available: 272.403ms (263.906ms) - First row fetched: 310.582ms (38.179ms) - Unregister query: 999.597ms (689.015ms) - ComputeScanRangeAssignmentTimer: 33.622us ImpalaServer: - ClientFetchWaitTimer: 40.960ms - RowMaterializationTimer: 16.160us Execution Profile e54f7da15a77d3d0:342167b500000000:(Total: 955.209ms, non-child: 0.000ns, % non-child: 0.00%) Number of filters: 0 Filter routing table: ID Src. Node Tgt. 
Node(s) Targets Target type Partition filter Pending (Expected) First arrived Completed Enabled ---------------------------------------------------------------------------------------------------------------------------- Fragment start latencies: Count: 4, 25th %-ile: 1ms, 50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 4ms, 95th %-ile: 4ms, 99.9th %-ile: 4ms Per Node Peak Memory Usage: cloud_machine_2:22000(6.28 MB) cloud_machine_3:22000(4.17 MB) cloud_machine_4:22000(4.30 MB) cloud_machine_1:22000(4.20 MB) - FiltersReceived: 0 (0) - FinalizationTimer: 0.000ns Coordinator Fragment F01:(Total: 949.221ms, non-child: 241.396us, % non-child: 0.03%) MemoryUsage(500.000ms): 8.00 KB, 24.01 KB - AverageThreadTokens: 0.00 - BloomFilterBytes: 0 - PeakMemoryUsage: 32.02 KB (32784) - PerHostPeakMemUsage: 0 - PrepareTime: 26.430us - RowsProduced: 0 (0) - TotalCpuTime: 43.717ms - TotalNetworkReceiveTime: 948.955ms - TotalNetworkSendTime: 0.000ns - TotalStorageWaitTime: 0.000ns BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns EXCHANGE_NODE (id=1):(Total: 948.980ms, non-child: 64.697us, % non-child: 0.01%) BytesReceived(500.000ms): 0, 716.00 B - BytesReceived: 1.05 KB (1072) - ConvertRowBatchTime: 9.082us - DeserializeRowBatchTimer: 42.542us - FirstBatchArrivalWaitTime: 263.623ms - PeakMemoryUsage: 0 - RowsReturned: 6 (6) - RowsReturnedRate: 6.00 /sec - SendersBlockedTimer: 0.000ns - SendersBlockedTotalTimer(*): 0.000ns Averaged Fragment F00:(Total: 595.899ms, non-child: 0.000ns, % non-child: 0.00%) split sizes: min: 1.16 MB, max: 2.32 MB, avg: 1.74 MB, stddev: 594.15 KB completion times: min:328.847ms max:992.298ms mean: 627.859ms stddev:300.544ms execution rates: min:1.17 MB/sec 
max:6.96 MB/sec mean:3.59 MB/sec stddev:2.12 MB/sec num instances: 4 - AverageThreadTokens: 1.25 - BloomFilterBytes: 0 - PeakMemoryUsage: 4.73 MB (4959952) - PerHostPeakMemUsage: 4.74 MB (4967122) - PrepareTime: 37.204ms - RowsProduced: 1 (1) - TotalCpuTime: 704.988ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 170.458us - TotalStorageWaitTime: 665.723ms BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns CodeGen:(Total: 67.096ms, non-child: 67.096ms, % non-child: 100.00%) - CodegenTime: 1.170ms - CompileTime: 6.799ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - OptimizationTime: 23.416ms - PrepareTime: 36.447ms DataStreamSender (dst_id=1):(Total: 241.040us, non-child: 241.040us, % non-child: 100.00%) - BytesSent: 268.00 B (268) - NetworkThroughput(*): 3.94 MB/sec - OverallThroughput: 2.30 MB/sec - RowsReturned: 1 (1) - SerializeBatchTime: 18.028us - TransmitDataRPCTime: 205.339us - UncompressedRowBatchSize: 331.00 B (331) HDFS_SCAN_NODE (id=0):(Total: 594.658ms, non-child: 594.658ms, % non-child: 100.00%) - AverageHdfsReadThreadConcurrency: 0.75 - AverageScannerThreadConcurrency: 0.75 - BytesRead: 1.89 MB (1977967) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 1 (1) - NumScannerThreadsStarted: 1 (1) - PeakMemoryUsage: 4.72 MB (4950272) - PerReadThreadRawHdfsThroughput: 4.56 MB/sec - RemoteScanRanges: 6 (6) - RowsRead: 18.81K (18807) - RowsReturned: 1 (1) - RowsReturnedRate: 3.00 /sec - ScanRangesComplete: 1 
(1) - ScannerThreadsInvoluntaryContextSwitches: 2 (2) - ScannerThreadsTotalWallClockTime: 668.590ms - MaterializeTupleTime(*): 2.253ms - ScannerThreadsSysTime: 499.500us - ScannerThreadsUserTime: 1.748ms - ScannerThreadsVoluntaryContextSwitches: 9 (9) - TotalRawHdfsReadTime(*): 601.438ms - TotalReadThroughput: 963.39 KB/sec Fragment F00: Instance e54f7da15a77d3d0:342167b500000002 (host=cloud_machine_1:22000):(Total: 961.567ms, non-child: 0.000ns, % non-child: 0.00%) Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/1.16 MB MemoryUsage(500.000ms): 4.00 KB, 2.12 MB ThreadUsage(500.000ms): 1, 2 - AverageThreadTokens: 1.50 - BloomFilterBytes: 0 - PeakMemoryUsage: 4.17 MB (4372944) - PerHostPeakMemUsage: 4.20 MB (4401624) - PrepareTime: 35.478ms - RowsProduced: 1 (1) - TotalCpuTime: 993.973ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 29.018us - TotalStorageWaitTime: 923.149ms CodeGen:(Total: 64.442ms, non-child: 64.442ms, % non-child: 100.00%) - CodegenTime: 832.912us - CompileTime: 6.441ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - OptimizationTime: 22.658ms - PrepareTime: 34.912ms DataStreamSender (dst_id=1):(Total: 65.548us, non-child: 65.548us, % non-child: 100.00%) - BytesSent: 176.00 B (176) - NetworkThroughput(*): 4.71 MB/sec - OverallThroughput: 2.56 MB/sec - RowsReturned: 1 (1) - SerializeBatchTime: 11.148us - TransmitDataRPCTime: 35.622us - UncompressedRowBatchSize: 219.00 B (219) HDFS_SCAN_NODE (id=0):(Total: 960.960ms, non-child: 960.960ms, % non-child: 100.00%) ExecOption: Expr Evaluation Codegen Disabled, PARQUET Codegen Enabled, Codegen enabled: 1 out of 1 Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/1.16 MB Hdfs Read Thread Concurrency Bucket: 0:0% 1:100% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% File Formats: PARQUET/NONE:3 BytesRead(500.000ms): 0, 1.26 MB - AverageHdfsReadThreadConcurrency: 1.00 - AverageScannerThreadConcurrency: 1.00 - 
BytesRead: 1.26 MB (1318689) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 1 (1) - NumScannerThreadsStarted: 1 (1) - PeakMemoryUsage: 4.16 MB (4363264) - PerReadThreadRawHdfsThroughput: 1.31 MB/sec - RemoteScanRanges: 4 (4) - RowsRead: 12.54K (12538) - RowsReturned: 1 (1) - RowsReturnedRate: 1.00 /sec - ScanRangesComplete: 1 (1) - ScannerThreadsInvoluntaryContextSwitches: 6 (6) - ScannerThreadsTotalWallClockTime: 925.257ms - MaterializeTupleTime(*): 1.738ms - ScannerThreadsSysTime: 999.000us - ScannerThreadsUserTime: 999.000us - ScannerThreadsVoluntaryContextSwitches: 9 (9) - TotalRawHdfsReadTime(*): 963.534ms - TotalReadThroughput: 1.26 MB/sec Instance e54f7da15a77d3d0:342167b500000001 (host=cloud_machine_4:22000):(Total: 826.232ms, non-child: 0.000ns, % non-child: 0.00%) Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:2/2.32 MB MemoryUsage(500.000ms): 4.00 KB, 149.45 KB ThreadUsage(500.000ms): 1, 2 - AverageThreadTokens: 1.50 - BloomFilterBytes: 0 - PeakMemoryUsage: 4.30 MB (4512208) - PerHostPeakMemUsage: 4.30 MB (4512208) - PrepareTime: 35.627ms - RowsProduced: 2 (2) - TotalCpuTime: 990.391ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 119.366us - TotalStorageWaitTime: 986.716ms BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns CodeGen:(Total: 64.577ms, non-child: 64.577ms, % non-child: 100.00%) - CodegenTime: 931.308us - CompileTime: 6.338ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - 
OptimizationTime: 22.764ms - PrepareTime: 35.037ms DataStreamSender (dst_id=1):(Total: 97.596us, non-child: 97.596us, % non-child: 100.00%) - BytesSent: 360.00 B (360) - NetworkThroughput(*): 5.76 MB/sec - OverallThroughput: 3.52 MB/sec - RowsReturned: 2 (2) - SerializeBatchTime: 24.588us - TransmitDataRPCTime: 59.584us - UncompressedRowBatchSize: 444.00 B (444) HDFS_SCAN_NODE (id=0):(Total: 825.190ms, non-child: 825.190ms, % non-child: 100.00%) ExecOption: Expr Evaluation Codegen Disabled, PARQUET Codegen Enabled, Codegen enabled: 2 out of 2 Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:2/2.32 MB Hdfs Read Thread Concurrency Bucket: 0:0% 1:100% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% File Formats: PARQUET/NONE:6 BytesRead(500.000ms): 0, 1.26 MB - AverageHdfsReadThreadConcurrency: 1.00 - AverageScannerThreadConcurrency: 1.00 - BytesRead: 2.52 MB (2637269) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 2 (2) - NumScannerThreadsStarted: 2 (2) - PeakMemoryUsage: 4.29 MB (4502528) - PerReadThreadRawHdfsThroughput: 2.74 MB/sec - RemoteScanRanges: 8 (8) - RowsRead: 25.08K (25076) - RowsReturned: 2 (2) - RowsReturnedRate: 2.00 /sec - ScanRangesComplete: 2 (2) - ScannerThreadsInvoluntaryContextSwitches: 4 (4) - ScannerThreadsTotalWallClockTime: 990.396ms - MaterializeTupleTime(*): 2.738ms - ScannerThreadsSysTime: 999.000us - ScannerThreadsUserTime: 1.998ms - ScannerThreadsVoluntaryContextSwitches: 11 (11) - TotalRawHdfsReadTime(*): 917.567ms - TotalReadThroughput: 1.26 MB/sec Instance e54f7da15a77d3d0:342167b500000004 (host=cloud_machine_2:22000):(Total: 300.651ms, non-child: 0.000ns, % non-child: 0.00%) Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:2/2.32 MB - AverageThreadTokens: 0.00 - BloomFilterBytes: 0 - PeakMemoryUsage: 6.28 MB (6581712) - PerHostPeakMemUsage: 6.28 
MB (6581712) - PrepareTime: 37.604ms - RowsProduced: 2 (2) - TotalCpuTime: 505.989ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 183.110us - TotalStorageWaitTime: 502.133ms BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns CodeGen:(Total: 68.324ms, non-child: 68.324ms, % non-child: 100.00%) - CodegenTime: 960.040us - CompileTime: 6.946ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - OptimizationTime: 23.985ms - PrepareTime: 36.961ms DataStreamSender (dst_id=1):(Total: 117.568us, non-child: 117.568us, % non-child: 100.00%) - BytesSent: 356.00 B (356) - NetworkThroughput(*): 5.02 MB/sec - OverallThroughput: 2.89 MB/sec - RowsReturned: 2 (2) - SerializeBatchTime: 24.648us - TransmitDataRPCTime: 67.574us - UncompressedRowBatchSize: 441.00 B (441) HDFS_SCAN_NODE (id=0):(Total: 299.442ms, non-child: 299.442ms, % non-child: 100.00%) ExecOption: Expr Evaluation Codegen Disabled, PARQUET Codegen Enabled, Codegen enabled: 2 out of 2 Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:2/2.32 MB Hdfs Read Thread Concurrency Bucket: 0:0% 1:0% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% File Formats: PARQUET/NONE:6 - AverageHdfsReadThreadConcurrency: 0.00 - AverageScannerThreadConcurrency: 0.00 - BytesRead: 2.52 MB (2637306) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 2 (2) - NumScannerThreadsStarted: 2 (2) - PeakMemoryUsage: 6.27 MB (6572032) - PerReadThreadRawHdfsThroughput: 7.41 MB/sec - RemoteScanRanges: 8 (8) - RowsRead: 
25.08K (25076) - RowsReturned: 2 (2) - RowsReturnedRate: 6.00 /sec - ScanRangesComplete: 2 (2) - ScannerThreadsInvoluntaryContextSwitches: 0 (0) - ScannerThreadsTotalWallClockTime: 505.994ms - MaterializeTupleTime(*): 3.052ms - ScannerThreadsSysTime: 0.000ns - ScannerThreadsUserTime: 2.998ms - ScannerThreadsVoluntaryContextSwitches: 12 (12) - TotalRawHdfsReadTime(*): 339.372ms - TotalReadThroughput: 0.00 /sec Instance e54f7da15a77d3d0:342167b500000003 (host=cloud_machine_3:22000):(Total: 295.146ms, non-child: 0.000ns, % non-child: 0.00%) Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/1.16 MB MemoryUsage(500.000ms): 2.12 MB ThreadUsage(500.000ms): 2 - AverageThreadTokens: 2.00 - BloomFilterBytes: 0 - PeakMemoryUsage: 4.17 MB (4372944) - PerHostPeakMemUsage: 4.17 MB (4372944) - PrepareTime: 40.105ms - RowsProduced: 1 (1) - TotalCpuTime: 329.600ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 350.338us - TotalStorageWaitTime: 250.893ms BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns CodeGen:(Total: 71.042ms, non-child: 71.042ms, % non-child: 100.00%) - CodegenTime: 1.957ms - CompileTime: 7.470ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - OptimizationTime: 24.257ms - PrepareTime: 38.877ms DataStreamSender (dst_id=1):(Total: 683.450us, non-child: 683.450us, % non-child: 100.00%) - BytesSent: 180.00 B (180) - NetworkThroughput(*): 266.91 KB/sec - OverallThroughput: 257.20 KB/sec - RowsReturned: 1 (1) - SerializeBatchTime: 11.730us - TransmitDataRPCTime: 658.576us - UncompressedRowBatchSize: 222.00 B (222) HDFS_SCAN_NODE (id=0):(Total: 293.038ms, non-child: 293.038ms, % 
non-child: 100.00%) ExecOption: Expr Evaluation Codegen Disabled, PARQUET Codegen Enabled, Codegen enabled: 1 out of 1 Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/1.16 MB Hdfs Read Thread Concurrency Bucket: 0:0% 1:100% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% File Formats: PARQUET/NONE:3 BytesRead(500.000ms): 639.23 KB - AverageHdfsReadThreadConcurrency: 1.00 - AverageScannerThreadConcurrency: 1.00 - BytesRead: 1.26 MB (1318607) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 1 (1) - NumScannerThreadsStarted: 1 (1) - PeakMemoryUsage: 4.16 MB (4363264) - PerReadThreadRawHdfsThroughput: 6.79 MB/sec - RemoteScanRanges: 4 (4) - RowsRead: 12.54K (12538) - RowsReturned: 1 (1) - RowsReturnedRate: 3.00 /sec - ScanRangesComplete: 1 (1) - ScannerThreadsInvoluntaryContextSwitches: 0 (0) - ScannerThreadsTotalWallClockTime: 252.714ms - MaterializeTupleTime(*): 1.482ms - ScannerThreadsSysTime: 0.000ns - ScannerThreadsUserTime: 999.000us - ScannerThreadsVoluntaryContextSwitches: 7 (7) - TotalRawHdfsReadTime(*): 185.279ms - TotalReadThroughput: 1.25 MB/sec
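Since the scan reads all 6 files but returns fewer rows than Hive does, it may help to inspect each Parquet file's footer directly to see the per-row-group row counts and the recorded column order (which is what position-based schema resolution falls back on). A minimal sketch using the parquet-tools CLI; the jar name and HDFS path here are placeholders, not taken from my cluster:

```
# "meta" prints the footer: rows per row group plus the column names/order.
# Comparing footers across the 6 files can show whether one file's schema
# diverges from the table schema.
hadoop jar parquet-tools.jar meta hdfs:///path/to/comm_status/serverdate=2019-03-02/part-file.parquet
```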
03-13-2019
06:35 AM
Hello experts, I am running into a very weird problem. I am inserting data into a Hive table through Spark SQL, and afterwards I run INVALIDATE METADATA in Impala for that table. When I query in Hive/Spark I get the data I expect, but when I run the query in Impala, a few records are missing. Please see below:

hive> select * from comm_status where serverdate='2019-03-02' and report_reference_number in('CMMY07020190301'); // I am leaving the selected result blank here for security reasons.
Time taken: 3.762 seconds, Fetched: 8 row(s)

Impala-shell > select * from comm_status where serverdate='2019-03-02' and report_reference_number in('CMMY07020190301'); // I am leaving the selected result blank here for security reasons.
Fetched 6 row(s) in 0.42s

Here is what I have tried:
1. invalidate metadata comm_status in Impala
2. refresh comm_status in Impala
3. msck repair table comm_status in Hive
4. ALTER TABLE comm_status RECOVER PARTITIONS; in Impala
5. Restarted both the Hive and Impala clusters and repeated steps 1-4.
6. Checked hive-site.xml in hive/conf, impala/conf and spark/conf to confirm they all point at the same metastore URL, and they do 🙂

Now I am out of ideas for resolving this problem. I am using Impala 2.7, Spark 2.3, Hive 2.1.
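To narrow down where the two rows go missing, one option is to run the same aggregate in both engines and compare counts per partition; whichever partition disagrees is the one whose files Impala is not reading fully. A sketch using the table and columns from the query above:

```sql
-- Run this identically in hive> and in impala-shell>; a partition whose
-- count differs between the two engines localizes the missing rows.
SELECT serverdate, count(*) AS cnt
FROM comm_status
WHERE report_reference_number IN ('CMMY07020190301')
GROUP BY serverdate;
```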
Labels: Apache Hive, Apache Impala, Apache Spark
02-19-2019
06:39 AM
Hi, UAT has the same number of tables and columns as PROD, but UAT has more data compared to PROD. The problem is in query parsing and optimization; once parsing is done there is no problem in execution, which is why I did not mention the driver or executors. I executed the same query in both UAT and PROD and monitored the logs: in UAT, parsing finished within a minute, while in PROD it took 20 minutes. I was also watching the UAT and PROD hive logs. The UAT log was not moving, but the PROD hive.log was. PROD was fetching table metadata from the Hive metastore, but UAT was not.
02-18-2019
10:57 AM
I am running a Hive query in the PROD environment on Spark, using HiveContext/SparkSession like this: sparksession.sql("query"). It takes approx 10-15 minutes just to parse the query before it runs, which looks abnormal to me, because the same query on the UAT environment takes less than a minute to parse. When I watch hive.log on UAT while executing that Spark SQL, I can't see the log moving; but when I watch hive.log on PROD, the log is moving and I see entries of this kind:

gettable tablename
get partitions
initialize called
using direct sql
underlying db oracle ...

I see statements like this in the log for all the tables involved in the query. Now this is really strange: if PROD is loading metadata from the Hive metastore for use in query parsing, why is the same not happening in UAT? Please help, this is a very serious issue.
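For what it's worth, the "using direct sql" lines suggest the PROD metastore is answering partition calls via Hive's direct-SQL path. A quick way to compare how the two environments are configured (a sketch assuming you can open a Hive session on each; both property names are standard Hive settings):

```sql
-- Run in a Hive session on UAT and on PROD and compare the output.
-- If direct SQL is disabled (or silently falling back to the DataNucleus
-- ORM path) on one side, partition metadata fetches can be much slower.
SET hive.metastore.try.direct.sql;
SET hive.metastore.client.socket.timeout;
```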
Labels: Apache Hive, Apache Spark
10-30-2018
12:50 AM
Not all the time, but most of the time. The table size is not even 100 GB, and there are 1200 partitions in the table. My question is: when the table is already loaded in the catalog, which I can see by going to that tab, what does "loading" mean, and why is it happening?
10-29-2018
12:43 PM
Any updates on this issue, please? This is really very irritating. When I look at the catalog tab I see all the tables loaded there, but when the same table is used in a query it goes like this: as src where src.r = 1
I1029 15:35:41.023432 21809 Frontend.java:822] Requesting prioritized load of table(s): default.comm_simeod_raw
I just run a refresh whenever any partition is added to the table; even if I run the query on old data, the same problem occurs.
10-25-2018
06:46 AM
On startup I run INVALIDATE METADATA. After that, whenever any file is loaded, I run INVALIDATE METADATA for the table. I don't run REFRESH on the table because it takes longer than INVALIDATE METADATA.
10-22-2018
09:48 AM
After starting Impala, I run INVALIDATE METADATA on all the impalad nodes, so should I still be getting this problem?
10-22-2018
07:10 AM
Hello, whenever I run either a SELECT or an INSERT query on Impala, it takes a huge amount of time. Looking at the log, I can see that most of the time goes into loading the table. I see messages like this:
Frontend.java:808] Requesting prioritized load of table(s): comm_tran_data,comm_tran_info
I can't understand: when every impalad has a catalog with all the metadata, why is it trying to load? I am using Impala versions 2.7, 2.9 and 2.1, and the same problem occurs in all of them. Can you suggest a solution? This problem is occurring in the PROD environment.
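For context, these are the two metadata commands in play. REFRESH incrementally reloads file and block metadata for a table that is already loaded, while INVALIDATE METADATA discards the cached metadata entirely, so the next query triggers a full reload, which is exactly the "Requesting prioritized load" you see in the log. A minimal sketch using one of the tables from the message above:

```sql
-- After adding data files to an existing table/partition (cheap, incremental):
REFRESH comm_tran_data;

-- After creating or altering the table outside Impala (expensive: forces a
-- full metadata reload on the table's first use afterwards):
INVALIDATE METADATA comm_tran_data;
```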
Labels: Apache Impala
10-11-2018
01:25 AM
Cloudera employees, any update on this, please?
10-03-2018
09:26 AM
I think the secondary namenode does not have the capability to push a newly created fsimage to the primary namenode; a checkpoint node has that capability. In the case of a secondary namenode, it is the primary namenode's responsibility to pull the updated fsimage at startup.
08-07-2018
10:58 PM
New update: I created a separate cluster with only Impala and Hive, and there it works fine; COMPUTE STATS completes in 10 minutes. But when Hadoop, Impala and Hive are together on the same cluster, it takes more than 1.5 hours to complete COMPUTE STATS. This is really surprising: with Hadoop, Impala and Hive on the same cluster it should actually be faster, since the data will be local to Hive/Impala. Does anyone have thoughts on this?
06-08-2018
04:30 AM
Running COMPUTE INCREMENTAL STATS on a table with 4000 partitions in total and 30 GB of data takes 5-6 hours to complete.
I have watched hive.log, which shows lots of queries being executed to update the metastore.
I have Cloudera 5.7 with Oracle as the metastore.
I have seen the same operation complete in 2-3 minutes with a MySQL metastore.
But I can't say Oracle is causing the problem, because I have confirmed with the DBA that everything is smooth on the Oracle side.
It looks like some problem in Impala/Hive.
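One way to bound how much metastore traffic each run generates is to restrict the statement to the partitions that actually changed, instead of letting it evaluate all 4000. A sketch using standard Impala syntax; the table and partition column names here are placeholders, not from my schema:

```sql
-- The first COMPUTE INCREMENTAL STATS on a table is always heavy (it must
-- cover every partition). After that, limit each run to the newly loaded
-- partition so only its stats rows are written to the metastore:
COMPUTE INCREMENTAL STATS my_table PARTITION (load_date='2018-06-08');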
Labels: Apache Hive, Apache Impala
06-08-2018
04:24 AM
Hello, I have done all the analysis and found the following:
1. I involved the DBA to check if there is any problem on the Oracle side, where my metastore is, but there is no problem: all of the queries finished normally, and we did not see any thread contention or wait time either.
2. Then I set the logger level to ALL in the Hive log4j properties, and I was able to see all the actual queries being run against the Hive metastore. Here too there is no problem, because all the queries execute in milliseconds. So it looks like there is no problem at the HMS Oracle side. I also tried running COMPUTE INCREMENTAL STATS from Impala, and I see the same problem.
06-06-2018
01:08 AM
No, I installed manually from tarballs.
06-01-2018
03:25 AM
Hi,
I am facing a weird problem across all the environments. Today I tried to add one column to a Hive table with the CASCADE option, so the change is reflected in all existing partitions; it took 6 hours to update 4500 partitions.
Even running COMPUTE INCREMENTAL STATS from Impala for the first time took 6 hours.
The Hive metastore is in Oracle.
It looks like the queries are not executing at speed and are waiting somewhere on the Oracle database server.
It is also possible that the Hive metastore is not sending all the queries to Oracle.
What could be the reason? I am not sure what is going on; any help will be appreciated.
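For reference, this is the shape of the statement in question (the table and column names are placeholders, not from my schema). With CASCADE, the metastore must rewrite the column descriptor of every existing partition, which is why the partition count, 4500 here, dominates the runtime:

```sql
-- CASCADE propagates the new column into all existing partition metadata;
-- RESTRICT (the default) changes only the table-level schema, leaving old
-- partitions untouched.
ALTER TABLE my_table ADD COLUMNS (new_col STRING) CASCADE;
```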
Labels: Apache Hive, Apache Impala
05-17-2018
11:12 PM
You are correct; I tested it. I had thought it would also show min.insync.replicas. Thanks for the help.
05-17-2018
02:31 AM
Hello,
I have recently built a UAT Kafka cluster with three brokers. The problem is that min.insync.replicas is not taking any effect. The following is happening:
1. There is no mention of min.insync.replicas in server.properties, and whenever I create a topic, the in-sync replica count just copies whatever replication factor I provide. So if I create topic test with replication 3, the in-sync replicas are also 3; if I create it with replication 2, the in-sync replicas are also 2.
Partition Detail
Partition | First Offset | Last Offset | Size | Leader | Replicas | In Sync Replicas | Preferred Leader? | Under Replicated?
0         | 0            | 0           | 0    | 2      | 2,3,1    | 1,2,3            | Yes               | No
1         | 0            | 0           | 0    | 3      | 3,1,2    | 1,2,3            | Yes               | No
2         | 0            | 0           | 0    | 1      | 1,2,3    | 1,2,3            | Yes               | No
2. I tried putting min.insync.replicas=2 in server.properties on all the brokers and created a topic with replication 3. Here I was expecting the in-sync replicas to be 2, but no, it is again 3.
3. Then I tried to create a topic providing the config directly:
kafka-topics.sh --create --zookeeper machine1:2181,machine2:2181,machine3:2181 --replication-factor 3 --partitions 3 --config min.insync.replicas=1 --topic syncdemo
Here I thought it would show the in-sync replicas as 1, but no, it is still 3.
4. Lastly I tried to alter it to provide 2 instead of 1, to check whether altering makes it work:
kafka-topics.sh --alter --zookeeper machine1:2181,machine2:2181,machine3:2181 --topic syncdemo --config min.insync.replicas=1
but there was no effect.
It is really very frustrating that none of these ways make it work. Please provide a solution; thanks in advance.
I am using CDH 5.7 and Apache Kafka kafka_2.11-1.0.0 with zookeeper-3.4.5-cdh5.7.0 on Red Hat 6.5.
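One thing worth checking, as a hedge: as far as I understand, min.insync.replicas never changes the "In Sync Replicas" column at all. That column always lists every replica currently caught up with the leader, while min.insync.replicas is only a floor that the broker enforces when a producer writes with acks=all (the write is rejected with NotEnoughReplicas if the ISR shrinks below it). So the behavior above may be expected. To confirm what is actually set on the topic, the describe commands can be used (same zookeeper string as in the create command):

```
# Shows the ISR (always the caught-up replicas, regardless of the setting):
kafka-topics.sh --describe --zookeeper machine1:2181,machine2:2181,machine3:2181 --topic syncdemo

# Shows the per-topic config overrides, including min.insync.replicas:
kafka-configs.sh --describe --zookeeper machine1:2181,machine2:2181,machine3:2181 --entity-type topics --entity-name syncdemo
```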
Labels: Apache Kafka, Apache Zookeeper
05-17-2018
12:39 AM
No... I have not even gotten a reply from a Cloudera employee.
05-17-2018
12:37 AM
the reported issue is in PROD though we have got the error in UAT too but very rare. we are having 15 JDBC connection in connection pool.
05-16-2018
11:43 PM
Just a correction: we are using HAProxy in the UAT environment, but for PROD we are using a VIP created by the infrastructure team. Here is the UAT config (haproxy-cdh.cfg):

user impala
group impala
daemon

# turn on stats unix socket
#stats socket /var/lib/haproxy/stats

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#
# You might need to adjust timing values to prevent timeouts.
#---------------------------------------------------------------------
defaults
    # mode http
    # option httplog
    option dontlognull
    option http-server-close
    option redispatch
    retries 3
    maxconn 1000
    timeout connect 300000
    timeout client 300000
    timeout server 300000

#
# This sets up the admin page for HA Proxy at port 25002.
#
listen stats :25002
    balance
    mode http
    stats enable
    stats auth username:password

# This is the setup for Impala. Impala clients connect to load_balancer_host:25003.
# HAProxy will balance connections among the list of servers listed below.
# The list of Impalad is listening at port 21000 for beeswax (impala-shell) or original ODBC driver.
# For JDBC or ODBC version 2.x driver, use port 21050 instead of 21000.
#listen impala :25053
listen impala :25003
    timeout client 3600000
    timeout server 3600000
    balance leastconn
05-16-2018
11:29 PM
Yes, I am using HAProxy, where I have configured all my impalad nodes, and this URL is used by the JDBC connection.
05-16-2018
11:18 PM
Yes, but I still get this problem many times.
12-26-2017
04:04 AM
I sometimes get the error below while running Impala queries through a JDBC connection using the Hive jars:

java.sql.SQLException: Error while cleaning up the server resources
    at org.apache.hive.jdbc.HiveConnection.close(HiveConnection.java:580)
    at gxs.core.hadoop$with_impala_connection_STAR_.invoke(Unknown Source)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
    at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:161)
    at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.send_CloseSession(TCLIService.java:173)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.CloseSession(TCLIService.java:165)
    at org.apache.hive.jdbc.HiveConnection.close(HiveConnection.java:578)
    ... 25 more
Caused by: java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
    at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:159)
    ... 29 more

How can I resolve this?
Labels: Apache Impala
10-31-2017
01:08 AM
Hello, I have migrated Impala from impalad version 2.7.0-cdh5.7.1 to impalad version 2.9.0-cdh5.12.1 RELEASE. When I try to run any kind of query, I get the error below in the Impala log:

E1031 03:35:14.060696 20860 impala-server.cc:1299] Error deserializing item TABLE:default.comm__trade_table: couldn't deserialize thrift msg: TProtocolException: Invalid data
E1031 03:41:04.148355 20860 impala-server.cc:1299] Error deserializing item TABLE:default.comm_ldeal_table: couldn't deserialize thrift msg: TProtocolException: Invalid data

My table format is CSV, and I have tried INVALIDATE METADATA and REFRESH on the table. Kindly suggest what to do.
Labels: Apache Impala
10-11-2017
12:01 AM
Hi, I am getting an error while starting impala-shell or the Impala server: class not found: org.apache.hadoop.mapred.MRVersion. I am setting up Impala 2.1.1-cdh5.3.1 on hive-1.1.0 and cdh5.7.1.
Labels: Apache Impala
10-09-2017
12:32 AM
I can provide the query profiles; kindly share your email id. The query profile is very big, 20-30 pages, so it is not easy to share here.
10-06-2017
03:34 AM
I use Impala version 2.7.0, which I am not finding good and stable. People on another team are using version 2.3.0; the same query runs fine on 2.3.0, but on 2.7.0 it runs out of memory or takes 30 minutes instead of 2. I am gradually losing confidence. I would like to know which Impala version is the most stable and reliable for PRODUCTION.
Labels: Apache Impala