Member since: 09-29-2016
Posts: 60
Kudos Received: 0
Solutions: 0
09-18-2019
08:53 PM
I am not able to open this link: http://ingest.tips/2015/01/31/parquet-row-group-size/ Could you please check it and repost it?
03-14-2019
11:01 AM
Hi Tim, I am using the Parquet format for the table. I also tried "set PARQUET_FALLBACK_SCHEMA_RESOLUTION=name" before running the query, but got the same result. Please see the profile below:
Query (id=e54f7da15a77d3d0:342167b500000000): Summary: Session ID: 734733b810bda5d6:61715265a5d564b8 Session Type: BEESWAX Start Time: 2019-03-14 13:37:44.723524000 End Time: 2019-03-14 13:37:45.723116000 Query Type: QUERY Query State: FINISHED Query Status: OK Impala Version: impalad version 2.7.0-cdh5.9.0 RELEASE (build 4b4cf1936bd6cdf34fda5e2f32827e7d60c07a9c) User: usr_impala Connected User: usr_impala Delegated User: Network Address: 153.40.73.237:47596 Default Db: default Sql Statement: select * from comm_status where serverdate='2019-03-02' and report_reference_number='CITIXLME549300DV4DLC540WV917GB00D3CMMY07020190301' Coordinator: cloud_machine_1:22000 Query Options (non default): Plan: ---------------- Estimated Per-Host Requirements: Memory=48.00MB VCores=1 01:EXCHANGE [UNPARTITIONED] | hosts=4 per-host-mem=unavailable | tuple-ids=0 row-size=177B cardinality=2 | 00:SCAN HDFS [default.comm_status, RANDOM] partitions=6/16 files=6 size=6.96MB predicates: report_reference_number = 'CITIXLME549300DV4DLC540WV917GB00D3CMMY07020190301' table stats: 166968 rows total column stats: all hosts=4 per-host-mem=48.00MB tuple-ids=0 row-size=177B cardinality=2 ---------------- Estimated Per-Host Mem: 50331648 Estimated Per-Host VCores: 1 Request Pool: default-pool Admission result: Admitted immediately ExecSummary: Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. 
Peak Mem Detail ------------------------------------------------------------------------------------------------------------------- 01:EXCHANGE 1 64.697us 64.697us 6 2 0 -1.00 B UNPARTITIONED 00:SCAN HDFS 4 594.658ms 960.960ms 6 2 6.27 MB 48.00 MB default.comm_status Planner Timeline: 2.019ms - Analysis finished: 521.731us (521.731us) - Equivalence classes computed: 614.107us (92.376us) - Single node plan created: 1.340ms (726.439us) - Runtime filters computed: 1.357ms (17.135us) - Distributed plan created: 1.637ms (279.515us) - Planning finished: 2.019ms (382.286us) Query Timeline: 1s001ms - Start execution: 46.572us (46.572us) - Planning finished: 2.951ms (2.904ms) - Submit for admission: 3.103ms (152.472us) - Completed admission: 3.211ms (107.528us) - Ready to start 4 remote fragments: 3.562ms (351.656us) - All 4 remote fragments started: 8.496ms (4.933ms) - Rows available: 272.403ms (263.906ms) - First row fetched: 310.582ms (38.179ms) - Unregister query: 999.597ms (689.015ms) - ComputeScanRangeAssignmentTimer: 33.622us ImpalaServer: - ClientFetchWaitTimer: 40.960ms - RowMaterializationTimer: 16.160us Execution Profile e54f7da15a77d3d0:342167b500000000:(Total: 955.209ms, non-child: 0.000ns, % non-child: 0.00%) Number of filters: 0 Filter routing table: ID Src. Node Tgt. 
Node(s) Targets Target type Partition filter Pending (Expected) First arrived Completed Enabled ---------------------------------------------------------------------------------------------------------------------------- Fragment start latencies: Count: 4, 25th %-ile: 1ms, 50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 4ms, 95th %-ile: 4ms, 99.9th %-ile: 4ms Per Node Peak Memory Usage: cloud_machine_2:22000(6.28 MB) cloud_machine_3:22000(4.17 MB) cloud_machine_4:22000(4.30 MB) cloud_machine_1:22000(4.20 MB) - FiltersReceived: 0 (0) - FinalizationTimer: 0.000ns Coordinator Fragment F01:(Total: 949.221ms, non-child: 241.396us, % non-child: 0.03%) MemoryUsage(500.000ms): 8.00 KB, 24.01 KB - AverageThreadTokens: 0.00 - BloomFilterBytes: 0 - PeakMemoryUsage: 32.02 KB (32784) - PerHostPeakMemUsage: 0 - PrepareTime: 26.430us - RowsProduced: 0 (0) - TotalCpuTime: 43.717ms - TotalNetworkReceiveTime: 948.955ms - TotalNetworkSendTime: 0.000ns - TotalStorageWaitTime: 0.000ns BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns EXCHANGE_NODE (id=1):(Total: 948.980ms, non-child: 64.697us, % non-child: 0.01%) BytesReceived(500.000ms): 0, 716.00 B - BytesReceived: 1.05 KB (1072) - ConvertRowBatchTime: 9.082us - DeserializeRowBatchTimer: 42.542us - FirstBatchArrivalWaitTime: 263.623ms - PeakMemoryUsage: 0 - RowsReturned: 6 (6) - RowsReturnedRate: 6.00 /sec - SendersBlockedTimer: 0.000ns - SendersBlockedTotalTimer(*): 0.000ns Averaged Fragment F00:(Total: 595.899ms, non-child: 0.000ns, % non-child: 0.00%) split sizes: min: 1.16 MB, max: 2.32 MB, avg: 1.74 MB, stddev: 594.15 KB completion times: min:328.847ms max:992.298ms mean: 627.859ms stddev:300.544ms execution rates: min:1.17 MB/sec 
max:6.96 MB/sec mean:3.59 MB/sec stddev:2.12 MB/sec num instances: 4 - AverageThreadTokens: 1.25 - BloomFilterBytes: 0 - PeakMemoryUsage: 4.73 MB (4959952) - PerHostPeakMemUsage: 4.74 MB (4967122) - PrepareTime: 37.204ms - RowsProduced: 1 (1) - TotalCpuTime: 704.988ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 170.458us - TotalStorageWaitTime: 665.723ms BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns CodeGen:(Total: 67.096ms, non-child: 67.096ms, % non-child: 100.00%) - CodegenTime: 1.170ms - CompileTime: 6.799ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - OptimizationTime: 23.416ms - PrepareTime: 36.447ms DataStreamSender (dst_id=1):(Total: 241.040us, non-child: 241.040us, % non-child: 100.00%) - BytesSent: 268.00 B (268) - NetworkThroughput(*): 3.94 MB/sec - OverallThroughput: 2.30 MB/sec - RowsReturned: 1 (1) - SerializeBatchTime: 18.028us - TransmitDataRPCTime: 205.339us - UncompressedRowBatchSize: 331.00 B (331) HDFS_SCAN_NODE (id=0):(Total: 594.658ms, non-child: 594.658ms, % non-child: 100.00%) - AverageHdfsReadThreadConcurrency: 0.75 - AverageScannerThreadConcurrency: 0.75 - BytesRead: 1.89 MB (1977967) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 1 (1) - NumScannerThreadsStarted: 1 (1) - PeakMemoryUsage: 4.72 MB (4950272) - PerReadThreadRawHdfsThroughput: 4.56 MB/sec - RemoteScanRanges: 6 (6) - RowsRead: 18.81K (18807) - RowsReturned: 1 (1) - RowsReturnedRate: 3.00 /sec - ScanRangesComplete: 1 
(1) - ScannerThreadsInvoluntaryContextSwitches: 2 (2) - ScannerThreadsTotalWallClockTime: 668.590ms - MaterializeTupleTime(*): 2.253ms - ScannerThreadsSysTime: 499.500us - ScannerThreadsUserTime: 1.748ms - ScannerThreadsVoluntaryContextSwitches: 9 (9) - TotalRawHdfsReadTime(*): 601.438ms - TotalReadThroughput: 963.39 KB/sec Fragment F00: Instance e54f7da15a77d3d0:342167b500000002 (host=cloud_machine_1:22000):(Total: 961.567ms, non-child: 0.000ns, % non-child: 0.00%) Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/1.16 MB MemoryUsage(500.000ms): 4.00 KB, 2.12 MB ThreadUsage(500.000ms): 1, 2 - AverageThreadTokens: 1.50 - BloomFilterBytes: 0 - PeakMemoryUsage: 4.17 MB (4372944) - PerHostPeakMemUsage: 4.20 MB (4401624) - PrepareTime: 35.478ms - RowsProduced: 1 (1) - TotalCpuTime: 993.973ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 29.018us - TotalStorageWaitTime: 923.149ms CodeGen:(Total: 64.442ms, non-child: 64.442ms, % non-child: 100.00%) - CodegenTime: 832.912us - CompileTime: 6.441ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - OptimizationTime: 22.658ms - PrepareTime: 34.912ms DataStreamSender (dst_id=1):(Total: 65.548us, non-child: 65.548us, % non-child: 100.00%) - BytesSent: 176.00 B (176) - NetworkThroughput(*): 4.71 MB/sec - OverallThroughput: 2.56 MB/sec - RowsReturned: 1 (1) - SerializeBatchTime: 11.148us - TransmitDataRPCTime: 35.622us - UncompressedRowBatchSize: 219.00 B (219) HDFS_SCAN_NODE (id=0):(Total: 960.960ms, non-child: 960.960ms, % non-child: 100.00%) ExecOption: Expr Evaluation Codegen Disabled, PARQUET Codegen Enabled, Codegen enabled: 1 out of 1 Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/1.16 MB Hdfs Read Thread Concurrency Bucket: 0:0% 1:100% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% File Formats: PARQUET/NONE:3 BytesRead(500.000ms): 0, 1.26 MB - AverageHdfsReadThreadConcurrency: 1.00 - AverageScannerThreadConcurrency: 1.00 - 
BytesRead: 1.26 MB (1318689) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 1 (1) - NumScannerThreadsStarted: 1 (1) - PeakMemoryUsage: 4.16 MB (4363264) - PerReadThreadRawHdfsThroughput: 1.31 MB/sec - RemoteScanRanges: 4 (4) - RowsRead: 12.54K (12538) - RowsReturned: 1 (1) - RowsReturnedRate: 1.00 /sec - ScanRangesComplete: 1 (1) - ScannerThreadsInvoluntaryContextSwitches: 6 (6) - ScannerThreadsTotalWallClockTime: 925.257ms - MaterializeTupleTime(*): 1.738ms - ScannerThreadsSysTime: 999.000us - ScannerThreadsUserTime: 999.000us - ScannerThreadsVoluntaryContextSwitches: 9 (9) - TotalRawHdfsReadTime(*): 963.534ms - TotalReadThroughput: 1.26 MB/sec Instance e54f7da15a77d3d0:342167b500000001 (host=cloud_machine_4:22000):(Total: 826.232ms, non-child: 0.000ns, % non-child: 0.00%) Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:2/2.32 MB MemoryUsage(500.000ms): 4.00 KB, 149.45 KB ThreadUsage(500.000ms): 1, 2 - AverageThreadTokens: 1.50 - BloomFilterBytes: 0 - PeakMemoryUsage: 4.30 MB (4512208) - PerHostPeakMemUsage: 4.30 MB (4512208) - PrepareTime: 35.627ms - RowsProduced: 2 (2) - TotalCpuTime: 990.391ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 119.366us - TotalStorageWaitTime: 986.716ms BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns CodeGen:(Total: 64.577ms, non-child: 64.577ms, % non-child: 100.00%) - CodegenTime: 931.308us - CompileTime: 6.338ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - 
OptimizationTime: 22.764ms - PrepareTime: 35.037ms DataStreamSender (dst_id=1):(Total: 97.596us, non-child: 97.596us, % non-child: 100.00%) - BytesSent: 360.00 B (360) - NetworkThroughput(*): 5.76 MB/sec - OverallThroughput: 3.52 MB/sec - RowsReturned: 2 (2) - SerializeBatchTime: 24.588us - TransmitDataRPCTime: 59.584us - UncompressedRowBatchSize: 444.00 B (444) HDFS_SCAN_NODE (id=0):(Total: 825.190ms, non-child: 825.190ms, % non-child: 100.00%) ExecOption: Expr Evaluation Codegen Disabled, PARQUET Codegen Enabled, Codegen enabled: 2 out of 2 Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:2/2.32 MB Hdfs Read Thread Concurrency Bucket: 0:0% 1:100% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% File Formats: PARQUET/NONE:6 BytesRead(500.000ms): 0, 1.26 MB - AverageHdfsReadThreadConcurrency: 1.00 - AverageScannerThreadConcurrency: 1.00 - BytesRead: 2.52 MB (2637269) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 2 (2) - NumScannerThreadsStarted: 2 (2) - PeakMemoryUsage: 4.29 MB (4502528) - PerReadThreadRawHdfsThroughput: 2.74 MB/sec - RemoteScanRanges: 8 (8) - RowsRead: 25.08K (25076) - RowsReturned: 2 (2) - RowsReturnedRate: 2.00 /sec - ScanRangesComplete: 2 (2) - ScannerThreadsInvoluntaryContextSwitches: 4 (4) - ScannerThreadsTotalWallClockTime: 990.396ms - MaterializeTupleTime(*): 2.738ms - ScannerThreadsSysTime: 999.000us - ScannerThreadsUserTime: 1.998ms - ScannerThreadsVoluntaryContextSwitches: 11 (11) - TotalRawHdfsReadTime(*): 917.567ms - TotalReadThroughput: 1.26 MB/sec Instance e54f7da15a77d3d0:342167b500000004 (host=cloud_machine_2:22000):(Total: 300.651ms, non-child: 0.000ns, % non-child: 0.00%) Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:2/2.32 MB - AverageThreadTokens: 0.00 - BloomFilterBytes: 0 - PeakMemoryUsage: 6.28 MB (6581712) - PerHostPeakMemUsage: 6.28 
MB (6581712) - PrepareTime: 37.604ms - RowsProduced: 2 (2) - TotalCpuTime: 505.989ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 183.110us - TotalStorageWaitTime: 502.133ms BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns CodeGen:(Total: 68.324ms, non-child: 68.324ms, % non-child: 100.00%) - CodegenTime: 960.040us - CompileTime: 6.946ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - OptimizationTime: 23.985ms - PrepareTime: 36.961ms DataStreamSender (dst_id=1):(Total: 117.568us, non-child: 117.568us, % non-child: 100.00%) - BytesSent: 356.00 B (356) - NetworkThroughput(*): 5.02 MB/sec - OverallThroughput: 2.89 MB/sec - RowsReturned: 2 (2) - SerializeBatchTime: 24.648us - TransmitDataRPCTime: 67.574us - UncompressedRowBatchSize: 441.00 B (441) HDFS_SCAN_NODE (id=0):(Total: 299.442ms, non-child: 299.442ms, % non-child: 100.00%) ExecOption: Expr Evaluation Codegen Disabled, PARQUET Codegen Enabled, Codegen enabled: 2 out of 2 Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:2/2.32 MB Hdfs Read Thread Concurrency Bucket: 0:0% 1:0% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% File Formats: PARQUET/NONE:6 - AverageHdfsReadThreadConcurrency: 0.00 - AverageScannerThreadConcurrency: 0.00 - BytesRead: 2.52 MB (2637306) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 2 (2) - NumScannerThreadsStarted: 2 (2) - PeakMemoryUsage: 6.27 MB (6572032) - PerReadThreadRawHdfsThroughput: 7.41 MB/sec - RemoteScanRanges: 8 (8) - RowsRead: 
25.08K (25076) - RowsReturned: 2 (2) - RowsReturnedRate: 6.00 /sec - ScanRangesComplete: 2 (2) - ScannerThreadsInvoluntaryContextSwitches: 0 (0) - ScannerThreadsTotalWallClockTime: 505.994ms - MaterializeTupleTime(*): 3.052ms - ScannerThreadsSysTime: 0.000ns - ScannerThreadsUserTime: 2.998ms - ScannerThreadsVoluntaryContextSwitches: 12 (12) - TotalRawHdfsReadTime(*): 339.372ms - TotalReadThroughput: 0.00 /sec Instance e54f7da15a77d3d0:342167b500000003 (host=cloud_machine_3:22000):(Total: 295.146ms, non-child: 0.000ns, % non-child: 0.00%) Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/1.16 MB MemoryUsage(500.000ms): 2.12 MB ThreadUsage(500.000ms): 2 - AverageThreadTokens: 2.00 - BloomFilterBytes: 0 - PeakMemoryUsage: 4.17 MB (4372944) - PerHostPeakMemUsage: 4.17 MB (4372944) - PrepareTime: 40.105ms - RowsProduced: 1 (1) - TotalCpuTime: 329.600ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 350.338us - TotalStorageWaitTime: 250.893ms BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns CodeGen:(Total: 71.042ms, non-child: 71.042ms, % non-child: 100.00%) - CodegenTime: 1.957ms - CompileTime: 7.470ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - OptimizationTime: 24.257ms - PrepareTime: 38.877ms DataStreamSender (dst_id=1):(Total: 683.450us, non-child: 683.450us, % non-child: 100.00%) - BytesSent: 180.00 B (180) - NetworkThroughput(*): 266.91 KB/sec - OverallThroughput: 257.20 KB/sec - RowsReturned: 1 (1) - SerializeBatchTime: 11.730us - TransmitDataRPCTime: 658.576us - UncompressedRowBatchSize: 222.00 B (222) HDFS_SCAN_NODE (id=0):(Total: 293.038ms, non-child: 293.038ms, % 
non-child: 100.00%) ExecOption: Expr Evaluation Codegen Disabled, PARQUET Codegen Enabled, Codegen enabled: 1 out of 1 Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/1.16 MB Hdfs Read Thread Concurrency Bucket: 0:0% 1:100% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% File Formats: PARQUET/NONE:3 BytesRead(500.000ms): 639.23 KB - AverageHdfsReadThreadConcurrency: 1.00 - AverageScannerThreadConcurrency: 1.00 - BytesRead: 1.26 MB (1318607) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 1 (1) - NumScannerThreadsStarted: 1 (1) - PeakMemoryUsage: 4.16 MB (4363264) - PerReadThreadRawHdfsThroughput: 6.79 MB/sec - RemoteScanRanges: 4 (4) - RowsRead: 12.54K (12538) - RowsReturned: 1 (1) - RowsReturnedRate: 3.00 /sec - ScanRangesComplete: 1 (1) - ScannerThreadsInvoluntaryContextSwitches: 0 (0) - ScannerThreadsTotalWallClockTime: 252.714ms - MaterializeTupleTime(*): 1.482ms - ScannerThreadsSysTime: 0.000ns - ScannerThreadsUserTime: 999.000us - ScannerThreadsVoluntaryContextSwitches: 7 (7) - TotalRawHdfsReadTime(*): 185.279ms - TotalReadThroughput: 1.25 MB/sec
03-13-2019
06:35 AM
Hello experts,
I am running into a very weird problem. I insert data into a Hive table through spark-sql and then run INVALIDATE METADATA for that table in Impala. When I query through Hive or Spark I get the data I expect, but when I run the same query in Impala a few records are missing. Please see below:
hive> select * from comm_status where serverdate='2019-03-02' and report_reference_number in('CMMY07020190301');
// I am leaving the selected result blank here for security reasons.
Time taken: 3.762 seconds, Fetched: 8 row(s)
Impala-shell > select * from comm_status where serverdate='2019-03-02' and report_reference_number in('CMMY07020190301');
// I am leaving the selected result blank here for security reasons.
Fetched 6 row(s) in 0.42s
I tried the following:
1. invalidate metadata comm_status on Impala
2. refresh comm_status on Impala
3. msck repair table comm_status on Hive
4. ALTER TABLE comm_status RECOVER PARTITIONS; on Impala
5. Restarted both the Hive and Impala clusters and repeated steps 1-4.
6. Checked that hive-site.xml in hive/conf, impala/conf and spark/conf all use the same metastore URL, and they do 🙂
I have run out of ideas for resolving this problem. I am using Impala 2.7, Spark 2.3 and Hive 2.1.
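One way to narrow this down is to find out exactly which rows Impala drops, since the missing rows may all come from the same partition or file. The sketch below is only an illustration: it assumes both result sets have already been exported into Python as lists of row tuples, and the function name `missing_rows` is hypothetical, not part of Hive or Impala.

```python
from collections import Counter

def missing_rows(hive_rows, impala_rows):
    """Return rows (with multiplicity) that Hive returned but Impala did not."""
    # Counter subtraction keeps only positive counts, so duplicated rows
    # are handled correctly: 2 copies in Hive minus 1 in Impala leaves 1.
    diff = Counter(map(tuple, hive_rows)) - Counter(map(tuple, impala_rows))
    return sorted(diff.elements())

# Made-up sample data standing in for the two exported result sets.
hive = [("a", 1), ("b", 2), ("b", 2), ("c", 3)]
impala = [("a", 1), ("b", 2)]
print(missing_rows(hive, impala))
```

Once the missing rows are known, their column values can be compared against the rows Impala does return to look for a common pattern.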
Labels: Apache Hive, Apache Impala, Apache Spark
10-11-2018
01:25 AM
Cloudera employees, is there any update on this, please?
10-03-2018
09:26 AM
I think the secondary NameNode does not have the capability to push the newly created fsimage to the primary NameNode, whereas the checkpoint node does. In the case of the secondary NameNode, it is the primary NameNode's responsibility to pull the updated fsimage during startup.
06-06-2018
01:08 AM
No, I installed it manually from tarballs.
05-17-2018
11:12 PM
You are correct, I tested it. I thought it also showed min.insync.replicas. Thanks for the help.
05-17-2018
02:31 AM
Hello,
Recently I set up a UAT Kafka cluster with three brokers. The problem is that min.insync.replicas is not taking any effect. The following is happening:
1. server.properties does not mention min.insync.replicas. Whenever I create a topic, the number of in-sync replicas simply matches whatever replication factor I provide. If I create topic test with replication factor 3, the in-sync replicas are also 3; if I create it with replication factor 2, the in-sync replicas are also 2.
Partition Detail
Partition | First Offset | Last Offset | Size | Leader | Replicas | In Sync Replicas | Preferred Leader? | Under Replicated?
0 | 0 | 0 | 0 | 2 | 2,3,1 | 1,2,3 | Yes | No
1 | 0 | 0 | 0 | 3 | 3,1,2 | 1,2,3 | Yes | No
2 | 0 | 0 | 0 | 1 | 1,2,3 | 1,2,3 | Yes | No
2. I put min.insync.replicas=2 in server.properties on all the brokers and created a topic with replication factor 3. I expected the in-sync replicas to be 2, but they are again 3.
3. Then I tried to create the topic with the config supplied on the command line:
kafka-topics.sh --create --zookeeper machine1:2181,machine2:2181,machine3:2181 --replication-factor 3 --partitions 3 --config min.insync.replicas=1 --topic syncdemo
I thought this time it would show 1 in-sync replica, but it still shows 3.
4. Lastly, I tried to alter the topic to set 2 instead of 1, to check whether altering makes it work:
kafka-topics.sh --alter --zookeeper machine1:2181,machine2:2181,machine3:2181 --topic syncdemo --config min.insync.replicas=1
But it had no effect.
It is really frustrating that none of these approaches works. Please provide a solution. Thanks in advance.
I am using CDH 5.7 with Apache Kafka kafka_2.11-1.0.0 and zookeeper-3.4.5-cdh5.7.0 on Red Hat 6.5.
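For context on the behaviour described above: min.insync.replicas does not cap or set the in-sync replica list. The In Sync Replicas column shows which replicas are currently caught up with the leader, while min.insync.replicas is a minimum that a write with acks=all must meet before the broker accepts it. The following is a toy model of that check, not Kafka code; the names `Partition`, `append` and `NotEnoughReplicas` are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass

class NotEnoughReplicas(Exception):
    """Toy stand-in for the error returned when the ISR is too small."""

@dataclass
class Partition:
    replicas: list                 # broker ids assigned to the partition
    isr: list                      # replicas currently caught up with the leader
    min_insync_replicas: int = 1   # topic/broker setting (Kafka's default is 1)

def append(partition, acks):
    """Accept or reject a produce request in this simplified model."""
    # The minimum is only enforced for acks=all writes; it never changes
    # which replicas are in the ISR.
    if acks == "all" and len(partition.isr) < partition.min_insync_replicas:
        raise NotEnoughReplicas(
            f"ISR size {len(partition.isr)} < min.insync.replicas "
            f"{partition.min_insync_replicas}"
        )
    return "ok"

# All three replicas healthy: the ISR has 3 members even though the minimum is 2.
p = Partition(replicas=[1, 2, 3], isr=[1, 2, 3], min_insync_replicas=2)
print(append(p, acks="all"))

# Two brokers fall behind: acks=all writes are now rejected.
p.isr = [1]
try:
    append(p, acks="all")
except NotEnoughReplicas as err:
    print("rejected:", err)
```

Under this model, a healthy three-broker cluster will always list three in-sync replicas for a topic with replication factor 3, whatever value min.insync.replicas has; the setting only matters once replicas drop out of the ISR.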
Labels: Apache Kafka, Apache Zookeeper
05-17-2018
12:39 AM
No... I have not even received a reply from a Cloudera employee.