Member since 09-29-2016
Posts: 60 · Kudos Received: 0 · Solutions: 0
09-18-2019
08:53 PM
Not able to open this link: http://ingest.tips/2015/01/31/parquet-row-group-size/. Can you please check and repost it?
03-14-2019
11:01 AM
Hi Tim, I am using the Parquet format for the table. I also tried "set PARQUET_FALLBACK_SCHEMA_RESOLUTION=name" before running the query, but got the same result. Please see the profile below:
Query (id=e54f7da15a77d3d0:342167b500000000): Summary: Session ID: 734733b810bda5d6:61715265a5d564b8 Session Type: BEESWAX Start Time: 2019-03-14 13:37:44.723524000 End Time: 2019-03-14 13:37:45.723116000 Query Type: QUERY Query State: FINISHED Query Status: OK Impala Version: impalad version 2.7.0-cdh5.9.0 RELEASE (build 4b4cf1936bd6cdf34fda5e2f32827e7d60c07a9c) User: usr_impala Connected User: usr_impala Delegated User: Network Address: 153.40.73.237:47596 Default Db: default Sql Statement: select * from comm_status where serverdate='2019-03-02' and report_reference_number='CITIXLME549300DV4DLC540WV917GB00D3CMMY07020190301' Coordinator: cloud_machine_1:22000 Query Options (non default): Plan: ---------------- Estimated Per-Host Requirements: Memory=48.00MB VCores=1 01:EXCHANGE [UNPARTITIONED] | hosts=4 per-host-mem=unavailable | tuple-ids=0 row-size=177B cardinality=2 | 00:SCAN HDFS [default.comm_status, RANDOM] partitions=6/16 files=6 size=6.96MB predicates: report_reference_number = 'CITIXLME549300DV4DLC540WV917GB00D3CMMY07020190301' table stats: 166968 rows total column stats: all hosts=4 per-host-mem=48.00MB tuple-ids=0 row-size=177B cardinality=2 ---------------- Estimated Per-Host Mem: 50331648 Estimated Per-Host VCores: 1 Request Pool: default-pool Admission result: Admitted immediately ExecSummary: Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est.
Peak Mem Detail ------------------------------------------------------------------------------------------------------------------- 01:EXCHANGE 1 64.697us 64.697us 6 2 0 -1.00 B UNPARTITIONED 00:SCAN HDFS 4 594.658ms 960.960ms 6 2 6.27 MB 48.00 MB default.comm_status Planner Timeline: 2.019ms - Analysis finished: 521.731us (521.731us) - Equivalence classes computed: 614.107us (92.376us) - Single node plan created: 1.340ms (726.439us) - Runtime filters computed: 1.357ms (17.135us) - Distributed plan created: 1.637ms (279.515us) - Planning finished: 2.019ms (382.286us) Query Timeline: 1s001ms - Start execution: 46.572us (46.572us) - Planning finished: 2.951ms (2.904ms) - Submit for admission: 3.103ms (152.472us) - Completed admission: 3.211ms (107.528us) - Ready to start 4 remote fragments: 3.562ms (351.656us) - All 4 remote fragments started: 8.496ms (4.933ms) - Rows available: 272.403ms (263.906ms) - First row fetched: 310.582ms (38.179ms) - Unregister query: 999.597ms (689.015ms) - ComputeScanRangeAssignmentTimer: 33.622us ImpalaServer: - ClientFetchWaitTimer: 40.960ms - RowMaterializationTimer: 16.160us Execution Profile e54f7da15a77d3d0:342167b500000000:(Total: 955.209ms, non-child: 0.000ns, % non-child: 0.00%) Number of filters: 0 Filter routing table: ID Src. Node Tgt. 
Node(s) Targets Target type Partition filter Pending (Expected) First arrived Completed Enabled ---------------------------------------------------------------------------------------------------------------------------- Fragment start latencies: Count: 4, 25th %-ile: 1ms, 50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 4ms, 95th %-ile: 4ms, 99.9th %-ile: 4ms Per Node Peak Memory Usage: cloud_machine_2:22000(6.28 MB) cloud_machine_3:22000(4.17 MB) cloud_machine_4:22000(4.30 MB) cloud_machine_1:22000(4.20 MB) - FiltersReceived: 0 (0) - FinalizationTimer: 0.000ns Coordinator Fragment F01:(Total: 949.221ms, non-child: 241.396us, % non-child: 0.03%) MemoryUsage(500.000ms): 8.00 KB, 24.01 KB - AverageThreadTokens: 0.00 - BloomFilterBytes: 0 - PeakMemoryUsage: 32.02 KB (32784) - PerHostPeakMemUsage: 0 - PrepareTime: 26.430us - RowsProduced: 0 (0) - TotalCpuTime: 43.717ms - TotalNetworkReceiveTime: 948.955ms - TotalNetworkSendTime: 0.000ns - TotalStorageWaitTime: 0.000ns BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns EXCHANGE_NODE (id=1):(Total: 948.980ms, non-child: 64.697us, % non-child: 0.01%) BytesReceived(500.000ms): 0, 716.00 B - BytesReceived: 1.05 KB (1072) - ConvertRowBatchTime: 9.082us - DeserializeRowBatchTimer: 42.542us - FirstBatchArrivalWaitTime: 263.623ms - PeakMemoryUsage: 0 - RowsReturned: 6 (6) - RowsReturnedRate: 6.00 /sec - SendersBlockedTimer: 0.000ns - SendersBlockedTotalTimer(*): 0.000ns Averaged Fragment F00:(Total: 595.899ms, non-child: 0.000ns, % non-child: 0.00%) split sizes: min: 1.16 MB, max: 2.32 MB, avg: 1.74 MB, stddev: 594.15 KB completion times: min:328.847ms max:992.298ms mean: 627.859ms stddev:300.544ms execution rates: min:1.17 MB/sec 
max:6.96 MB/sec mean:3.59 MB/sec stddev:2.12 MB/sec num instances: 4 - AverageThreadTokens: 1.25 - BloomFilterBytes: 0 - PeakMemoryUsage: 4.73 MB (4959952) - PerHostPeakMemUsage: 4.74 MB (4967122) - PrepareTime: 37.204ms - RowsProduced: 1 (1) - TotalCpuTime: 704.988ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 170.458us - TotalStorageWaitTime: 665.723ms BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns CodeGen:(Total: 67.096ms, non-child: 67.096ms, % non-child: 100.00%) - CodegenTime: 1.170ms - CompileTime: 6.799ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - OptimizationTime: 23.416ms - PrepareTime: 36.447ms DataStreamSender (dst_id=1):(Total: 241.040us, non-child: 241.040us, % non-child: 100.00%) - BytesSent: 268.00 B (268) - NetworkThroughput(*): 3.94 MB/sec - OverallThroughput: 2.30 MB/sec - RowsReturned: 1 (1) - SerializeBatchTime: 18.028us - TransmitDataRPCTime: 205.339us - UncompressedRowBatchSize: 331.00 B (331) HDFS_SCAN_NODE (id=0):(Total: 594.658ms, non-child: 594.658ms, % non-child: 100.00%) - AverageHdfsReadThreadConcurrency: 0.75 - AverageScannerThreadConcurrency: 0.75 - BytesRead: 1.89 MB (1977967) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 1 (1) - NumScannerThreadsStarted: 1 (1) - PeakMemoryUsage: 4.72 MB (4950272) - PerReadThreadRawHdfsThroughput: 4.56 MB/sec - RemoteScanRanges: 6 (6) - RowsRead: 18.81K (18807) - RowsReturned: 1 (1) - RowsReturnedRate: 3.00 /sec - ScanRangesComplete: 1 
(1) - ScannerThreadsInvoluntaryContextSwitches: 2 (2) - ScannerThreadsTotalWallClockTime: 668.590ms - MaterializeTupleTime(*): 2.253ms - ScannerThreadsSysTime: 499.500us - ScannerThreadsUserTime: 1.748ms - ScannerThreadsVoluntaryContextSwitches: 9 (9) - TotalRawHdfsReadTime(*): 601.438ms - TotalReadThroughput: 963.39 KB/sec Fragment F00: Instance e54f7da15a77d3d0:342167b500000002 (host=cloud_machine_1:22000):(Total: 961.567ms, non-child: 0.000ns, % non-child: 0.00%) Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/1.16 MB MemoryUsage(500.000ms): 4.00 KB, 2.12 MB ThreadUsage(500.000ms): 1, 2 - AverageThreadTokens: 1.50 - BloomFilterBytes: 0 - PeakMemoryUsage: 4.17 MB (4372944) - PerHostPeakMemUsage: 4.20 MB (4401624) - PrepareTime: 35.478ms - RowsProduced: 1 (1) - TotalCpuTime: 993.973ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 29.018us - TotalStorageWaitTime: 923.149ms CodeGen:(Total: 64.442ms, non-child: 64.442ms, % non-child: 100.00%) - CodegenTime: 832.912us - CompileTime: 6.441ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - OptimizationTime: 22.658ms - PrepareTime: 34.912ms DataStreamSender (dst_id=1):(Total: 65.548us, non-child: 65.548us, % non-child: 100.00%) - BytesSent: 176.00 B (176) - NetworkThroughput(*): 4.71 MB/sec - OverallThroughput: 2.56 MB/sec - RowsReturned: 1 (1) - SerializeBatchTime: 11.148us - TransmitDataRPCTime: 35.622us - UncompressedRowBatchSize: 219.00 B (219) HDFS_SCAN_NODE (id=0):(Total: 960.960ms, non-child: 960.960ms, % non-child: 100.00%) ExecOption: Expr Evaluation Codegen Disabled, PARQUET Codegen Enabled, Codegen enabled: 1 out of 1 Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/1.16 MB Hdfs Read Thread Concurrency Bucket: 0:0% 1:100% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% File Formats: PARQUET/NONE:3 BytesRead(500.000ms): 0, 1.26 MB - AverageHdfsReadThreadConcurrency: 1.00 - AverageScannerThreadConcurrency: 1.00 - 
BytesRead: 1.26 MB (1318689) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 1 (1) - NumScannerThreadsStarted: 1 (1) - PeakMemoryUsage: 4.16 MB (4363264) - PerReadThreadRawHdfsThroughput: 1.31 MB/sec - RemoteScanRanges: 4 (4) - RowsRead: 12.54K (12538) - RowsReturned: 1 (1) - RowsReturnedRate: 1.00 /sec - ScanRangesComplete: 1 (1) - ScannerThreadsInvoluntaryContextSwitches: 6 (6) - ScannerThreadsTotalWallClockTime: 925.257ms - MaterializeTupleTime(*): 1.738ms - ScannerThreadsSysTime: 999.000us - ScannerThreadsUserTime: 999.000us - ScannerThreadsVoluntaryContextSwitches: 9 (9) - TotalRawHdfsReadTime(*): 963.534ms - TotalReadThroughput: 1.26 MB/sec Instance e54f7da15a77d3d0:342167b500000001 (host=cloud_machine_4:22000):(Total: 826.232ms, non-child: 0.000ns, % non-child: 0.00%) Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:2/2.32 MB MemoryUsage(500.000ms): 4.00 KB, 149.45 KB ThreadUsage(500.000ms): 1, 2 - AverageThreadTokens: 1.50 - BloomFilterBytes: 0 - PeakMemoryUsage: 4.30 MB (4512208) - PerHostPeakMemUsage: 4.30 MB (4512208) - PrepareTime: 35.627ms - RowsProduced: 2 (2) - TotalCpuTime: 990.391ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 119.366us - TotalStorageWaitTime: 986.716ms BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns CodeGen:(Total: 64.577ms, non-child: 64.577ms, % non-child: 100.00%) - CodegenTime: 931.308us - CompileTime: 6.338ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - 
OptimizationTime: 22.764ms - PrepareTime: 35.037ms DataStreamSender (dst_id=1):(Total: 97.596us, non-child: 97.596us, % non-child: 100.00%) - BytesSent: 360.00 B (360) - NetworkThroughput(*): 5.76 MB/sec - OverallThroughput: 3.52 MB/sec - RowsReturned: 2 (2) - SerializeBatchTime: 24.588us - TransmitDataRPCTime: 59.584us - UncompressedRowBatchSize: 444.00 B (444) HDFS_SCAN_NODE (id=0):(Total: 825.190ms, non-child: 825.190ms, % non-child: 100.00%) ExecOption: Expr Evaluation Codegen Disabled, PARQUET Codegen Enabled, Codegen enabled: 2 out of 2 Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:2/2.32 MB Hdfs Read Thread Concurrency Bucket: 0:0% 1:100% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% File Formats: PARQUET/NONE:6 BytesRead(500.000ms): 0, 1.26 MB - AverageHdfsReadThreadConcurrency: 1.00 - AverageScannerThreadConcurrency: 1.00 - BytesRead: 2.52 MB (2637269) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 2 (2) - NumScannerThreadsStarted: 2 (2) - PeakMemoryUsage: 4.29 MB (4502528) - PerReadThreadRawHdfsThroughput: 2.74 MB/sec - RemoteScanRanges: 8 (8) - RowsRead: 25.08K (25076) - RowsReturned: 2 (2) - RowsReturnedRate: 2.00 /sec - ScanRangesComplete: 2 (2) - ScannerThreadsInvoluntaryContextSwitches: 4 (4) - ScannerThreadsTotalWallClockTime: 990.396ms - MaterializeTupleTime(*): 2.738ms - ScannerThreadsSysTime: 999.000us - ScannerThreadsUserTime: 1.998ms - ScannerThreadsVoluntaryContextSwitches: 11 (11) - TotalRawHdfsReadTime(*): 917.567ms - TotalReadThroughput: 1.26 MB/sec Instance e54f7da15a77d3d0:342167b500000004 (host=cloud_machine_2:22000):(Total: 300.651ms, non-child: 0.000ns, % non-child: 0.00%) Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:2/2.32 MB - AverageThreadTokens: 0.00 - BloomFilterBytes: 0 - PeakMemoryUsage: 6.28 MB (6581712) - PerHostPeakMemUsage: 6.28 
MB (6581712) - PrepareTime: 37.604ms - RowsProduced: 2 (2) - TotalCpuTime: 505.989ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 183.110us - TotalStorageWaitTime: 502.133ms BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns CodeGen:(Total: 68.324ms, non-child: 68.324ms, % non-child: 100.00%) - CodegenTime: 960.040us - CompileTime: 6.946ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - OptimizationTime: 23.985ms - PrepareTime: 36.961ms DataStreamSender (dst_id=1):(Total: 117.568us, non-child: 117.568us, % non-child: 100.00%) - BytesSent: 356.00 B (356) - NetworkThroughput(*): 5.02 MB/sec - OverallThroughput: 2.89 MB/sec - RowsReturned: 2 (2) - SerializeBatchTime: 24.648us - TransmitDataRPCTime: 67.574us - UncompressedRowBatchSize: 441.00 B (441) HDFS_SCAN_NODE (id=0):(Total: 299.442ms, non-child: 299.442ms, % non-child: 100.00%) ExecOption: Expr Evaluation Codegen Disabled, PARQUET Codegen Enabled, Codegen enabled: 2 out of 2 Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:2/2.32 MB Hdfs Read Thread Concurrency Bucket: 0:0% 1:0% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% File Formats: PARQUET/NONE:6 - AverageHdfsReadThreadConcurrency: 0.00 - AverageScannerThreadConcurrency: 0.00 - BytesRead: 2.52 MB (2637306) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 2 (2) - NumScannerThreadsStarted: 2 (2) - PeakMemoryUsage: 6.27 MB (6572032) - PerReadThreadRawHdfsThroughput: 7.41 MB/sec - RemoteScanRanges: 8 (8) - RowsRead: 
25.08K (25076) - RowsReturned: 2 (2) - RowsReturnedRate: 6.00 /sec - ScanRangesComplete: 2 (2) - ScannerThreadsInvoluntaryContextSwitches: 0 (0) - ScannerThreadsTotalWallClockTime: 505.994ms - MaterializeTupleTime(*): 3.052ms - ScannerThreadsSysTime: 0.000ns - ScannerThreadsUserTime: 2.998ms - ScannerThreadsVoluntaryContextSwitches: 12 (12) - TotalRawHdfsReadTime(*): 339.372ms - TotalReadThroughput: 0.00 /sec Instance e54f7da15a77d3d0:342167b500000003 (host=cloud_machine_3:22000):(Total: 295.146ms, non-child: 0.000ns, % non-child: 0.00%) Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/1.16 MB MemoryUsage(500.000ms): 2.12 MB ThreadUsage(500.000ms): 2 - AverageThreadTokens: 2.00 - BloomFilterBytes: 0 - PeakMemoryUsage: 4.17 MB (4372944) - PerHostPeakMemUsage: 4.17 MB (4372944) - PrepareTime: 40.105ms - RowsProduced: 1 (1) - TotalCpuTime: 329.600ms - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 350.338us - TotalStorageWaitTime: 250.893ms BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 68.92 GB (74007330816) - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalIntegrityCheckTime: 0.000ns - TotalReadBlockTime: 0.000ns CodeGen:(Total: 71.042ms, non-child: 71.042ms, % non-child: 100.00%) - CodegenTime: 1.957ms - CompileTime: 7.470ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.86 MB (1953124) - NumFunctions: 21 (21) - NumInstructions: 333 (333) - OptimizationTime: 24.257ms - PrepareTime: 38.877ms DataStreamSender (dst_id=1):(Total: 683.450us, non-child: 683.450us, % non-child: 100.00%) - BytesSent: 180.00 B (180) - NetworkThroughput(*): 266.91 KB/sec - OverallThroughput: 257.20 KB/sec - RowsReturned: 1 (1) - SerializeBatchTime: 11.730us - TransmitDataRPCTime: 658.576us - UncompressedRowBatchSize: 222.00 B (222) HDFS_SCAN_NODE (id=0):(Total: 293.038ms, non-child: 293.038ms, % 
non-child: 100.00%) ExecOption: Expr Evaluation Codegen Disabled, PARQUET Codegen Enabled, Codegen enabled: 1 out of 1 Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/1.16 MB Hdfs Read Thread Concurrency Bucket: 0:0% 1:100% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% File Formats: PARQUET/NONE:3 BytesRead(500.000ms): 639.23 KB - AverageHdfsReadThreadConcurrency: 1.00 - AverageScannerThreadConcurrency: 1.00 - BytesRead: 1.26 MB (1318607) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - DecompressionTime: 0.000ns - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDisksAccessed: 1 (1) - NumRowGroups: 1 (1) - NumScannerThreadsStarted: 1 (1) - PeakMemoryUsage: 4.16 MB (4363264) - PerReadThreadRawHdfsThroughput: 6.79 MB/sec - RemoteScanRanges: 4 (4) - RowsRead: 12.54K (12538) - RowsReturned: 1 (1) - RowsReturnedRate: 3.00 /sec - ScanRangesComplete: 1 (1) - ScannerThreadsInvoluntaryContextSwitches: 0 (0) - ScannerThreadsTotalWallClockTime: 252.714ms - MaterializeTupleTime(*): 1.482ms - ScannerThreadsSysTime: 0.000ns - ScannerThreadsUserTime: 999.000us - ScannerThreadsVoluntaryContextSwitches: 7 (7) - TotalRawHdfsReadTime(*): 185.279ms - TotalReadThroughput: 1.25 MB/sec
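Since the scan reads all 6 files but returns fewer rows than Hive does, it may help to inspect each Parquet file's footer directly to see the per-row-group row counts and the recorded column order (which is what position-based schema resolution falls back on). A minimal sketch using the parquet-tools CLI; the jar name and HDFS path here are placeholders, not taken from my cluster:

```
# "meta" prints the footer: rows per row group plus the column names/order.
# Comparing footers across the 6 files can show whether one file's schema
# diverges from the table schema.
hadoop jar parquet-tools.jar meta hdfs:///path/to/comm_status/serverdate=2019-03-02/part-file.parquet
```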
03-13-2019
06:35 AM
Hello experts, I am running into a very weird problem. I am inserting data into a Hive table through Spark SQL, and afterwards I run INVALIDATE METADATA in Impala for that table. When I query in Hive/Spark I get the data I expect, but when I run the query in Impala, a few records are missing. Please see below:

hive> select * from comm_status where serverdate='2019-03-02' and report_reference_number in('CMMY07020190301'); // I am leaving the selected result blank here for security reasons.
Time taken: 3.762 seconds, Fetched: 8 row(s)

Impala-shell > select * from comm_status where serverdate='2019-03-02' and report_reference_number in('CMMY07020190301'); // I am leaving the selected result blank here for security reasons.
Fetched 6 row(s) in 0.42s

Here is what I have tried:
1. invalidate metadata comm_status in Impala
2. refresh comm_status in Impala
3. msck repair table comm_status in Hive
4. ALTER TABLE comm_status RECOVER PARTITIONS; in Impala
5. Restarted both the Hive and Impala clusters and repeated steps 1-4.
6. Checked hive-site.xml in hive/conf, impala/conf and spark/conf to confirm they all point at the same metastore URL, and they do 🙂

Now I am out of ideas for resolving this problem. I am using Impala 2.7, Spark 2.3, Hive 2.1.
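To narrow down where the two rows go missing, one option is to run the same aggregate in both engines and compare counts per partition; whichever partition disagrees is the one whose files Impala is not reading fully. A sketch using the table and columns from the query above:

```sql
-- Run this identically in hive> and in impala-shell>; a partition whose
-- count differs between the two engines localizes the missing rows.
SELECT serverdate, count(*) AS cnt
FROM comm_status
WHERE report_reference_number IN ('CMMY07020190301')
GROUP BY serverdate;
```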
Labels: Apache Hive, Apache Impala, Apache Spark
02-19-2019
06:39 AM
Hi, UAT has the same number of tables and columns as PROD, but UAT has more data compared to PROD. The problem is in query parsing and optimization; once parsing is done there is no problem in execution, which is why I did not mention the driver or executors. I executed the same query in both UAT and PROD and monitored the logs: in UAT, parsing finished within a minute, while in PROD it took 20 minutes. I was also watching the UAT and PROD hive logs. The UAT log was not moving, but the PROD hive.log was. PROD was fetching table metadata from the Hive metastore, but UAT was not.
02-18-2019
10:57 AM
I am running a Hive query in the PROD environment on Spark, using HiveContext/SparkSession like this: sparksession.sql("query"). It takes approx 10-15 minutes just to parse the query before it runs, which looks abnormal to me, because the same query on the UAT environment takes less than a minute to parse. When I watch hive.log on UAT while executing that Spark SQL, I can't see the log moving; but when I watch hive.log on PROD, the log is moving and I see entries of this kind:

gettable tablename
get partitions
initialize called
using direct sql
underlying db oracle ...

I see statements like this in the log for all the tables involved in the query. Now this is really strange: if PROD is loading metadata from the Hive metastore for use in query parsing, why is the same not happening in UAT? Please help, this is a very serious issue.
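For what it's worth, the "using direct sql" lines suggest the PROD metastore is answering partition calls via Hive's direct-SQL path. A quick way to compare how the two environments are configured (a sketch assuming you can open a Hive session on each; both property names are standard Hive settings):

```sql
-- Run in a Hive session on UAT and on PROD and compare the output.
-- If direct SQL is disabled (or silently falling back to the DataNucleus
-- ORM path) on one side, partition metadata fetches can be much slower.
SET hive.metastore.try.direct.sql;
SET hive.metastore.client.socket.timeout;
```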
Labels: Apache Hive, Apache Spark
10-30-2018
12:50 AM
Not all the time, but most of the time. The table size is not even 100 GB, and there are 1200 partitions in the table. My question is: when the table is already loaded in the catalog, which I can see by going to that tab, what does "loading" mean, and why is it happening?
10-29-2018
12:43 PM
Any updates on this issue, please? This is really very irritating. When I look at the catalog tab I see all the tables loaded there, but when the same table is used in a query it goes like this: as src where src.r = 1
I1029 15:35:41.023432 21809 Frontend.java:822] Requesting prioritized load of table(s): default.comm_simeod_raw
I just run a refresh whenever any partition is added to the table; even if I run the query on old data, the same problem occurs.
10-25-2018
06:46 AM
On startup I run INVALIDATE METADATA. After that, whenever any file is loaded, I run INVALIDATE METADATA for the table. I don't run REFRESH on the table because it takes longer than INVALIDATE METADATA.
10-22-2018
09:48 AM
After starting Impala, I run INVALIDATE METADATA on all the impalad nodes, so should I still be getting this problem?
10-22-2018
07:10 AM
Hello, whenever I run either a SELECT or an INSERT query on Impala, it takes a huge amount of time. Looking at the log, I can see that most of the time goes into loading the table. I see messages like this:
Frontend.java:808] Requesting prioritized load of table(s): comm_tran_data,comm_tran_info
I can't understand: when every impalad has a catalog with all the metadata, why is it trying to load? I am using Impala versions 2.7, 2.9 and 2.1, and the same problem occurs in all of them. Can you suggest a solution? This problem is occurring in the PROD environment.
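For context, these are the two metadata commands in play. REFRESH incrementally reloads file and block metadata for a table that is already loaded, while INVALIDATE METADATA discards the cached metadata entirely, so the next query triggers a full reload, which is exactly the "Requesting prioritized load" you see in the log. A minimal sketch using one of the tables from the message above:

```sql
-- After adding data files to an existing table/partition (cheap, incremental):
REFRESH comm_tran_data;

-- After creating or altering the table outside Impala (expensive: forces a
-- full metadata reload on the table's first use afterwards):
INVALIDATE METADATA comm_tran_data;
```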
Labels: Apache Impala
10-11-2018
01:25 AM
Cloudera employees, any update on this, please?
10-03-2018
09:26 AM
I think the secondary namenode does not have the capability to push a newly created fsimage to the primary namenode; a checkpoint node has that capability. In the case of a secondary namenode, it is the primary namenode's responsibility to pull the updated fsimage at startup.
08-07-2018
10:58 PM
New update: I created a separate cluster with only Impala and Hive, and there it works fine; COMPUTE STATS completes in 10 minutes. But when Hadoop, Impala and Hive are together on the same cluster, it takes more than 1.5 hours to complete COMPUTE STATS. This is really surprising: with Hadoop, Impala and Hive on the same cluster it should actually be faster, since the data will be local to Hive/Impala. Does anyone have thoughts on this?
06-08-2018
04:30 AM
Running COMPUTE INCREMENTAL STATS on a table with 4000 partitions in total and 30 GB of data takes 5-6 hours to complete.
I have watched hive.log, which shows lots of queries being executed to update the metastore.
I have Cloudera 5.7 with Oracle as the metastore.
I have seen the same operation complete in 2-3 minutes with a MySQL metastore.
But I can't say Oracle is causing the problem, because I have confirmed with the DBA that everything is smooth on the Oracle side.
It looks like some problem in Impala/Hive.
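One way to bound how much metastore traffic each run generates is to restrict the statement to the partitions that actually changed, instead of letting it evaluate all 4000. A sketch using standard Impala syntax; the table and partition column names here are placeholders, not from my schema:

```sql
-- The first COMPUTE INCREMENTAL STATS on a table is always heavy (it must
-- cover every partition). After that, limit each run to the newly loaded
-- partition so only its stats rows are written to the metastore:
COMPUTE INCREMENTAL STATS my_table PARTITION (load_date='2018-06-08');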
Labels: Apache Hive, Apache Impala
06-08-2018
04:24 AM
Hello, I have done all the analysis and found the following:
1. I involved the DBA to check if there is any problem on the Oracle side, where my metastore is, but there is no problem: all of the queries finished normally, and we did not see any thread contention or wait time either.
2. Then I set the logger level to ALL in the Hive log4j properties, and I was able to see all the actual queries being run against the Hive metastore. Here too there is no problem, because all the queries execute in milliseconds. So it looks like there is no problem at the HMS Oracle side. I also tried running COMPUTE INCREMENTAL STATS from Impala, and I see the same problem.
06-06-2018
01:08 AM
No, I installed manually from tarballs.
06-01-2018
03:25 AM
Hi,
I am facing a weird problem across all the environments. Today I tried to add one column to a Hive table with the CASCADE option, so the change is reflected in all existing partitions; it took 6 hours to update 4500 partitions.
Even running COMPUTE INCREMENTAL STATS from Impala for the first time took 6 hours.
The Hive metastore is in Oracle.
It looks like the queries are not executing at speed and are waiting somewhere on the Oracle database server.
It is also possible that the Hive metastore is not sending all the queries to Oracle.
What could be the reason? I am not sure what is going on; any help will be appreciated.
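For reference, this is the shape of the statement in question (the table and column names are placeholders, not from my schema). With CASCADE, the metastore must rewrite the column descriptor of every existing partition, which is why the partition count, 4500 here, dominates the runtime:

```sql
-- CASCADE propagates the new column into all existing partition metadata;
-- RESTRICT (the default) changes only the table-level schema, leaving old
-- partitions untouched.
ALTER TABLE my_table ADD COLUMNS (new_col STRING) CASCADE;
```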
Labels: Apache Hive, Apache Impala
05-17-2018
11:12 PM
You are correct; I tested it. I had thought it would also show min.insync.replicas. Thanks for the help.
05-17-2018
02:31 AM
Hello,
I have recently built a UAT Kafka cluster with three brokers. The problem is that min.insync.replicas is not taking any effect. The following is happening:
1. There is no mention of min.insync.replicas in server.properties, and whenever I create a topic, the in-sync replica count just copies whatever replication factor I provide. So if I create topic test with replication 3, the in-sync replicas are also 3; if I create it with replication 2, the in-sync replicas are also 2.
Partition Detail
Partition | First Offset | Last Offset | Size | Leader | Replicas | In Sync Replicas | Preferred Leader? | Under Replicated?
0         | 0            | 0           | 0    | 2      | 2,3,1    | 1,2,3            | Yes               | No
1         | 0            | 0           | 0    | 3      | 3,1,2    | 1,2,3            | Yes               | No
2         | 0            | 0           | 0    | 1      | 1,2,3    | 1,2,3            | Yes               | No
2. I tried putting min.insync.replicas=2 in server.properties on all the brokers and created a topic with replication 3. Here I was expecting the in-sync replicas to be 2, but no, it is again 3.
3. Then I tried to create a topic providing the config directly:
kafka-topics.sh --create --zookeeper machine1:2181,machine2:2181,machine3:2181 --replication-factor 3 --partitions 3 --config min.insync.replicas=1 --topic syncdemo
Here I thought it would show the in-sync replicas as 1, but no, it is still 3.
4. Lastly I tried to alter it to provide 2 instead of 1, to check whether altering makes it work:
kafka-topics.sh --alter --zookeeper machine1:2181,machine2:2181,machine3:2181 --topic syncdemo --config min.insync.replicas=1
but there was no effect.
It is really very frustrating that none of these ways make it work. Please provide a solution; thanks in advance.
I am using CDH 5.7 and Apache Kafka kafka_2.11-1.0.0 with zookeeper-3.4.5-cdh5.7.0 on Red Hat 6.5.
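One thing worth checking, as a hedge: as far as I understand, min.insync.replicas never changes the "In Sync Replicas" column at all. That column always lists every replica currently caught up with the leader, while min.insync.replicas is only a floor that the broker enforces when a producer writes with acks=all (the write is rejected with NotEnoughReplicas if the ISR shrinks below it). So the behavior above may be expected. To confirm what is actually set on the topic, the describe commands can be used (same zookeeper string as in the create command):

```
# Shows the ISR (always the caught-up replicas, regardless of the setting):
kafka-topics.sh --describe --zookeeper machine1:2181,machine2:2181,machine3:2181 --topic syncdemo

# Shows the per-topic config overrides, including min.insync.replicas:
kafka-configs.sh --describe --zookeeper machine1:2181,machine2:2181,machine3:2181 --entity-type topics --entity-name syncdemo
```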
Labels: Apache Kafka, Apache Zookeeper
05-17-2018
12:39 AM
No... I have not even gotten a reply from a Cloudera employee.
05-17-2018
12:37 AM
the reported issue is in PROD though we have got the error in UAT too but very rare. we are having 15 JDBC connection in connection pool.
05-16-2018
11:43 PM
Just a correction: we are using HAProxy in the UAT environment, but for PROD we are using a VIP created by the infrastructure team. Here is the UAT config (haproxy-cdh.cfg):

user impala
group impala
daemon

# turn on stats unix socket
#stats socket /var/lib/haproxy/stats

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#
# You might need to adjust timing values to prevent timeouts.
#---------------------------------------------------------------------
defaults
    # mode http
    # option httplog
    option dontlognull
    option http-server-close
    option redispatch
    retries 3
    maxconn 1000
    timeout connect 300000
    timeout client 300000
    timeout server 300000

#
# This sets up the admin page for HA Proxy at port 25002.
#
listen stats :25002
    balance
    mode http
    stats enable
    stats auth username:password

# This is the setup for Impala. Impala clients connect to load_balancer_host:25003.
# HAProxy will balance connections among the list of servers listed below.
# The list of Impalad is listening at port 21000 for beeswax (impala-shell) or original ODBC driver.
# For JDBC or ODBC version 2.x driver, use port 21050 instead of 21000.
#listen impala :25053
listen impala :25003
    timeout client 3600000
    timeout server 3600000
    balance leastconn
05-16-2018
11:29 PM
Yes, I am using HAProxy, where I have configured all my impalad nodes, and this URL is used by the JDBC connection.
05-16-2018
11:18 PM
Yes, but I still get this problem many times.
12-26-2017
04:04 AM
I sometimes get the error below while running Impala queries through a JDBC connection using the Hive jars:

java.sql.SQLException: Error while cleaning up the server resources
    at org.apache.hive.jdbc.HiveConnection.close(HiveConnection.java:580)
    at gxs.core.hadoop$with_impala_connection_STAR_.invoke(Unknown Source)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
    at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:161)
    at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.send_CloseSession(TCLIService.java:173)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.CloseSession(TCLIService.java:165)
    at org.apache.hive.jdbc.HiveConnection.close(HiveConnection.java:578)
    ... 25 more
Caused by: java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
    at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:159)
    ... 29 more

How can I resolve this?
Labels: Apache Impala
10-31-2017
01:08 AM
Hello, I have migrated Impala from impalad version 2.7.0-cdh5.7.1 to impalad version 2.9.0-cdh5.12.1 RELEASE. When I try to run any kind of query, I get the error below in the Impala log:

E1031 03:35:14.060696 20860 impala-server.cc:1299] Error deserializing item TABLE:default.comm__trade_table: couldn't deserialize thrift msg: TProtocolException: Invalid data
E1031 03:41:04.148355 20860 impala-server.cc:1299] Error deserializing item TABLE:default.comm_ldeal_table: couldn't deserialize thrift msg: TProtocolException: Invalid data

My table format is CSV, and I have tried INVALIDATE METADATA and REFRESH on the table. Kindly suggest what to do.
Labels: Apache Impala
10-11-2017
12:01 AM
Hi, I am getting an error while starting impala-shell or the Impala server: class not found: org.apache.hadoop.mapred.MRVersion. I am setting up Impala 2.1.1-cdh5.3.1 on hive-1.1.0 and cdh5.7.1.
Labels: Apache Impala
10-09-2017
12:32 AM
I can provide the query profiles; kindly share your email id. The query profile is very big, 20-30 pages, so it is not easy to share here.
10-06-2017
03:34 AM
I use Impala version 2.7.0, which I am not finding good and stable. People on another team are using version 2.3.0; the same query runs fine on 2.3.0, but on 2.7.0 it runs out of memory or takes 30 minutes instead of 2. I am gradually losing confidence. I would like to know which Impala version is the most stable and reliable for PRODUCTION.
Labels: Apache Impala