Member since: 12-16-2020 | Posts: 4 | Kudos Received: 0 | Solutions: 0
12-18-2020 06:33 AM
We have restarted nearly every component of the affected HDFS cluster, and Impala performance has improved. Unfortunately, that doesn't explain the underlying issue.
12-18-2020 06:26 AM
Thanks for coming back, Tim!
12-17-2020 06:36 AM
Hi Tim, thanks for your response. Was it truncated by chance?
12-16-2020 08:37 AM
Hello guys, we are experiencing a slow HDFS scan issue: a query runs on 17 nodes, and in the query profile some nodes are way slower than the others. Switching off the slowest node just causes another node to become the slowest, whereas previously it was about average. We are using CDH 5.10. Here are the relevant parts of the query profile; I've taken the slowest and the fastest node:

THE SLOWEST NODE
  Hdfs Read Thread Concurrency Bucket: 0:56.2% 1:36.5% 2:7.299% 3:0% 4:0% 5:0% 6:0% 7:0%
  File Formats: PARQUET/SNAPPY:4025
  AverageHdfsReadThreadConcurrency: 0.51
  AverageScannerThreadConcurrency: 2.72
  BytesRead: 2.7 GiB
  BytesReadDataNodeCache: 0 B
  BytesReadLocal: 2.7 GiB
  BytesReadRemoteUnexpected: 0 B
  BytesReadShortCircuit: 2.7 GiB
  DecompressionTime: 4.25s
  InactiveTotalTime: 0ns
  MaxCompressedTextFileLength: 0 B
  NumColumns: 23
  NumDisksAccessed: 4
  NumRowGroups: 175
  NumScannerThreadsStarted: 3
  PeakMemoryUsage: 133.9 MiB
  PerReadThreadRawHdfsThroughput: 65.6 MiB/s
  RemoteScanRanges: 0
  RowBatchQueueGetWaitTime: 1.1m
  RowBatchQueuePutWaitTime: 0ns
  RowsRead: 192,701,761
  RowsReturned: 18,748
  RowsReturnedRate: 294 per second
  ScanRangesComplete: 175
  ScannerThreadsInvoluntaryContextSwitches: 24,401
  ScannerThreadsTotalWallClockTime: 3.1m
    MaterializeTupleTime: 1.9m
    ScannerThreadsSysTime: 17.81s
    ScannerThreadsUserTime: 1.6m
  ScannerThreadsVoluntaryContextSwitches: 19,141
  TotalRawHdfsReadTime: 41.81s
  TotalReadThroughput: 39.7 MiB/s
  TotalTime: 1.1m

THE FASTEST NODE
  Hdfs Read Thread Concurrency Bucket: 0:86.84% 1:13.16% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0%
  File Formats: PARQUET/SNAPPY:4209
  AverageHdfsReadThreadConcurrency: 0.13
  AverageScannerThreadConcurrency: 5.92
  BytesRead: 2.7 GiB
  BytesReadDataNodeCache: 0 B
  BytesReadLocal: 2.7 GiB
  BytesReadRemoteUnexpected: 0 B
  BytesReadShortCircuit: 2.7 GiB
  DecompressionTime: 3.50s
  InactiveTotalTime: 0ns
  MaxCompressedTextFileLength: 0 B
  NumColumns: 23
  NumDisksAccessed: 4
  NumRowGroups: 183
  NumScannerThreadsStarted: 6
  PeakMemoryUsage: 329.9 MiB
  PerReadThreadRawHdfsThroughput: 529.1 MiB/s
  RemoteScanRanges: 0
  RowBatchQueueGetWaitTime: 14.64s
  RowBatchQueuePutWaitTime: 0ns
  RowsRead: 192,490,029
  RowsReturned: 21,148
  RowsReturnedRate: 1437 per second
  ScanRangesComplete: 183
  ScannerThreadsInvoluntaryContextSwitches: 7,158
  ScannerThreadsTotalWallClockTime: 1.9m
    MaterializeTupleTime: 1.5m
    ScannerThreadsSysTime: 13.94s
    ScannerThreadsUserTime: 1.2m
  ScannerThreadsVoluntaryContextSwitches: 47,621
  TotalRawHdfsReadTime: 5.15s
  TotalReadThroughput: 143.4 MiB/s
  TotalTime: 14.71s

The other nodes averaged about 20 seconds, and the slowest one was far behind all of them. As you can see, BytesReadLocal is the same on both nodes, and it is the same on the rest of the nodes. But PerReadThreadRawHdfsThroughput, RowBatchQueueGetWaitTime, RowsReturnedRate, and ScannerThreadsInvoluntaryContextSwitches differ a lot. The strange part is that the same node can be fast in terms of query/throughput until we stop impalad on whichever node is currently the slowest; then, with almost the same amount of data scanned (removing a single node changes BytesReadLocal from e.g. 2.5 GiB to 2.7 GiB), the speed degrades dramatically, like 2 or 3 times worse. Does anyone have an idea what could be wrong? Thanks in advance!
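For reference, here is a rough side-by-side of the counters that differ most between the two nodes, written as a minimal Python sketch. The values are copied by hand from the profile excerpts above; converting the "1.1m"-style durations to seconds and the "times apart" ratio are my own additions, not something Impala prints.

# Side-by-side of the counters that differ most between the two nodes.
# Values copied by hand from the profile excerpts above; the conversion of
# "1.1m"-style durations to seconds is an assumption of this sketch.
KEY_COUNTERS = {
    # counter name:                              (slowest node, fastest node)
    "TotalRawHdfsReadTime (s)":                  (41.81, 5.15),
    "PerReadThreadRawHdfsThroughput (MiB/s)":    (65.6, 529.1),
    "RowBatchQueueGetWaitTime (s)":              (66.0, 14.64),   # 1.1m ~= 66s
    "RowsReturnedRate (rows/s)":                 (294, 1437),
    "ScannerThreadsInvoluntaryContextSwitches":  (24401, 7158),
    "TotalTime (s)":                             (66.0, 14.71),   # 1.1m ~= 66s
}

for name, (slow, fast) in KEY_COUNTERS.items():
    # Report how many times apart the two nodes are, regardless of whether
    # bigger means better (throughput) or worse (wait time).
    ratio = max(slow, fast) / min(slow, fast)
    print(f"{name:44s} slowest={slow:>10} fastest={fast:>10}  ({ratio:.1f}x apart)")

Running it just prints the six counters next to each other: the raw HDFS read time and the per-thread read throughput are both roughly 8x apart, even though the bytes read locally are identical.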
Labels: Apache Impala