Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Meaning of RowBatchQueueGetWaitTime metric

Solved Go to solution

Meaning of RowBatchQueueGetWaitTime metric

Contributor

Can anyone explain whatis RowBatchQueueGetWaitTime? 

 

I'm looking into a slow-ish query that is taking 2 to 3 seconds to do hdfs scan on most nodes and I don't see why it should take that long: 3 or so files per node only a couple K each, cached (and confirmed all read from cache).  The only thing that looks odd is this metric.  

 

Here's a sample relevant profile fragment (about same for all executors):

 

>>> HDFS_SCAN_NODE (id=0) (1.96s)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 18:1/69.15 KB 20:2/142.83 KB
ExecOption: PARQUET Codegen Enabled, Codegen enabled: 3 out of 3
Runtime filters: Not all filters arrived (arrived: [1], missing [0]), waited for 352ms
Hdfs Read Thread Concurrency Bucket: 0:100% 1:0% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% 8:0% 9:0% 10:0% 11:0% 12:0% 13:0% 14:0% 15:0%
File Formats: PARQUET/SNAPPY:156
AverageHdfsReadThreadConcurrency: 0.00
AverageScannerThreadConcurrency: 1.00
BytesRead: 228.0 KiB
BytesReadDataNodeCache: 228.0 KiB
BytesReadLocal: 228.0 KiB
BytesReadRemoteUnexpected: 0 B
BytesReadShortCircuit: 228.0 KiB
CachedFileHandle**bleep**Count: 0
CachedFileHandlesMissCount: 159
DecompressionTime: 188.47us
InactiveTotalTime: 0ns
MaxCompressedTextFileLength: 0 B
NumColumns: 52
NumDictFilteredRowGroups: 0
NumDisksAccessed: 0
NumRowGroups: 3
NumScannerThreadsStarted: 1
NumScannersWithNoReads: 0
NumStatsFilteredRowGroups: 0
PeakMemoryUsage: 499.3 KiB
PerReadThreadRawHdfsThroughput: 0 B/s
RemoteScanRanges: 0
RowBatchQueueGetWaitTime: 1.60s
RowBatchQueuePutWaitTime: 0ns
RowsRead: 426
RowsReturned: 2
RowsReturnedRate: 1 per second
ScanRangesComplete: 3
ScannerThreadsInvoluntaryContextSwitches: 8
ScannerThreadsTotalWallClockTime: 1.89s
MaterializeTupleTime(*): 16ms
ScannerThreadsSysTime: 10ms
ScannerThreadsUserTime: 73ms
ScannerThreadsVoluntaryContextSwitches: 393
TotalRawHdfsReadTime(*): 0ns
TotalReadThroughput: 88.4 KiB/s
TotalTime: 1.96s
>>> Filter 0 (1.00 MB) (0ns)
InactiveTotalTime: 0ns
Rows processed: 0
Rows rejected: 0
Rows total: 426
TotalTime: 0ns
>>> Filter 1 (1.00 MB) (0ns)
InactiveTotalTime: 0ns
Rows processed: 426
Rows rejected: 424
Rows total: 426
TotalTime: 0ns

 

Thanks in advance!

 

-m

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

Re: Meaning of RowBatchQueueGetWaitTime metric

Cloudera Employee

Impala scanners internally have a RowBatch queue that allows Impala to decouple I/O from CPU processing. The I/O threads read data into RowBatches and put them into a queue, CPU threads asynchronously fetch data from the queue and process them. RowBatchQueueGetWaitTime is the amount of time CPU threads wait on data to arrive into the queue. Essentially, it means the CPU threads were waiting a long time for the I/O threads to read the data.

1 REPLY 1
Highlighted

Re: Meaning of RowBatchQueueGetWaitTime metric

Cloudera Employee

Impala scanners internally have a RowBatch queue that allows Impala to decouple I/O from CPU processing. The I/O threads read data into RowBatches and put them into a queue, CPU threads asynchronously fetch data from the queue and process them. RowBatchQueueGetWaitTime is the amount of time CPU threads wait on data to arrive into the queue. Essentially, it means the CPU threads were waiting a long time for the I/O threads to read the data.