Improve impala execution rate?

avatar
Explorer

I have a query that is a join between 2 tables. One has 1,000 rows in it (the broadcast, right-side table) and the other has 15 million rows. The result of the join is 28 million rows. I assume this shouldn't take 33 minutes with Impala. Below is the part of the profile that shows where most of the time is spent...

 

Averaged Fragment F00:(Total: 17m36s, non-child: 0.000ns, % non-child: 0.00%)
split sizes: min: 20.40 MB, max: 250.81 MB, avg: 135.60 MB, stddev: 115.20 MB
completion times: min:2m2s max:33m11s mean: 17m37s stddev:15m34s
execution rates: min:128.98 KB/sec max:170.00 KB/sec mean:149.49 KB/sec stddev:20.51 KB/sec
num instances: 2
- AverageThreadTokens: 1.93
- BloomFilterBytes: 2.00 MB (2097152)
- PeakMemoryUsage: 39.61 MB (41537580)
- PerHostPeakMemUsage: 41.61 MB (43634732)
- PrepareTime: 142.003ms
- RowsProduced: 14.41M (14412000)
- TotalCpuTime: 34m58s
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 18.000ms
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 32 (32)
- BlocksRecycled: 1 (1)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 6.85 GB (7360164864)
- PeakMemoryUsage: 32.00 KB (32768)
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalIntegrityCheckTime: 0.000ns
- TotalReadBlockTime: 0.000ns
CodeGen:(Total: 736.017ms, non-child: 736.017ms, % non-child: 100.00%)
- CodegenTime: 0.000ns
- CompileTime: 184.004ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.85 MB (1940340)
- OptimizationTime: 414.010ms
- PrepareTime: 136.003ms
HdfsTableSink:(Total: 5s532ms, non-child: 5s532ms, % non-child: 100.00%)
- BytesWritten: 1.74 MB (1824896)
- CompressTimer: 14.000ms
- EncodeTimer: 5s506ms
- FilesCreated: 0 (0)
- FinalizePartitionFileTimer: 32.000ms
- HdfsWriteTimer: 6.000ms
- PartitionsCreated: 0 (0)
- PeakMemoryUsage: 2.52 MB (2637684)
- RowsInserted: 14.41M (14412000)
- TmpFileCreateTimer: 12.000ms
HASH_JOIN_NODE (id=2):(Total: 17m30s, non-child: 17m30s, % non-child: 99.98%)
- BuildPartitionTime: 0.000ns
- BuildRows: 1.00K (1000)
- BuildRowsPartitioned: 1.00K (1000)
- BuildTime: 2.000ms
- GetNewBlockTime: 0.000ns
- HashBuckets: 2.05K (2048)
- HashCollisions: 0 (0)
- LargestPartitionPercent: 100 (100)
- MaxPartitionLevel: 0 (0)
- NumRepartitions: 0 (0)
- PartitionsCreated: 16 (16)
- PeakMemoryUsage: 2.05 MB (2149536)
- PinTime: 0.000ns
- ProbeRows: 2.59M (2590143)
- ProbeRowsPartitioned: 0 (0)
- ProbeTime: 17m30s
- RowsReturned: 14.41M (14412000)
- RowsReturnedRate: 7.28 K/sec
- SpilledPartitions: 0 (0)
- UnpinTime: 0.000ns
EXCHANGE_NODE (id=3):
- BytesReceived: 10.60 KB (10850)
- ConvertRowBatchTime: 0.000ns
- DeserializeRowBatchTimer: 0.000ns
- FirstBatchArrivalWaitTime: 0.000ns
- PeakMemoryUsage: 0
- RowsReturned: 1.00K (1000)
- RowsReturnedRate: 0
- SendersBlockedTimer: 0.000ns
- SendersBlockedTotalTimer(*): 0.000ns
HDFS_SCAN_NODE (id=0):(Total: 238.005ms, non-child: 238.005ms, % non-child: 100.00%)
- AverageHdfsReadThreadConcurrency: 0.00
- AverageScannerThreadConcurrency: 0.93
- BytesRead: 23.78 MB (24934683)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 23.78 MB (24934683)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 23.78 MB (24934683)
- DecompressionTime: 40.000ms
- MaxCompressedTextFileLength: 0
- NumColumns: 9 (9)
- NumDisksAccessed: 1 (1)
- NumRowGroups: 1 (1)
- NumScannerThreadsStarted: 1 (1)
- PeakMemoryUsage: 31.07 MB (32577052)
- PerReadThreadRawHdfsThroughput: 1.08 GB/sec
- RemoteScanRanges: 0 (0)
- RowsRead: 7.96M (7959877)
- RowsReturned: 2.59M (2590143)
- RowsReturnedRate: 8.73 M/sec
- ScanRangesComplete: 1 (1)
- ScannerThreadsInvoluntaryContextSwitches: 6.36K (6360)
- ScannerThreadsTotalWallClockTime: 17m21s
- MaterializeTupleTime(*): 17m20s
- ScannerThreadsSysTime: 74.049ms
- ScannerThreadsUserTime: 12s312ms
- ScannerThreadsVoluntaryContextSwitches: 2.55K (2552)
- TotalRawHdfsReadTime(*): 20.000ms
- TotalReadThroughput: 27.75 KB/sec
Filter 0:
- Rows processed: 16.38K (16383)
- Rows rejected: 0 (0)
- Rows total: 16.38K (16384)

 

I see that the execution rates are in the range of 140 KB/sec. I have seen profile dumps in other posted questions where the rates were in the range of 5 MB/sec or even 25 MB/sec... I'm not sure whether this rate is the root cause, but I wanted to understand what it means and how I can improve it, if at all...
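For reference, the query is roughly of this shape: a "create table as select" over a join between the big table and the small broadcast table. This is a sketch only; the table and column names below are made up, since I haven't posted the actual SQL here.

-- Rough shape only, hypothetical names
CREATE TABLE result_table AS
SELECT b.*, s.some_col
FROM big_table b                -- ~15 million rows
JOIN small_table s              -- ~1,000 rows, broadcast side of the join
  ON b.small_id = s.id;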

1 ACCEPTED SOLUTION

avatar

OK, that's interesting. The nonsense-looking symbol at the top is most likely JIT-compiled (codegen'd) code from your query, probably an expression or something similar; perf-18476.map is the symbol map perf uses for code that process 18476 generated at runtime.

 

36.50%  perf-18476.map                 [.] 0x00007f3c1d634b82

The other symbols, like GetDoubleVal(), may be what is calling this expensive function. It looks like this is probably what the ProbeTime in the profile is reflecting.

 

Can you share the SQL for your query at all? I'm guessing there's some expression in your query that's expensive to evaluate, e.g. joining on a complex expression, or doing some kind of expensive computation per row.

 

View solution in original post

7 REPLIES

avatar

There's something strange going on here: the profile reports that the scan used only around 12 seconds of CPU time but about 17 minutes of wall-clock time (roughly 1% CPU utilization for the scanner threads). So for whatever reason the scan is spending most of its time swapped out and unable to execute.

 

- MaterializeTupleTime(*): 17m20s
- ScannerThreadsSysTime: 74.049ms
- ScannerThreadsUserTime: 12s312ms

 

Is the system under heavy load or is it swapping to disk?

avatar
Explorer

There are no other jobs running on this cluster. This is a dev cluster with 5 nodes; the join, however, runs on only one node, and for as long as the join runs the load average on that node stays consistently above 1. The host CPU usage, however, stays at around 50%. Swap rate, disk latency, etc. did not spike during the run. I can run it again and check; is there anything in particular you want me to capture?

 

There are 2 cores on that node (the join only runs on one core, right?).

 

Probably not relevant: this is a "create table as select..." query. I ran it without the "create table" part and it still took the same amount of time, so writing to disk is *probably* not the issue.

avatar

Maybe run 'perf top' to see where it's spending the time?


I'd expect the scan to run on one core and the join and insert to run on a different core.

avatar
Explorer

This is pretty much how it looks the entire time, give or take a few percentage points on the top two rows.

 

36.50%  perf-18476.map                 [.] 0x00007f3c1d634b82
14.61%  impalad                        [.] impala::ExprContext::GetValue(impala::Expr*, impala::TupleRow*)
 6.77%  impalad                        [.] impala::HiveUdfCall::Evaluate(impala::ExprContext*, impala::TupleRow*)
 5.26%  libjvm.so                      [.] jni_invoke_nonstatic(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)
 3.02%  libjvm.so                      [.] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)
 2.57%  impalad                        [.] impala::ScalarFnCall::GetDoubleVal(impala::ExprContext*, impala::TupleRow*)

avatar
Explorer

After the query completes, this is how it looks - 

 

 8.69%  python2.7                      [.] 0x00000000000eec22
 5.70%  python2.7                      [.] PyEval_EvalFrameEx
 5.24%  perf-18476.map                 [.] 0x00007f3c1d634b82
 4.39%  [unknown]                      [.] 0x0000000000487bd0
 4.11%  perf-12892.map                 [.] 0x00007f8e45020431
 3.74%  perf                           [.] 0x00000000000caa5b
 3.64%  libc-2.19.so                   [.] 0x000000000015fc60
 1.96%  impalad                        [.] impala::ExprContext::GetValue(impala::Expr*, impala::TupleRow*)

avatar

OK, that's interesting. The nonsense-looking symbol at the top is most likely JIT-compiled (codegen'd) code from your query, probably an expression or something similar; perf-18476.map is the symbol map perf uses for code that process 18476 generated at runtime.

 

36.50%  perf-18476.map                 [.] 0x00007f3c1d634b82

The other symbols, like GetDoubleVal(), may be what is calling this expensive function. It looks like this is probably what the ProbeTime in the profile is reflecting.

 

Can you share the SQL for your query at all? I'm guessing there's some expression in your query that's expensive to evaluate, e.g. joining on a complex expression, or doing some kind of expensive computation per row.
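If it does turn out to be a Hive UDF (the perf output showed impala::HiveUdfCall::Evaluate) and the UDF only depends on columns from the small table, one option is to evaluate it once per row of the 1,000-row table before the join instead of once per joined row. A sketch, with made-up table, column, and UDF names:

-- Hypothetical slow shape: the UDF is evaluated once per output row,
-- i.e. millions of times, as part of producing the join result.
CREATE TABLE result_table AS
SELECT b.*, my_expensive_udf(s.some_col) AS derived
FROM big_table b
JOIN small_table s ON b.small_id = s.id;

-- Pre-computing the UDF over the small table first means only ~1,000
-- UDF calls; the join then just copies the already-derived value.
CREATE TABLE small_derived AS
SELECT id, my_expensive_udf(some_col) AS derived
FROM small_table;

CREATE TABLE result_table AS
SELECT b.*, s.derived
FROM big_table b
JOIN small_derived s ON b.small_id = s.id;

If the expensive expression depends on columns from the big table instead, that rewrite doesn't apply; in that case the usual route is to simplify the expression or replace the UDF with a cheaper built-in.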

 

avatar
Explorer

Thank you for explaining it. It's a function call. I'm changing it and will see the impact; I'll come back with the results...