Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Very slow CodeGen taking 80% of runtime

avatar
Rising Star

We recently enabled hdfs caching for two tables to try and speed up a whole class of queries that are very similar, generally following this pattern:

 

SELECT x,y,z FROM (

SELECT x,y,z FROM table1 WHERE blah

UNION ALL

SELECT x,y,z FROM table2 WHERE blah

) x
ORDER BY x DESC, y DESC

LIMIT 20001 OFFSET 0


... but we didn't get much runtime improvement. Digging in it looks like 80% of the time is spent on CodeGen: 5.25s, of that CompileTime: 1.67s and OptimizationTime: 3.51s (see profile fragment below for this sample run).

 

With set DISABLE_CODEGEN=true query goes from ~6 seconds to ~1 second, but docs state this should not be used generally, so hesitant to add that in actual live production reports, and would rather want to understand root cause.

 

Both tables are parquet, fully hdfs-cached. Both are wide-ish: 253 and 126 cols respectively, but inner queries project only 20 cols to the outer.

 

CDH 5.13 / Impala 2.10.  Happy to send full profile file by direct mail.

 

Thanks in advance,

 

-mauricio

 


78:MERGING-EXCHANGE 1 5s307ms 5s307ms 73 101 0 0 UNPARTITIONED
49:TOP-N 30 341.689us 880.634us 73 101 873.00 KB 39.28 KB
00:UNION 30 240.707us 3.190ms 73 1.61K 8.81 MB 0

...


F35:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Host Resources: mem-estimate=0B mem-reservation=0B
PLAN-ROOT SINK
| mem-estimate=0B mem-reservation=0B
|
78:MERGING-EXCHANGE [UNPARTITIONED]
| order by: action_date DESC, action_id ASC
| limit: 101
| mem-estimate=0B mem-reservation=0B
| tuple-ids=47 row-size=398B cardinality=101
|
F34:PLAN FRAGMENT [RANDOM] hosts=18 instances=18
Per-Host Resources: mem-estimate=206.48MB mem-reservation=14.44MB
49:TOP-N [LIMIT=101]
| order by: action_date DESC, action_id ASC
| mem-estimate=39.28KB mem-reservation=0B
| tuple-ids=47 row-size=398B cardinality=101
|
00:UNION

 ...

 

 

>>>>> F34 Fragment for a sample node (all very similar):
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 8:1/38.32 MB
Filter 4 arrival: 5s339ms
AverageThreadTokens: 1.00
BloomFilterBytes: 3.0 MiB
InactiveTotalTime: 0ns
PeakMemoryUsage: 27.4 MiB
PeakReservation: 14.4 MiB
PeakUsedReservation: 0 B
PerHostPeakMemUsage: 58.6 MiB
RowsProduced: 1
TotalNetworkReceiveTime: 261.74us
TotalNetworkSendTime: 313.68us
TotalStorageWaitTime: 4.96us
TotalThreadsInvoluntaryContextSwitches: 583
TotalThreadsTotalWallClockTime: 5.37s
TotalThreadsSysTime: 53ms
TotalThreadsUserTime: 5.20s
TotalThreadsVoluntaryContextSwitches: 169
TotalTime: 5.43s
>> Fragment Instance Lifecycle Timings (0ns)
>> DataStreamSender (dst_id=78) (1ms)
>> CodeGen (5.25s)
CodegenTime: 26ms
CompileTime: 1.67s <<<<<<<<<<<<< ????
InactiveTotalTime: 0ns
LoadTime: 0ns
ModuleBitcodeSize: 1.9 MiB
NumFunctions: 729
NumInstructions: 35,078
OptimizationTime: 3.51s <<<<<<<<<<<<< ????
PeakMemoryUsage: 17.1 MiB
PrepareTime: 66ms
TotalTime: 5.25s
>> SORT_NODE (id=49) (94ms)
>> UNION_NODE (id=0) (93ms)
>> HASH_JOIN_NODE (id=48) (9ms)

1 ACCEPTED SOLUTION

avatar
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login
11 REPLIES 11

avatar
Rising Star

Never mind my last comment: I was confused because the DISABLE_CODEGEN_ROWS_THRESHOLD setting @Tim Armstrong recommended was not yet documented, so tried using the closest thing I found (SCAN_NODE_CODEGEN_THRESHOLD) which wasn't applicable to our query.

 

Turns out even though not yet documented, DISABLE_CODEGEN_ROWS_THRESHOLD is available and works as Tim suggested, in our CDH 5.13 cluster.

avatar

@mauriciothat's great news! Thanks for the update. We do need to get this documented though.