Created on 06-04-2018 11:57 AM - edited 09-16-2022 06:17 AM
We recently enabled HDFS caching for two tables to try to speed up a whole class of very similar queries, which generally follow this pattern:
SELECT x,y,z FROM (
SELECT x,y,z FROM table1 WHERE blah
UNION ALL
SELECT x,y,z FROM table2 WHERE blah
) x
ORDER BY x DESC, y DESC
LIMIT 20001 OFFSET 0
... but we didn't get much runtime improvement. Digging in, it looks like ~80% of the time is spent in CodeGen: 5.25s, of which CompileTime is 1.67s and OptimizationTime is 3.51s (see the profile fragment below for this sample run).
With SET DISABLE_CODEGEN=true the query goes from ~6 seconds to ~1 second, but the docs state this option should not be used in general, so I'm hesitant to add it to live production reports and would rather understand the root cause.
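For reference, this is roughly how we tested the workaround in impala-shell (session-scoped only, sketched with the placeholder query from the pattern above):

-- Disable codegen for the current session only (the workaround we tested)
SET DISABLE_CODEGEN=true;
-- ... rerun the UNION ALL / ORDER BY / LIMIT query from the pattern above ...
-- Restore the default afterwards
SET DISABLE_CODEGEN=false;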
Both tables are Parquet and fully HDFS-cached. Both are wide-ish (253 and 126 columns respectively), but the inner queries project only 20 columns to the outer query.
CDH 5.13 / Impala 2.10. Happy to send full profile file by direct mail.
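For context, caching was enabled with statements along these lines (the pool name below is just an illustration, not our actual cache pool):

-- Illustrative only: cache both tables in an existing HDFS cache pool
ALTER TABLE table1 SET CACHED IN 'impala_cache_pool' WITH REPLICATION = 3;
ALTER TABLE table2 SET CACHED IN 'impala_cache_pool' WITH REPLICATION = 3;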
Thanks in advance,
-mauricio
Operator             #Hosts  Avg Time   Max Time   #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
78:MERGING-EXCHANGE       1  5s307ms    5s307ms       73         101          0              0  UNPARTITIONED
49:TOP-N                 30  341.689us  880.634us     73         101  873.00 KB       39.28 KB
00:UNION                 30  240.707us  3.190ms       73       1.61K    8.81 MB              0
...
F35:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Host Resources: mem-estimate=0B mem-reservation=0B
PLAN-ROOT SINK
| mem-estimate=0B mem-reservation=0B
|
78:MERGING-EXCHANGE [UNPARTITIONED]
| order by: action_date DESC, action_id ASC
| limit: 101
| mem-estimate=0B mem-reservation=0B
| tuple-ids=47 row-size=398B cardinality=101
|
F34:PLAN FRAGMENT [RANDOM] hosts=18 instances=18
Per-Host Resources: mem-estimate=206.48MB mem-reservation=14.44MB
49:TOP-N [LIMIT=101]
| order by: action_date DESC, action_id ASC
| mem-estimate=39.28KB mem-reservation=0B
| tuple-ids=47 row-size=398B cardinality=101
|
00:UNION
...
>>>>> F34 Fragment for a sample node (all very similar):
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 8:1/38.32 MB
Filter 4 arrival: 5s339ms
AverageThreadTokens: 1.00
BloomFilterBytes: 3.0 MiB
InactiveTotalTime: 0ns
PeakMemoryUsage: 27.4 MiB
PeakReservation: 14.4 MiB
PeakUsedReservation: 0 B
PerHostPeakMemUsage: 58.6 MiB
RowsProduced: 1
TotalNetworkReceiveTime: 261.74us
TotalNetworkSendTime: 313.68us
TotalStorageWaitTime: 4.96us
TotalThreadsInvoluntaryContextSwitches: 583
TotalThreadsTotalWallClockTime: 5.37s
TotalThreadsSysTime: 53ms
TotalThreadsUserTime: 5.20s
TotalThreadsVoluntaryContextSwitches: 169
TotalTime: 5.43s
>> Fragment Instance Lifecycle Timings (0ns)
>> DataStreamSender (dst_id=78) (1ms)
>> CodeGen (5.25s)
CodegenTime: 26ms
CompileTime: 1.67s <<<<<<<<<<<<< ????
InactiveTotalTime: 0ns
LoadTime: 0ns
ModuleBitcodeSize: 1.9 MiB
NumFunctions: 729
NumInstructions: 35,078
OptimizationTime: 3.51s <<<<<<<<<<<<< ????
PeakMemoryUsage: 17.1 MiB
PrepareTime: 66ms
TotalTime: 5.25s
>> SORT_NODE (id=49) (94ms)
>> UNION_NODE (id=0) (93ms)
>> HASH_JOIN_NODE (id=48) (9ms)
Created 06-05-2018 04:25 PM
@mauricio thanks for the profile. I think you might be better off tweaking DISABLE_CODEGEN_ROWS_THRESHOLD instead of using the big hammer of DISABLE_CODEGEN.
The way that option works is that codegen is disabled automatically if the planner detects that no point in the query plan processes more than that number of rows per backend. The default is 50,000. E.g. if your query scans 100,000 rows split evenly across three backends (~33,333 per backend), it will disable codegen automatically.
Instead of setting DISABLE_CODEGEN, I'd suggest increasing that threshold first. Based on the profile you sent me, something like 400000 might be sufficient for that query, at least.
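As a sketch, something along these lines (the exact value depends on your per-backend row counts):

-- Raise the threshold for this session; codegen is still used for any query
-- where some plan node processes more than 400,000 rows per backend
SET DISABLE_CODEGEN_ROWS_THRESHOLD=400000;
-- then run the report query as usual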
Created 06-12-2018 04:46 PM
Never mind my last comment: I was confused because the DISABLE_CODEGEN_ROWS_THRESHOLD setting @Tim Armstrong recommended was not yet documented, so I tried using the closest thing I could find (SCAN_NODE_CODEGEN_THRESHOLD), which wasn't applicable to our query.
It turns out that, even though it is not yet documented, DISABLE_CODEGEN_ROWS_THRESHOLD is available in our CDH 5.13 cluster and works as Tim suggested.
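For anyone else on an older release, this is roughly what we ran in impala-shell to confirm it (the threshold value is the one Tim suggested for our query):

SET;  -- lists the query options this build actually supports
SET DISABLE_CODEGEN_ROWS_THRESHOLD=400000;  -- accepted on CDH 5.13 / Impala 2.10 even though undocumented
-- rerun the report query; codegen is now skipped for it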
Created 06-12-2018 05:12 PM
@mauricio that's great news! Thanks for the update. We do need to get this documented, though.