Impala Queries Out-of-Memory Stability Issues
Labels: Apache Impala
Created on 10-23-2019 11:23 AM - edited 10-23-2019 11:25 AM
CDH 5.15.0
CentOS 6.10 final
Hey,
My team uses an enterprise CM environment installed on a cluster with more-than-adequate hardware (it used to be customer-facing and handled multiple large queries at once), but their scripts and queries have been failing intermittently with out-of-memory errors. This happens both for users with memory limits and for users with unrestricted access to the entire cluster's resources. An example error is shown below. Is this a known issue with the current CDH version? I'm raising this because the cluster used to run smoothly under much heavier query load and concurrency, and now it seems to be a roll of the dice every time a non-tiny query is run.
Memory limit exceeded: Error occurred on backend <hostname> by fragment b84dc213ea94e53d:a98ab78000000ad
Memory left in process limit: 125.63 GB
Memory left in query limit: -130.89 KB
Query(b84dc213ea94e53d:a98ab7800000000): memory limit exceeded. Limit=1.00 GB Reservation=441.88 MB ReservationLimit=819.20 MB OtherMemory=582.25 MB Total=1.00 GB Peak=1.00 GB
  Unclaimed reservations: Reservation=112.00 MB OtherMemory=0 Total=112.00 MB Peak=237.75 MB
  Fragment b84dc213ea94e53d:a98ab7800000141: Reservation=0 OtherMemory=57.64 KB Total=57.64 KB Peak=1.57 MB
    AGGREGATION_NODE (id=49): Total=42.12 KB Peak=42.12 KB
      Exprs: Total=42.12 KB Peak=42.12 KB
    EXCHANGE_NODE (id=48): Reservation=0 OtherMemory=0 Total=0 Peak=0
      DataStreamRecvr: Total=0 Peak=0
    DataStreamSender (dst_id=50): Total=424.00 B Peak=424.00 B
    CodeGen: Total=7.10 KB Peak=1.52 MB
  Fragment b84dc213ea94e53d:a98ab7800000122: Reservation=0 OtherMemory=10.32 MB Total=10.32 MB Peak=14.39 MB
    AGGREGATION_NODE (id=32): Total=42.12 KB Peak=42.12 KB
      Exprs: Total=42.12 KB Peak=42.12 KB
    HASH_JOIN_NODE (id=31): Total=142.25 KB Peak=142.25 KB
      Exprs: Total=31.12 KB Peak=31.12 KB
      Hash Join Builder (join_node_id=31): Total=31.12 KB Peak=31.12 KB
        Hash Join Builder (join_node_id=31) Exprs: Total=31.12 KB Peak=31.12 KB
    EXCHANGE_NODE (id=46): Reservation=0 OtherMemory=10.09 MB Total=10.09 MB Peak=10.09 MB
      DataStreamRecvr: Total=10.09 MB Peak=10.09 MB
    EXCHANGE_NODE (id=47): Reservation=0 OtherMemory=0 Total=0 Peak=0
      DataStreamRecvr: Total=0 Peak=0
    DataStreamSender (dst_id=48): Total=12.11 KB Peak=12.11 KB
    CodeGen: Total=31.34 KB Peak=4.59 MB
  Fragment b84dc213ea94e53d:a98ab780000006b: Reservation=34.00 MB OtherMemory=17.89 MB Total=51.89 MB Peak=51.89 MB
    HASH_JOIN_NODE (id=30): Reservation=34.00 MB OtherMemory=2.60 MB Total=36.60 MB Peak=36.60 MB
      Exprs: Total=43.12 KB Peak=43.12 KB
      Hash Join Builder (join_node_id=30): Total=39.12 KB Peak=63.12 KB
        Hash Join Builder (join_node_id=30) Exprs: Total=39.12 KB Peak=39.12 KB
    EXCHANGE_NODE (id=37): Reservation=0 OtherMemory=10.04 MB Total=10.04 MB Peak=10.04 MB
      DataStreamRecvr: Total=10.04 MB Peak=10.04 MB
    EXCHANGE_NODE (id=38): Reservation=0 OtherMemory=0 Total=0 Peak=1.20 MB
      DataStreamRecvr: Total=0 Peak=1.20 MB
    DataStreamSender (dst_id=46): Total=2.85 MB Peak=3.61 MB
    CodeGen: Total=11.39 KB Peak=1.51 MB
  Fragment b84dc213ea94e53d:a98ab7800000034: Reservation=1.94 MB OtherMemory=409.38 MB Total=411.32 MB Peak=411.32 MB
    HASH_JOIN_NODE (id=29): Reservation=1.94 MB OtherMemory=6.95 MB Total=8.89 MB Peak=12.12 MB
      Exprs: Total=21.12 KB Peak=21.12 KB
      Hash Join Builder (join_node_id=29): Total=21.12 KB Peak=45.12 KB
        Hash Join Builder (join_node_id=29) Exprs: Total=21.12 KB Peak=21.12 KB
    HDFS_SCAN_NODE (id=0): Total=393.15 MB Peak=393.15 MB
      Exprs: Total=4.00 KB Peak=4.00 KB
    EXCHANGE_NODE (id=35): Reservation=0 OtherMemory=0 Total=0 Peak=4.02 KB
      DataStreamRecvr: Total=0 Peak=4.02 KB
    DataStreamSender (dst_id=37): Total=3.03 MB Peak=6.07 MB
      DataStreamSender (dst_id=37) Exprs: Total=4.00 KB Peak=4.00 KB
    CodeGen: Total=12.10 KB Peak=1.76 MB
  Fragment b84dc213ea94e53d:a98ab780000001f: Reservation=0 OtherMemory=0 Total=0 Peak=3.51 MB
    HASH_JOIN_NODE (id=6): Reservation=0 OtherMemory=0 Total=0 Peak=2.02 MB
      Hash Join Builder (join_node_id=6): Total=0 Peak=37.12 KB
    HDFS_SCAN_NODE (id=5): Total=0 Peak=326.00 KB
    EXCHANGE_NODE (id=34): Reservation=0 OtherMemory=0 Total=0 Peak=4.02 KB
      DataStreamRecvr: Total=0 Peak=4.02 KB
    DataStreamSender (dst_id=35): Total=0 Peak=177.28 KB
    CodeGen: Total=0 Peak=1.53 MB
  Fragment b84dc213ea94e53d:a98ab7800000056: Reservation=0 OtherMemory=0 Total=0 Peak=22.23 MB
    SELECT_NODE (id=11): Total=0 Peak=1.02 MB
    ANALYTIC_EVAL_NODE (id=10): Reservation=0 OtherMemory=0 Total=0 Peak=5.54 MB
    ANALYTIC_EVAL_NODE (id=9): Reservation=0 OtherMemory=0 Total=0 Peak=4.53 MB
    SORT_NODE (id=8): Reservation=0 OtherMemory=0 Total=0 Peak=12.12 MB
    EXCHANGE_NODE (id=36): Reservation=0 OtherMemory=0 Total=0 Peak=2.04 MB
      DataStreamRecvr: Total=0 Peak=2.04 MB
    DataStreamSender (dst_id=38): Total=0 Peak=1.02 MB
    CodeGen: Total=0 Peak=1.13 MB
  Fragment b84dc213ea94e53d:a98ab780000004a: Reservation=0 OtherMemory=0 Total=0 Peak=144.13 KB
    HDFS_SCAN_NODE (id=7): Total=0 Peak=109.00 KB
    DataStreamSender (dst_id=36): Total=0 Peak=30.91 KB
    CodeGen: Total=0 Peak=52.50 KB
  Fragment b84dc213ea94e53d:a98ab7800000103: Reservation=258.00 MB OtherMemory=1.68 MB Total=259.68 MB Peak=259.68 MB
    SELECT_NODE (id=28): Total=4.00 KB Peak=4.00 KB
      Exprs: Total=4.00 KB Peak=4.00 KB
    ANALYTIC_EVAL_NODE (id=27): Total=4.00 KB Peak=4.00 KB
      Exprs: Total=4.00 KB Peak=4.00 KB
    ANALYTIC_EVAL_NODE (id=26): Total=4.00 KB Peak=4.00 KB
      Exprs: Total=4.00 KB Peak=4.00 KB
    SORT_NODE (id=25): Reservation=258.00 MB OtherMemory=293.67 KB Total=258.29 MB Peak=258.29 MB
    EXCHANGE_NODE (id=45): Reservation=0 OtherMemory=1.33 MB Total=1.33 MB Peak=10.01 MB
      DataStreamRecvr: Total=1.35 MB Peak=10.01 MB
    DataStreamSender (dst_id=47): Total=49.41 KB Peak=49.41 KB
    CodeGen: Total=3.51 KB Peak=1.03 MB
  Fragment b84dc213ea94e53d:a98ab78000000e4: Reservation=34.00 MB OtherMemory=2.40 MB Total=36.40 MB Peak=44.01 MB
    HASH_JOIN_NODE (id=24): Reservation=34.00 MB OtherMemory=355.95 KB Total=34.35 MB Peak=34.35 MB
      Exprs: Total=43.12 KB Peak=43.12 KB
      Hash Join Builder (join_node_id=24): Total=39.12 KB Peak=55.12 KB
        Hash Join Builder (join_node_id=24) Exprs: Total=39.12 KB Peak=39.12 KB
    EXCHANGE_NODE (id=43): Reservation=0 OtherMemory=1.12 MB Total=1.12 MB Peak=8.75 MB
      DataStreamRecvr: Total=1.12 MB Peak=8.75 MB
    EXCHANGE_NODE (id=44): Reservation=0 OtherMemory=0 Total=0 Peak=821.20 KB
      DataStreamRecvr: Total=0 Peak=821.20 KB
    DataStreamSender (dst_id=45): Total=669.34 KB Peak=789.34 KB
      DataStreamSender (dst_id=45) Exprs: Total=8.00 KB Peak=8.00 KB
    CodeGen: Total=11.46 KB Peak=1.53 MB
  Fragment b84dc213ea94e53d:a98ab78000000ad: Reservation=1.94 MB OtherMemory=140.59 MB Total=142.53 MB Peak=178.54 MB
    HASH_JOIN_NODE (id=23): Reservation=1.94 MB OtherMemory=1.12 MB Total=3.05 MB Peak=4.28 MB
      Exprs: Total=21.12 KB Peak=21.12 KB
      Hash Join Builder (join_node_id=23): Total=21.12 KB Peak=45.12 KB
        Hash Join Builder (join_node_id=23) Exprs: Total=21.12 KB Peak=21.12 KB
    HDFS_SCAN_NODE (id=12): Total=137.83 MB Peak=174.28 MB
      Exprs: Total=4.00 KB Peak=4.00 KB
    EXCHANGE_NODE (id=41): Reservation=0 OtherMemory=0 Total=0 Peak=4.02 KB
      DataStreamRecvr: Total=0 Peak=4.02 KB
    DataStreamSender (dst_id=43): Total=643.78 KB Peak=971.78 KB
      DataStreamSender (dst_id=43) Exprs: Total=4.00 KB Peak=4.00 KB
    CodeGen: Total=12.06 KB Peak=1.73 MB
  Fragment b84dc213ea94e53d:a98ab7800000098: Reservation=0 OtherMemory=0 Total=0 Peak=3.40 MB
    HASH_JOIN_NODE (id=18): Reservation=0 OtherMemory=0 Total=0 Peak=2.02 MB
      Hash Join Builder (join_node_id=18): Total=0 Peak=37.12 KB
    HDFS_SCAN_NODE (id=17): Total=0 Peak=210.00 KB
    EXCHANGE_NODE (id=40): Reservation=0 OtherMemory=0 Total=0 Peak=4.02 KB
      DataStreamRecvr: Total=0 Peak=4.02 KB
    DataStreamSender (dst_id=41): Total=0 Peak=177.28 KB
    CodeGen: Total=0 Peak=1.53 MB
  Fragment b84dc213ea94e53d:a98ab78000000cf: Reservation=0 OtherMemory=0 Total=0 Peak=18.08 MB
    SELECT_NODE (id=22): Total=0 Peak=528.00 KB
    ANALYTIC_EVAL_NODE (id=21): Reservation=0 OtherMemory=0 Total=0 Peak=4.53 MB
    SORT_NODE (id=20): Reservation=0 OtherMemory=0 Total=0 Peak=12.10 MB
    EXCHANGE_NODE (id=42): Reservation=0 OtherMemory=0 Total=0 Peak=1.37 MB
      DataStreamRecvr: Total=0 Peak=1.37 MB
    DataStreamSender (dst_id=44): Total=0 Peak=1.52 MB
    CodeGen: Total=0 Peak=876.00 KB
  Fragment b84dc213ea94e53d:a98ab78000000c3: Reservation=0 OtherMemory=0 Total=0 Peak=126.65 KB
    HDFS_SCAN_NODE (id=19): Total=0 Peak=81.02 KB
    DataStreamSender (dst_id=42): Total=0 Peak=41.41 KB
    CodeGen: Total=0 Peak=52.50 KB
Created on 10-23-2019 02:27 PM - edited 10-23-2019 02:28 PM
It looks like there was plenty of memory available in the system (125.63 GB left in the process limit); that query just hit its individual memory limit of 1.00 GB.
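As a rough sanity check on that reading, the figures on the `Query(...)` line of the error can be added up by hand. The sketch below is only arithmetic on numbers copied from the post; it assumes the units in the dump are binary (1 GB = 1024 MB), and the variable names are made up for illustration.

```python
# Figures copied from the "Query(...)" line of the error in the question.
# Assumption: the dump uses binary units (1 GB = 1024 MB, 1 MB = 1024 KB).
MB_PER_GB = 1024
KB_PER_MB = 1024

limit_mb = 1.00 * MB_PER_GB   # Limit=1.00 GB
reservation_mb = 441.88       # Reservation=441.88 MB
other_mb = 582.25             # OtherMemory=582.25 MB

# Total tracked memory for the query is reservation plus other memory.
total_mb = reservation_mb + other_mb
# Negative headroom means the query is already over its own limit.
headroom_kb = (limit_mb - total_mb) * KB_PER_MB

print(f"query total: {total_mb:.2f} MB of {limit_mb:.2f} MB")
print(f"headroom:    {headroom_kb:.2f} KB")
```

The computed headroom comes out slightly negative, in the same ballpark as the reported "Memory left in query limit: -130.89 KB"; the small gap is consistent with the dump rounding each figure to two decimals. The process-level limit, by contrast, still had 125.63 GB free.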
There were a lot of improvements to avoid out-of-memory errors between 5.15 and 6.1, particularly for queries with many scans that use a significant amount of memory. It looks like one of the scans was using a large chunk of the query memory:
HDFS_SCAN_NODE (id=0): Total=393.15 MB Peak=393.15 MB
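If it helps to spot lines like that one without reading the whole dump, here is a minimal sketch that ranks plan nodes by their current `Total`. It is not an Impala tool: the regex, the `rank_nodes` helper, and the `SAMPLE` string (a few entries copied from the error above) are all assumptions for illustration, and it only looks at `*_NODE` entries.

```python
import re

# Assumption: sizes in the dump use binary units.
UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}

# Matches e.g. "HDFS_SCAN_NODE (id=0): ... Total=393.15 MB".
# The unit is optional because zero values are printed as "Total=0".
NODE_RE = re.compile(
    r"([A-Z_]+_NODE) \(id=(\d+)\):.*?Total=([\d.]+)(?: (B|KB|MB|GB))?",
    re.S,
)

def rank_nodes(dump):
    """Return [(label, total_bytes), ...] sorted from largest to smallest."""
    rows = []
    for name, node_id, value, unit in NODE_RE.findall(dump):
        total = int(round(float(value) * UNIT_BYTES[unit or "B"]))
        rows.append((f"{name} (id={node_id})", total))
    return sorted(rows, key=lambda row: row[1], reverse=True)

# A few entries copied from the error in the question.
SAMPLE = (
    "HASH_JOIN_NODE (id=29): Reservation=1.94 MB OtherMemory=6.95 MB "
    "Total=8.89 MB Peak=12.12 MB "
    "HDFS_SCAN_NODE (id=0): Total=393.15 MB Peak=393.15 MB "
    "SORT_NODE (id=25): Reservation=258.00 MB OtherMemory=293.67 KB "
    "Total=258.29 MB Peak=258.29 MB "
    "HDFS_SCAN_NODE (id=12): Total=137.83 MB Peak=174.28 MB "
    "EXCHANGE_NODE (id=34): Reservation=0 OtherMemory=0 Total=0 Peak=4.02 KB"
)

for label, total in rank_nodes(SAMPLE):
    print(f"{total / 1024**2:10.2f} MB  {label}")
```

On this excerpt the two scans and the sort account for most of the 1 GB, which matches the point above that scan memory is what squeezes the query.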
There's one specific regression that I'm aware of that affected Avro scans: https://issues.apache.org/jira/browse/IMPALA-7078. The fix is in 5.15.1 and 5.15.2. I don't know which file format you're using, but thought I'd flag it. The IMPALA-7078 fix also included a few tweaks that benefit all file formats.
So I'd suggest:
- Give the queries a bit more memory - in practice we've seen 2 GB work a lot better across a wider variety of queries in CDH 5.x. 1 GB is a bit squeezy for a query with 49 operators.
- Pick up the 5.15.2 or 5.16.2 maintenance releases to get the fix for IMPALA-7078 - that may be enough to solve the problem.
- Look at CDH 6.1, which addresses a bunch of issues in this area more systematically - it moves the scan operations onto a much more robust memory throttling/reservation system (I spent a bunch of time last year working on problems in this general area).
1 GB might just not be enough to run a query with that many operators on the version of Impala that you're running.
