I have a query that always fails with the following error:
Container exited with a non-zero exit code 1 ]], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) [...]
The query itself is quite a small MERGE (other, much bigger queries work flawlessly):
MERGE INTO summary dst
USING (
    SELECT
        e.id1
      , e.id2
      , e.id3
      , e.name
      , e.subject
    FROM mailing e
) src
ON  dst.id1 = src.id1
AND dst.id2 = src.id2
AND dst.id3 = src.id3
WHEN MATCHED THEN UPDATE SET
    name = src.name
  , subject = src.subject
The source table has 1.7M rows (50 MB on disk); the destination has 75M rows (1.5 GB on disk).
Both are ACID tables, ORC.
In the image, Map 1 is the vertex with the issue, and I cannot understand why it has only one task. Naively, I would expect that more tasks, each with a smaller load, would work better, but I have not managed to make that happen.
Note that I have already maxed out all memory parameters and cannot raise them any further:
yarn-site/yarn.nodemanager.resource.memory-mb = 24064
yarn-site/yarn.scheduler.minimum-allocation-mb = 1024
yarn-site/yarn.scheduler.maximum-allocation-mb = 24064
mapred-site/mapreduce.map.memory.mb = 4096
mapred-site/mapreduce.reduce.memory.mb = 8192
mapred-site/mapreduce.map.java.opts = 3276
mapred-site/mapreduce.reduce.java.opts = 6553
hive-site/hive.tez.container.size = 4096
Is there a way to increase the number of tasks in the mapper, or another way to avoid this out-of-memory error?
The heap size should be 80% of the Tez container size, and io.sort.mb should be 40% of the container size. Can you verify the following configurations:
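For the 4096 MB container quoted in the question, that rule of thumb would work out roughly as below. This is only a sketch: it assumes the properties meant are hive.tez.java.opts (heap) and tez.runtime.io.sort.mb (sort buffer), which are not named explicitly here, and the values are simply 80% and 40% of 4096, rounded down.

```sql
-- Assumed property names; values derived from hive.tez.container.size = 4096
set hive.tez.container.size=4096;    -- container size in MB, as given in the question
set hive.tez.java.opts=-Xmx3276m;    -- heap: about 80% of 4096 MB
set tez.runtime.io.sort.mb=1638;     -- sort buffer: about 40% of 4096 MB
```

Set this way, the values apply to the current session only, so they can be tried on the failing MERGE without touching the cluster-wide configuration.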
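Can you try disabling the map-side join before executing the query? With automatic conversion enabled, Hive may try to load the smaller table into an in-memory hash table inside the map task, which can exhaust the heap.
set hive.auto.convert.join=false;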