I have a query that always fails with the following error:
Container exited with a non-zero exit code 1
]], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) [...]
The query itself is a fairly small MERGE (other, much bigger queries work flawlessly):
    MERGE INTO summary dst USING (
        SELECT
              e.id1
            , e.id2
            , e.id3
            , e.name
            , e.subject
        FROM
            mailing e
    ) src
    ON
            dst.id1 = src.id1
        AND dst.id2 = src.id2
        AND dst.id3 = src.id3
    WHEN MATCHED
    THEN UPDATE SET
          name = src.name
        , subject = src.subject
The source table has 1.7M rows (50 MB on disk); the destination has 75M rows (1.5 GB on disk).
Both are ACID tables stored as ORC.
In the image, Map 1 is the one with the issue, and I cannot understand why it has only one task. Naively, I would expect that with more tasks, each would carry a smaller load and work better, but I did not manage to achieve that.
Note that I have already maxed out all memory parameters; I cannot go any higher on those:
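To make concrete what "more tasks" would mean here: as far as I understand, Hive on Tez decides the number of map tasks by grouping input splits, and the grouping bounds are controlled by `tez.grouping.min-size` and `tez.grouping.max-size`. The sketch below shows session-level overrides of those two properties. The byte values are illustrative assumptions, not values from my cluster, and I cannot confirm that they change how a MERGE on ACID tables is split.

```sql
-- Hypothetical session overrides; the values are assumptions for illustration only.
-- Smaller grouping bounds should yield more, smaller split groups (hence more map tasks).
SET tez.grouping.min-size=16777216;    -- 16 MB lower bound per grouped split
SET tez.grouping.max-size=134217728;   -- 128 MB upper bound per grouped split
```

With the 1.5 GB destination table, a 128 MB upper bound would suggest roughly a dozen tasks on that side if the grouping applies as expected.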
yarn-site/yarn.nodemanager.resource.memory-mb = 24064
yarn-site/yarn.scheduler.minimum-allocation-mb = 1024
yarn-site/yarn.scheduler.maximum-allocation-mb = 24064
mapred-site/mapreduce.map.memory.mb = 4096
mapred-site/mapreduce.reduce.memory.mb = 8192
mapred-site/mapreduce.map.java.opts = 3276
mapred-site/mapreduce.reduce.java.opts = 6553
hive-site/hive.tez.container.size = 4096
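For reference, the session-level equivalent of the Tez part of this configuration would look roughly like the sketch below. `hive.tez.java.opts` does not appear in my settings above, so its presence here is an assumption; the heap value simply mirrors the map-side figure, which is about 80% of the 4096 MB container (0.8 × 4096 ≈ 3276).

```sql
-- Sketch only: mirrors the cluster values above at session level.
SET hive.tez.container.size=4096;    -- container size in MB, as in hive-site
-- Assumption: explicit heap for Tez tasks, about 80% of the container size.
SET hive.tez.java.opts=-Xmx3276m;
```

As far as I know, the `mapreduce.*` values only act as fallbacks when the corresponding `hive.tez.*` properties are left unset, so for this query the two settings above are the ones that should matter.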
Is there a way to increase the number of tasks in the mapper, or another way to avoid this out-of-memory error?
