java.lang.OutOfMemoryError: in tez


Expert Contributor

I have a query that always fails with the following error:

Container exited with a non-zero exit code 1 ]], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) [...]

The query itself is quite a small MERGE (other, much bigger queries work flawlessly):

MERGE INTO summary dst USING (
  SELECT
      e.id1
    , e.id2
    , e.id3
    , e.name
    , e.subject
  FROM
    mailing e
) src
ON
        dst.id1 = src.id1
    AND dst.id2 = src.id2
    AND dst.id3 = src.id3
WHEN MATCHED
THEN UPDATE SET
      name = src.name
    , subject = src.subject

The source table has 1.7M rows (50M on disk); the destination has 75M rows (1.5GB on disk).

Both are ACID tables stored as ORC.

In the image, map 1 is the one with the issue, and I cannot understand why it has only one task. Naively, I would expect that splitting the work across more tasks would give each one a smaller load and work better, but I have not managed to make that happen.

Note that I have already maxed out all memory parameters and cannot raise them any further:

yarn-site/yarn.nodemanager.resource.memory-mb = 24064
yarn-site/yarn.scheduler.minimum-allocation-mb = 1024
yarn-site/yarn.scheduler.maximum-allocation-mb = 24064
mapred-site/mapreduce.map.memory.mb = 4096
mapred-site/mapreduce.reduce.memory.mb = 8192
mapred-site/mapreduce.map.java.opts = 3276
mapred-site/mapreduce.reduce.java.opts = 6553
hive-site/hive.tez.container.size = 4096

Is there a way to increase the number of tasks in the mapper, or another way to not get this out of memory error?

62786-graphicalview.png


Re: java.lang.OutOfMemoryError: in tez

Contributor
@Guillaume Roger

As a rule of thumb, the heap size should be about 80% of the Tez container size, and tez.runtime.io.sort.mb about 40% of it. With hive.tez.container.size = 4096, can you verify the following configurations:

set hive.tez.java.opts=-Xmx3276m;

set tez.runtime.io.sort.mb=1638;
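
For reference, the two values above follow from the 80% and 40% rule of thumb applied to the 4096 MB container. The short Python sketch below only reproduces that arithmetic; the function name `tez_memory_settings` and its parameters are hypothetical helpers for illustration, not part of Hive or Tez.

```python
def tez_memory_settings(container_mb, heap_percent=80, sort_percent=40):
    """Derive heap and sort-buffer sizes from a Tez container size.

    The 80% / 40% defaults are the rule of thumb quoted in this reply,
    not values enforced by Hive or Tez. Integer arithmetic rounds down.
    """
    heap_mb = container_mb * heap_percent // 100
    sort_mb = container_mb * sort_percent // 100
    return {
        "hive.tez.container.size": str(container_mb),
        "hive.tez.java.opts": f"-Xmx{heap_mb}m",
        "tez.runtime.io.sort.mb": str(sort_mb),
    }


if __name__ == "__main__":
    # Print the settings as Hive session commands for a 4096 MB container.
    for key, value in tez_memory_settings(4096).items():
        print(f"set {key}={value};")
```

If the container size is changed later, rerunning the helper with the new value keeps the heap and sort buffer in the same proportion.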

Can you also try disabling the map-side join before executing the query: set hive.auto.convert.join=false;