Hive on Tez query Map output OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
Labels: Apache Tez
Created 12-10-2015 06:48 AM
When I use Hive on Tez to INSERT OVERWRITE a table from another table, I get the following error. It does not happen every time; sometimes the query succeeds:
"Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1449486079177_5239_1_01, diagnostics=[Task failed, taskId=task_1449486079177_5239_1_01_000018, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task: attempt_1449486079177_5239_1_01_000018_0:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:157) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:348) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57) at java.nio.ByteBuffer.allocate(ByteBuffer.java:331) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.(PipelinedSorter.java:173) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.(PipelinedSorter.java:117) at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:141) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:141) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147) ... 
14 more ], TaskAttempt 1 failed, info=[Error: Failure while running task: attempt_1449486079177_5239_1_01_000018_1:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:157) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:348) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57) at java.nio.ByteBuffer.allocate(ByteBuffer.java:331) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.(PipelinedSorter.java:173) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.(PipelinedSorter.java:117) at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:141) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:141) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147) ... 14 more ], TaskAttempt 2
Created 12-10-2015 10:46 AM
If the sum of the data sizes is greater than the amount of memory reserved for the hash tables (see the config parameter below), this error can happen.
hive.auto.convert.join.noconditionaltask.size=1370MB
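For reference, this threshold is specified in bytes when set at the session level; a minimal sketch (1436549120 bytes is roughly the 1370 MB above, shown only as an illustration, not a tuning recommendation):
hive> set hive.auto.convert.join.noconditionaltask.size=1436549120;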
Created 12-11-2015 02:36 AM
It didn't work after setting hive.auto.convert.join.noconditionaltask.size=1436549120. After setting hive.tez.container.size=2048 and hive.tez.java.opts=-Xmx1700m, the OOM problem was solved.
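For reference, those two settings entered in a Hive session look like this (a sketch using the values just described):
hive> set hive.tez.container.size=2048;
hive> set hive.tez.java.opts=-Xmx1700m;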
Created 12-11-2015 04:42 AM
Try the following, assuming your hive.tez.container.size=2048:
set hive.tez.java.opts=-Xmx1640m (0.8 times hive.tez.container.size)
set tez.runtime.io.sort.mb=820 (0.4 times hive.tez.container.size)
set tez.runtime.unordered.output.buffer.size-mb=205 (0.1 times hive.tez.container.size)
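Applied to a larger container, the same ratios would give, for example (a sketch; 4096 MB is only an illustrative container size, not a recommendation):
set hive.tez.container.size=4096
set hive.tez.java.opts=-Xmx3276m (0.8 times hive.tez.container.size)
set tez.runtime.io.sort.mb=1638 (0.4 times hive.tez.container.size)
set tez.runtime.unordered.output.buffer.size-mb=410 (0.1 times hive.tez.container.size)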
Created 04-16-2016 02:01 PM
I am running on Azure using the Maria_dev login; where can I input these 3 lines of code? In the Tez config, everything is locked and cannot be edited.
Created 03-04-2016 03:28 AM
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1455546410616_13085, Tracking URL = http://ndrm:8088/proxy/application_1455546410616_13085/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1455546410616_13085
Hadoop job information for Stage-1: number of mappers: 7; number of reducers: 0
2016-03-03 13:39:54,224 Stage-1 map = 0%, reduce = 0%
2016-03-03 13:40:04,733 Stage-1 map = 57%, reduce = 0%, Cumulative CPU 13.0 sec
2016-03-03 13:40:26,943 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 112.9 sec
2016-03-03 13:40:30,114 Stage-1 map = 96%, reduce = 0%, Cumulative CPU 142.98 sec
2016-03-03 13:40:48,010 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 104.61 sec
2016-03-03 13:41:22,610 Stage-1 map = 96%, reduce = 0%, Cumulative CPU 142.05 sec
2016-03-03 13:41:40,425 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 104.61 sec
2016-03-03 13:42:16,026 Stage-1 map = 96%, reduce = 0%, Cumulative CPU 143.26 sec
2016-03-03 13:42:34,857 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 104.61 sec
2016-03-03 13:43:09,393 Stage-1 map = 96%, reduce = 0%, Cumulative CPU 144.34 sec
2016-03-03 13:43:28,197 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 104.61 sec
MapReduce Total cumulative CPU time: 1 minutes 44 seconds 610 msec
Ended Job = job_1455546410616_13085 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1455546410616_13085_m_000003 (and more) from job job_1455546410616_13085
Task with the most failures(4):
-----
Task ID: task_1455546410616_13085_m_000000
URL: http://ndrm:8088/taskdetails.jsp?jobid=job_1455546410616_13085&tipid=task_1455546410616_13085_m_0000...
-----
Diagnostic Messages for this Task:
Error: Java heap space
Increasing the JVM memory and the map memory allocated to the container helped for me. Below are the values used:
hive> set mapreduce.map.memory.mb=4096;
hive> set mapreduce.map.java.opts=-Xmx3600M;
In case you still get the Java heap error, try increasing these to higher values, but make sure that mapreduce.map.java.opts does not exceed mapreduce.map.memory.mb.
In the case of Tez, you may have to set hive.tez.java.opts=-Xmx3600M; instead.
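As a rough sketch of that rule with a larger container (8192 MB is only an example value, not one used in this thread; the -Xmx here is about 0.8 times mapreduce.map.memory.mb, so it stays below it):
hive> set mapreduce.map.memory.mb=8192;
hive> set mapreduce.map.java.opts=-Xmx6554M;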
Thanks
Created 04-29-2016 12:32 AM
SSH to your server, open /etc/tez/conf/tez-site.xml, and make these changes; if they do not work, try larger values:
- tez.am.resource.memory.mb -> 768
- tez.task.resource.memory.mb -> 768
- tez.am.java.opts -> -Xmx560m -Xms560m
Do the same for /etc/tez/conf/hive-site.xml:
- hive.tez.container.size -> 768
- hive.tez.java.opts -> -Xmx560m -Xms560m
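Each of these is a standard <property> entry in the corresponding XML file; for example, the tez.task.resource.memory.mb change in tez-site.xml would look like this (a sketch; the other properties follow the same pattern):
<property>
  <name>tez.task.resource.memory.mb</name>
  <value>768</value>
</property>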
Then run
$> su hive
$> hive
and run your query.
Created 02-15-2017 10:34 PM
Thanks, your solution worked for me - but there's a minor typo, I think you mean /etc/hive/conf/hive-site.xml for the second file, not /etc/tez/conf/hive-site.xml.
Created 03-20-2017 12:24 PM
Worked super! Thank you.
Created 02-08-2023 07:05 PM
I'm running into the same situation:
Error: Error while running task ( failure ) : java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.allocateSpace(PipelinedSorter.java:250)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.end(PipelinedSorter.java:1054)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.next(PipelinedSorter.java:1009)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:318)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:423)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:379)
at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(OrderedPartitionedKVOutput.java:167)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor$TezKVOutputCollector.collect(TezProcessor.java:355)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:511)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:367)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1050)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.flushHashTable(GroupByOperator.java:998)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:750)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:825)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:857)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:941)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:590)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:126)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:128)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:153)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
I'm not sure why we need to manually set the container or buffer size. Shouldn't Tez do the calculations and use only what's available?