Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Hive OOM error when running a query with tez as execution engine

Hive OOM error when running a query with tez as execution engine

Contributor

when I am running the following query, I am getting the error below. It is launching around 2 mapper tasks and 240 reducer tasks and the job is going smooth and fast upto 239 tasks, it is taking 3 hrs to do do 4 task attempts and then the job is failing.

insert overwrite table mybug select row_number() over (order by clickstream_key) as key, clickstream_key as data from mytest;

Table mytest contains 2 billion records with one column "clickstream_key".

2016-07-24 09:54:37,918 [ERROR] [TezChild] |tez.TezProcessor|: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3181)
at java.util.ArrayList.grow(ArrayList.java:261)
at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:235)
at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:227)
at java.util.ArrayList.add(ArrayList.java:458)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRowNumber$RowNumberBuffer.incr(GenericUDAFRowNumber.java:73)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRowNumber$GenericUDAFRowNumberEvaluator.iterate(GenericUDAFRowNumber.java:102)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:185)
at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.evaluateWindowFunction(WindowingTableFunction.java:164)
at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.iterator(WindowingTableFunction.java:580)
at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:340)
at org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:617)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:290)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:164)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-07-24 09:54:37,920 [INFO] [TezChild] |task.TezTaskRunner|: Encounted an error while executing task: attempt_1469365079244_0001_1_01_000000_0
java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3181)
at java.util.ArrayList.grow(ArrayList.java:261)
at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:235)
at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:227)
at java.util.ArrayList.add(ArrayList.java:458)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRowNumber$RowNumberBuffer.incr(GenericUDAFRowNumber.java:73)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRowNumber$GenericUDAFRowNumberEvaluator.iterate(GenericUDAFRowNumber.java:102)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:185)
at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.evaluateWindowFunction(WindowingTableFunction.java:164)
at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.iterator(WindowingTableFunction.java:580)
at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:340)
at org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:617)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:290)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:164)
... 14 more
2016-07-24 09:54:37,947 [INFO] [TezChild] |runtime.LogicalIOProcessorRuntimeTask|: Final Counters for attempt_1469365079244_0001_1_01_000000_0: Counters: 71 [[File System Counters FILE_BYTES_READ=38467196060, FILE_BYTES_WRITTEN=36640725116, FILE_READ_OPS=0, FILE_LARGE_READ_OPS=0, FILE_WRITE_OPS=0, HDFS_BYTES_READ=0, HDFS_BYTES_WRITTEN=0, HDFS_READ_OPS=0, HDFS_LARGE_READ_OPS=0, HDFS_WRITE_OPS=0][org.apache.tez.common.counters.TaskCounter REDUCE_INPUT_GROUPS=1, REDUCE_INPUT_RECORDS=2000000000, COMBINE_INPUT_RECORDS=0, SPILLED_RECORDS=2000000000, NUM_SHUFFLED_INPUTS=1, NUM_SKIPPED_INPUTS=0, NUM_FAILED_SHUFFLE_INPUTS=0, MERGED_MAP_OUTPUTS=1, GC_TIME_MILLIS=155973, CPU_MILLISECONDS=4116570, PHYSICAL_MEMORY_BYTES=51539607552, VIRTUAL_MEMORY_BYTES=55293632512, COMMITTED_HEAP_BYTES=51539607552, OUTPUT_RECORDS=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=93928296, SHUFFLE_BYTES=93928296, SHUFFLE_BYTES_DECOMPRESSED=2000000020, SHUFFLE_BYTES_TO_MEM=0, SHUFFLE_BYTES_TO_DISK=0, SHUFFLE_BYTES_DISK_DIRECT=93928296, NUM_MEM_TO_DISK_MERGES=0, NUM_DISK_TO_DISK_MERGES=0, SHUFFLE_PHASE_TIME=41, MERGE_PHASE_TIME=46, FIRST_EVENT_RECEIVED=19, LAST_EVENT_RECEIVED=19][Shuffle Errors BAD_ID=0, CONNECTION=0, IO_ERROR=0, WRONG_LENGTH=0, WRONG_MAP=0, WRONG_REDUCE=0][Shuffle Errors_Reducer_2_INPUT_Map_1 BAD_ID=0, CONNECTION=0, IO_ERROR=0, WRONG_LENGTH=0, WRONG_MAP=0, WRONG_REDUCE=0][TaskCounter_Reducer_2_INPUT_Map_1 ADDITIONAL_SPILLS_BYTES_READ=93928296, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, COMBINE_INPUT_RECORDS=0, FIRST_EVENT_RECEIVED=19, LAST_EVENT_RECEIVED=19, MERGED_MAP_OUTPUTS=1, MERGE_PHASE_TIME=46, NUM_DISK_TO_DISK_MERGES=0, NUM_FAILED_SHUFFLE_INPUTS=0, NUM_MEM_TO_DISK_MERGES=0, NUM_SHUFFLED_INPUTS=1, NUM_SKIPPED_INPUTS=0, REDUCE_INPUT_GROUPS=1, REDUCE_INPUT_RECORDS=2000000000, SHUFFLE_BYTES=93928296, SHUFFLE_BYTES_DECOMPRESSED=2000000020, SHUFFLE_BYTES_DISK_DIRECT=93928296, SHUFFLE_BYTES_TO_DISK=0, SHUFFLE_BYTES_TO_MEM=0, SHUFFLE_PHASE_TIME=41, SPILLED_RECORDS=2000000000][TaskCounter_Reducer_2_OUTPUT_out_Reducer_2 OUTPUT_RECORDS=0]]
2016-07-24 09:54:37,947 [INFO] [TezChild] |runtime.LogicalIOProcessorRuntimeTask|: Joining on EventRouter
2016-07-24 09:54:37,948 [INFO] [TezChild] |runtime.LogicalIOProcessorRuntimeTask|: Closed processor for vertex=Reducer 2, index=1
2016-07-24 09:54:37,948 [INFO] [TezChild] |orderedgrouped.Shuffle|: Shutting down Shuffle for source: Map_1
2016-07-24 09:54:37,949 [INFO] [TezChild] |orderedgrouped.ShuffleInputEventHandlerOrderedGrouped|: Map 1: numDmeEventsSeen=1, numDmeEventsSeenWithNoData=0, numObsoletionEventsSeen=0, updateOnClose
2016-07-24 09:54:37,949 [INFO] [TezChild] |orderedgrouped.MergeManager|: finalMerge called with 0 in-memory map-outputs and 0 on-disk map-outputs
2016-07-24 09:54:37,949 [INFO] [TezChild] |orderedgrouped.MergeManager|: Merging 0 files, 0 bytes from disk
2016-07-24 09:54:37,949 [INFO] [TezChild] |orderedgrouped.MergeManager|: Merging 0 segments, 0 bytes from memory into reduce
2016-07-24 09:54:37,949 [INFO] [TezChild] |impl.TezMerger|: Merging 0 sorted segments
2016-07-24 09:54:37,949 [INFO] [TezChild] |impl.TezMerger|: Nothing to merge. Returning an empty iterator
2016-07-24 09:54:37,950 [INFO] [TezChild] |runtime.LogicalIOProcessorRuntimeTask|: Closed input for vertex=Reducer 2, sourceVertex=Map 1
2016-07-24 09:54:37,950 [INFO] [TezChild] |output.MROutput|: out_Reducer 2 closed
2016-07-24 09:54:37,950 [INFO] [TezChild] |runtime.LogicalIOProcessorRuntimeTask|: Closed input for vertex=Reducer 2, sourceVertex=out_Reducer 2
2016-07-24 09:54:37,951 [INFO] [main] |task.TezChild|: Shutdown invoked for container container_e42_1469365079244_0001_01_000002
2016-07-24 09:54:37,951 [INFO] [main] |task.TezChild|: Shutting down container container_e42_1469365079244_0001_01_000002

I tried changing data node heap, reducer heap, mapper heap, hive.tez.java.opts but none of them worked. Any leads on this would be highly appreciated. The error says java heap space, which java heap space does it mean ?

6 REPLIES 6

Re: Hive OOM error when running a query with tez as execution engine

@vinay kumar

set hive.auto.convert.join=false and give a try

Re: Hive OOM error when running a query with tez as execution engine

Rising Star
@vinay kumar

This requires some detailed analysis.

Any ways, you can try to increase the following:

tez.am.resource.memory.mb

tez.am.java.opts

These are for the "tez" client config properties.

Re: Hive OOM error when running a query with tez as execution engine

Contributor

@Ravi

Have you ever seen this case where the last reducer consume a lot of time to complete the task ?

Highlighted

Re: Hive OOM error when running a query with tez as execution engine

Contributor

@Neeraj Sabharwal Could you help me with this ?

Re: Hive OOM error when running a query with tez as execution engine

Rising Star

@vinay kumar

Yes. There could be many reasons for it. It is possible that the data is skewed and the query is taking longer time there.

You can try to check the query "explain" and "explain formatted" for further understanding.

Re: Hive OOM error when running a query with tez as execution engine

@vinay kumar

Please try by setting below properties in your hive/beeline session and run the query which might help you.

set hive.execution.engine=tez;
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=405306368;
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled = true;
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.merge.mapfiles = true; 
set hive.merge.mapredfiles=true; 
set hive.merge.size.per.task=134217728; 
set hive.merge.smallfiles.avgsize= 44739242; 
set mapreduce.job.reduce.slowstart.completedmaps=0.7
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.enforce.bucketing=true;
set hive.exec.max.dynamic.partitions.pernode=100000;
set hive.exec.max.created.files = 1000000;
set hive.exec.parallel = true;

Link http://hortonworks.com/blog/5-ways-make-hive-queries-run-faster