
HIVE TEZ Java Heap size

Rising Star

Hi Team,

I am getting the below error while running Hive on Tez. Please help.

ERROR : Status: Failed

ERROR : Vertex failed, vertexName=Map 3, vertexId=vertex_1471822483769_2700_1_02, diagnostics=[Task failed, taskId=task_1471822483769_2700_1_02_000014, diagnostics=[TaskAttempt 0 failed, info=
[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.expandAndRehashImpl(BytesBytesMultiHashMap.java:749)
        at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.expandAndRehashToTarget(BytesBytesMultiHashMap.java:567)
        at org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$HashPartition.getHashMapFromDisk(HybridHashTableContainer.java:150)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.reloadHashTable(MapJoinOperator.java:592)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.continueProcess(MapJoinOperator.java:556)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:500)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:617)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:344)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.expandAndRehashImpl(BytesBytesMultiHashMap.java:749)
        at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.expandAndRehashToTarget(BytesBytesMultiHashMap.java:567)
        at org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$HashPartition.getHashMapFromDisk(HybridHashTableContainer.java:150)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.reloadHashTable(MapJoinOperator.java:592)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.continueProcess(MapJoinOperator.java:556)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:500)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:617)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)



Try increasing the hive.tez.container.size property in your Hive configuration. The article below is a very good walkthrough of how to configure memory settings for Tez.

https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html
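For instance, the container size (and the task JVM heap inside it, commonly set to roughly 80% of the container) can be raised for a session as shown below; the 4096 MB / 3276 MB values are purely illustrative, not a recommendation for this cluster:

```sql
-- Tez container size for Hive tasks, in MB (illustrative value)
SET hive.tez.container.size=4096;

-- JVM heap for the task, roughly 80% of the container size
SET hive.tez.java.opts=-Xmx3276m;
```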

Super Guru

@suresh krish

The answer from Santhosh B Gowda could be helpful, but it is brute force with a 50-50 chance of success. You need to understand the query execution plan: how much data is processed and how many tasks execute the job. Each task has a container allocated to it.

You could increase the RAM allocated to the container, but if a single task performs the map and the data exceeds the container's allocated memory, you will still see "Out of memory". What you have to do is understand how much data is processed and how to chunk it for parallelism. Increasing the container size is not always the answer; that is almost like throwing more hardware at a bad SQL query instead of tuning it. It is better to have reasonably sized containers and enough of them to process your query's data.

For example, let's take a cross-join of two small tables of 1,000,000 records each. The Cartesian product will be 1,000,000 x 1,000,000 = 1,000,000,000,000 rows. That is a big input for a mapper, and you need to translate it into GB to understand how much memory is needed. Assuming the memory requirement is 10 GB and tez.grouping.max-size is set to its default of 1 GB, 10 mappers will be needed, using 10 containers. Now assume each container is set to 6 GB: you would be wasting 60 GB of allocation for a 10 GB need, and in that specific case it would actually be better to have 1 GB containers. Conversely, if your data is 10 GB and you have only one 6 GB container, that will generate "Out of memory".
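The sizing arithmetic above can be sketched as a quick back-of-the-envelope calculation; the 10 GB input, 1 GB split size, and 6 GB container figures are the illustrative numbers from the example, not measured values:

```python
import math

GB = 1024 ** 3  # bytes per gigabyte

def plan(input_bytes, grouping_max_size, container_bytes):
    """Estimate mapper count and total memory allocated for a Tez map stage."""
    # One mapper per split group of at most tez.grouping.max-size bytes
    mappers = math.ceil(input_bytes / grouping_max_size)
    # Each mapper task gets one container of container_bytes
    allocated = mappers * container_bytes
    return mappers, allocated

# 10 GB of mapper input, default 1 GB tez.grouping.max-size, 6 GB containers
mappers, allocated = plan(10 * GB, 1 * GB, 6 * GB)
print(mappers)          # 10 mappers
print(allocated // GB)  # 60 GB allocated for a 10 GB need
```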

If the execution plan of the query has one mapper, then one container is allocated, and if that container is not big enough, you will get your out-of-memory error. However, if you reduce tez.grouping.max-size to a lower value, that forces the execution plan to use multiple mappers; each gets its own container, and those tasks work in parallel, reducing runtime while meeting the memory requirements. You can override the global tez.grouping.max-size for your specific query.
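A per-session override of the global setting can look like this; the 256 MB value is illustrative, and note that tez.grouping.max-size is specified in bytes:

```sql
-- Force more, smaller splits so the work fans out across more mappers
SET tez.grouping.max-size=268435456;  -- 256 MB, in bytes
-- Then run your query in the same session
```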

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_installing_manually_book/content/ref-ffe... describes the Tez parameters, several of which could help; for your case, give tez.grouping.max-size a shot.

Summary:

- Understand the data volume that needs to be processed.

- Run EXPLAIN on the SQL statement to understand the execution plan (tasks and containers). Use the ResourceManager UI to see how many containers are used and how much of the cluster's resources this query consumes; the Tez View can also give you a good picture of the mapper and reducer tasks involved. The more of them, the more resources are used, but the better the response time. Balance that to spend reasonable resources for a reasonable response time.

- Set tez.grouping.max-size to a value that makes sense for your query; by default it is set to 1 GB, and that is a global value.
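Checking the plan before tuning can be as simple as the statement below; the table and column names are placeholders for your own query:

```sql
-- Prints the Tez DAG: the Map/Reducer vertices that will become container tasks
EXPLAIN
SELECT a.id, b.value
FROM big_table a
JOIN other_table b ON a.id = b.id;
```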