
HDP Sandbox tutorial not working when running Hive query: Throws OutOfMemoryError: Java heap space

Explorer

Hello,

I am running the HDP 2.5 Sandbox on Docker and following the Hello World tutorial step-by-step. Everything worked up until I tried running the following Hive query from the Ambari Hive View (section 2.4.3 of the tutorial):

SELECT truckid, avg(mpg) avgmpg FROM truck_mileage GROUP BY truckid;

Problem: After several seconds the query fails with the following exception:

    java.lang.Exception: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1490094767913_0001_1_00, diagnostics=[Task failed, taskId=task_1490094767913_0001_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:159)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:173)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:117)
at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:142)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:138)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
... 14 more

Here are my system specs:

CPU: Quad Core Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Memory: 64 GB
Linux: 4.4.0-53-generic #74~14.04.1-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux

I did not change ANY settings (in the tutorial it says to set Tez as the execution engine, which was already set by default).

Could you please tell me how to fix this? Otherwise the Sandbox is not usable at all.

Kind regards,
Benjamin

9 REPLIES

Expert Contributor

@Benjamin Wiese

Can you try:

analyze table truck_mileage compute statistics;

analyze table truck_mileage compute statistics for columns;

to check whether the access plans change and the query completes.

Do SQL queries work in general, or does only this statement fail?

There are also a number of heap tuning parameters to review.
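A quick way to review the current heap-related values is to print them from a Hive session; a minimal sketch, with property names taken from typical HDP 2.5 defaults (adjust if your sandbox differs):

```sql
-- Running SET with no value prints the property's current setting.
SET hive.tez.container.size;                       -- Tez container size (MB)
SET hive.tez.java.opts;                            -- JVM heap options for Tez tasks
SET tez.runtime.io.sort.mb;                        -- sort buffer; must fit in the task heap
SET hive.auto.convert.join.noconditionaltask.size; -- map-join memory threshold (bytes)
```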

Is the Sandbox using all of its resources?

Explorer

Hi Graham,

thanks for your reply.

SQL queries generally work. However, of the two statements you posted, only the first one completed (with an empty result). The second one (analyze table truck_mileage compute statistics for columns;) gave me the same Java heap space exception.

And no, the sandbox is not using its maximum resources, not even close.

I tried playing around with the container sizes, but that led to the queries not returning at all anymore (no result, no crash; they just keep running).

Expert Contributor

Sounds like something fundamental is broken. How big is the table? Does the following return?

select count(*) from truck_mileage;

Is anything showing up in /var/log/hive and similar directories?

Hi @Benjamin Wiese

The Hive settings in the HDP 2.5 Sandbox are on the low end by default; try increasing them:

1) In Ambari, select "Hive" from the left menu then the "Configs" tab and the "Settings" sub-tab

[Screenshot: Hive service Configs > Settings tab in Ambari]

2) Scroll down to the bottom of the page and modify the "HiveServer2 Heap Size" and "Metastore Heap Size" values, as well as any other flagged items (possibly "Memory for Map Join"). If you hover next to each item, Ambari will recommend values to set, so feel free to use those by selecting the "set recommended" icon that appears.

[Screenshot: HiveServer2 Heap Size and Metastore Heap Size fields at the bottom of the Settings tab]

3) Save, then click "Restart Affected" near the top of the page to restart the affected services.
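If you prefer to experiment before changing the cluster-wide configuration, the same knobs can be overridden per session; a hedged sketch with assumed example values (as a rule of thumb, the map-join threshold should stay well below the container heap):

```sql
SET hive.tez.container.size=2048;                            -- MB; assumed example value
SET hive.auto.convert.join.noconditionaltask.size=536870912; -- bytes (~512 MB), assumed
```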

Explorer

@Benjamin Wiese You may want to increase the "Maximum Container Size" value from Tez on Ambari UI.
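As a sketch of what that might look like at the session level (example values are assumptions, not tested recommendations): the OOM in the trace comes from the sort buffer allocation in Tez's PipelinedSorter, so the sort buffer has to fit comfortably inside the task heap:

```sql
SET hive.tez.container.size=2048;   -- container size in MB (assumed example)
SET hive.tez.java.opts=-Xmx1640m;   -- task heap, roughly 80% of the container size
SET tez.runtime.io.sort.mb=512;     -- sort buffer; keep it well under the heap
```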

Explorer

Thank you all for your replies. I adjusted the memory settings (following https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html). However, it is not possible to restart all affected services after a parameter change: the Hive Metastore service always aborts its restart, and stopping and restarting the container does not help either.

Explorer

@Benjamin Wiese

Have you solved this problem? I ran into exactly the same thing on the HDP 2.6 Sandbox.

Explorer

@George Meltser

Unfortunately not. I have given up and am now building the Hadoop stack manually, without Ambari.