Hive query stalls on hive cli

Contributor

Hi, after experiencing very slow performance on various VMs, I am now running the HDP Sandbox on an Azure A6 configuration (4 cores, 28 GB memory). I assumed this would be enough for reasonable performance. Yet I got a Java heap OutOfMemoryError while running simple queries from the Hello World tutorial (e.g. SELECT max(mpg) FROM truck_mileage). Increasing the Tez container size to 2048 MB resolved the OutOfMemoryError, but now the query stalls. What is going wrong? Are there any parameters that should be set differently? Thanks in advance.

hive> select max(mpg) from truck_mileage;
Query ID = hive_20160123104344_a796f4ba-2fa0-4272-b4e8-42720fd32417
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1453543525470_0004)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1                 INITED      1          0        0        1       0       0
Reducer 2             INITED      1          0        0        1       0       0
-------------------------------------------------------------------------------
VERTICES: 00/02  [>>--------------------------] 0%    ELAPSED TIME: 1236.84 s
--------------------------------------------------------------------------------
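A status table like the one above, with both vertices at INITED and their tasks pending, may mean that YARN has not yet granted a container of the requested size. That is an assumption, not something this thread confirms. A minimal sketch of how one might check it against the YARN ResourceManager REST API (`/ws/v1/cluster/metrics`) follows; the ResourceManager address and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url: str) -> dict:
    """Fetch cluster-wide metrics from the YARN ResourceManager REST API.

    rm_url is the ResourceManager web address, e.g. "http://localhost:8088"
    (a hypothetical value; use the address of your own Sandbox).
    """
    endpoint = f"{rm_url.rstrip('/')}/ws/v1/cluster/metrics"
    with urlopen(endpoint, timeout=10) as resp:
        return json.load(resp)["clusterMetrics"]


def container_fits(metrics: dict, container_mb: int) -> bool:
    """Return True if the free YARN memory covers one container of container_mb.

    This is a rough, cluster-wide check. It ignores per-node limits and the
    scheduler's maximum allocation, which can also block a request.
    """
    return metrics["availableMB"] >= container_mb


# Example call (needs a reachable ResourceManager, so it is left commented out):
# metrics = fetch_cluster_metrics("http://localhost:8088")
# print(container_fits(metrics, 2048))
```

If `container_fits` returns False for the configured `hive.tez.container.size`, a smaller container size or more YARN memory would be the first things to try.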
1 ACCEPTED SOLUTION

Contributor

I just solved this Azure Sandbox issue based on a comment by Paul Hargis I found on one of the tutorial pages:

Workaround for Hive queries OutOfMemory errors:

Please note that in some cases (such as when running the Hortonworks Sandbox on a Microsoft Azure VM and allocating an ‘A4’ VM), some Hive queries will produce OutOfMemory (Java heap) errors. As a workaround, you can adjust some Hive-Tez config parameters using the Ambari console. Go to the Services -> Hive page, click on the ‘Configs’ tab, and make the following changes:

1) Scroll down to the Optimization section and increase Tez Container Size from 200 to 512. Param: “hive.tez.container.size” Value: 512

2) Click on the “Advanced” tab to show extra settings, scroll down to the parameter “hive.tez.java.opts”, and increase the Java heap max size from 200MB to 512MB. Param: “hive.tez.java.opts” Value: “-server -Xmx512m -Djava.net.preferIPv4Stack=true”
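The two parameters in the workaround need to stay consistent with each other: the `-Xmx` heap has to fit inside the Tez container. A minimal sketch of a helper that derives both values from one container size and prints them as Hive `set` statements is shown below. The function names and the `heap_fraction` parameter are hypothetical, and applying the values per session with `set` instead of through Ambari is an assumption, not part of the original comment.

```python
def tez_memory_settings(container_mb: int, heap_fraction: float = 1.0) -> dict:
    """Build the two Hive-on-Tez properties named in the workaround.

    heap_fraction=1.0 reproduces the values quoted above (512 and -Xmx512m).
    A smaller fraction leaves headroom for non-heap JVM memory; choosing one
    is an assumption of this sketch, not advice from the thread.
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    if not 0 < heap_fraction <= 1:
        raise ValueError("heap_fraction must be in (0, 1]")
    heap_mb = int(container_mb * heap_fraction)
    return {
        "hive.tez.container.size": str(container_mb),
        "hive.tez.java.opts": (
            f"-server -Xmx{heap_mb}m -Djava.net.preferIPv4Stack=true"
        ),
    }


def as_hive_set_statements(settings: dict) -> list:
    """Render the settings as statements to paste at the hive> prompt."""
    return [f"set {name}={value};" for name, value in settings.items()]


for line in as_hive_set_statements(tez_memory_settings(512)):
    print(line)
```

Deriving the heap size from the container size in one place avoids the case where only one of the two values is raised.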

Contributor

Update: I reverted all configs to their default Sandbox state and ran the query again. It now fails with java.lang.OutOfMemoryError: Java heap space.

ERROR : Vertex failed, vertexName=Map 1, 
vertexId=vertex_1453566734706_0004_1_00, diagnostics=[Task failed, 
taskId=task_1453566734706_0004_1_00_000000, diagnostics=[TaskAttempt 0 
failed, info=[Error: Failure while running 
task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap 
space

(On my local VM the query runs without problems, so the problem occurs only on the Azure Sandbox.)

Master Mentor

@Michel Meulpolder You may have a valid point. This thread will definitely introduce you to new docs and ideas.

Contributor

Thanks a lot Neeraj.

Master Mentor

@Michel Meulpolder Perfect! 🙂 Do you have the link handy? I would love to read and upvote it.

New Contributor

Hi, did you actually solve this problem? I have created several sandbox environments on different instance sizes, and as soon as I introduce an aggregate into a Hive query I get no response. Changing the heap size got rid of the out-of-memory exception, but aggregate queries just hang and stay at 0%.

Contributor

@Niall Moran

In my case the configuration changes outlined above did solve the problem. I first reverted to the original sandbox configs for all of the components, then committed the changes exactly as suggested by Paul Hargis. (Note: when using Ambari via Internet Explorer, my queries often hung in the interface but were still processed in the background. I don't have this problem when using Firefox.)
