What is the workaround when getting Hive OutOfMemory errors?


Some users are getting OutOfMemory errors when running the "Getting Started with HDP" tutorial on the Hortonworks website:

http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/#...

What is the suggested workaround, especially when running in a limited memory environment like the Sandbox?

1 ACCEPTED SOLUTION


Workaround for Hive query OutOfMemory errors:

Please note that in some cases (such as when running the Hortonworks Sandbox on a Microsoft Azure VM with the 'A4' VM size), some Hive queries will produce OutOfMemory (Java heap) errors. As a workaround, you can adjust a couple of Hive-on-Tez configuration parameters from the Ambari console. Go to the Services –> Hive page, click the 'Configs' tab, and make the following changes:

1) Scroll down to the Optimization section and increase the Tez Container Size from 200 to 512.
Param: "hive.tez.container.size" Value: 512

2) Click the "Advanced" tab to show the extra settings, scroll down to the parameter "hive.tez.java.opts", and increase the Java heap max size in the Hive-Tez Java Opts from 200 MB to 512 MB.
Param: "hive.tez.java.opts" Value: "-server -Xmx512m -Djava.net.preferIPv4Stack=true"
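If you want to try the change for a single session before touching the cluster-wide configuration, the same two properties can usually be overridden from the Hive/Beeline prompt. This is a minimal sketch, assuming the Sandbox does not restrict these parameters via Hive's restricted-config list:

    -- Session-level override mirroring the Ambari changes above
    SET hive.tez.container.size=512;
    SET hive.tez.java.opts=-server -Xmx512m -Djava.net.preferIPv4Stack=true;

The session overrides apply only to subsequent queries in that session, whereas the Ambari change persists and requires restarting the affected Hive components.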


5 REPLIES


New Contributor

It worked great. Thanks


No, this did not work. I changed the two parameters as indicated, but when saving the configuration I was told that the "hive.auto.convert.join.noconditionaltask.size" parameter should be 69,905,066 instead of 3,149,642,683. Upon agreeing to this reset, I got 3 additional warnings, one of which said that the noconditionaltask.size parameter was now "less than the recommended default of 286,331,153." And the tutorial still blew up at the same point.

Point is, each change I've made has generated numerous warnings regarding other settings - some of which contradict settings from previous warnings. And the tutorial still blows up. So I find myself knee deep in Hive internals - none of which I am familiar with - when all I'm really trying to do at this point is step through the tutorial. [In addition I now have numerous Hive configurations which I have not figured out how to discard.]

While there is value in wrestling with all these parameters, it seems a bit much to ask of us newbies. In this case, the first failure occurred in the "Analyze the Truck Data" section: the script that creates the truck_mileage table blows up. Because of this, several of the following tutorial scripts will also fail, since they need to process that table. So much of the "hands-on" portion of the tutorial from this point on is also going to fail.

Suggestions? Thanks...Terry M.


This problem was noted while running HDP 2.5. Apparently it was fixed in version 2.6; however, this also required updating Oracle VM VirtualBox to version 5.1.30. The problem has not reappeared.

Expert Contributor

Workarounds are specific to the actual problem, not to a symptom like running out of memory. There are parts of the system that do use a lot of memory; the usual set of workarounds is to disable memory-hungry features such as map-joins and map-side hash aggregations. Alternatively, there are a few scalability features that reduce the total memory required, but they are disabled by default because they degrade performance on large-RAM clusters (like the dynamic partitioned insert optimizations). There are also configuration issues that go unnoticed, like allocating 60% of a container as a single sort buffer. At the very least, I ask people to submit a jmap hprof or a jmap -histo output so that these problems can be diagnosed.
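To make that concrete, here is a minimal sketch of how the two memory-hungry features mentioned above could be switched off for a single Hive session. These are the standard Hive property names, but whether disabling them actually resolves a given OutOfMemory error depends on the real cause, as noted above:

    -- Disable map-side (broadcast) joins so the small-table hash map is not built in the task heap
    SET hive.auto.convert.join=false;
    -- Disable map-side hash aggregation for GROUP BY
    SET hive.map.aggr=false;

Expect queries to run slower with these off; the features trade memory for speed.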