Pig script runs fine on Sandbox but fails on a real cluster

Rising Star

We are facing a tricky situation. We are running a Pig script from the Hadoop tutorial. It works fine on the Sandbox, but it fails on a real cluster, where it complains about insufficient memory for the container.

The following message can be seen in the logs:

container is running beyond physical memory limit

The tricky part is that the Sandbox has far less memory available than the real cluster (about three times less). Also, most memory settings in the Sandbox (MapReduce memory, YARN memory, YARN container sizes) allow much less memory than the corresponding settings on the real cluster. Still, that is sufficient for Pig on the Sandbox but not on the real cluster.

Another note: Hive queries doing a similar job also work fine; they do not complain about memory.

Apparently there is a setting somewhere that makes Pig request too much memory. Can anybody please recommend which parameter should be modified to stop the Pig script from requesting so much memory?

5 REPLIES

Re: Pig script runs fine on Sandbox but fails on a real cluster

Super Guru

Are you running from the command line, interactively, or from the Ambari View?

Also, is this just with the default settings?

Does the data exist?

Any logs or other error messages? Can you share any more details?

Is the real cluster HDP 2.5?

What servers?

Re: Pig script runs fine on Sandbox but fails on a real cluster

Expert Contributor

@Dmitry Otblesk

The challenge we have in the HCC is usually that we don't get the entire logs and have very little input to work with :). Your logs should contain a message like the one below:

Container [pid=2617,containerID=container_1438923434512_12103_01_000002] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 2.9 GB of 2.1 GB virtual memory used. Killing container. Dump of the process-tree for container_1438923434512_12103_01_000002..
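To read the limits out of a message like the one above without eyeballing it, a small parser can help. This is only a sketch: the function name `parse_container_usage` and the regular expression are illustrative assumptions matched to the sample line quoted here, and messages from other Hadoop versions may be worded differently.

```python
import re

# Regex written against the sample message quoted above (an assumption:
# other Hadoop versions may format this line differently).
USAGE_RE = re.compile(
    r"Current usage: (?P<p_used>[\d.]+) (?P<p_used_u>[KMG]B) of "
    r"(?P<p_lim>[\d.]+) (?P<p_lim_u>[KMG]B) physical memory used; "
    r"(?P<v_used>[\d.]+) (?P<v_used_u>[KMG]B) of "
    r"(?P<v_lim>[\d.]+) (?P<v_lim_u>[KMG]B) virtual memory used"
)

UNIT_TO_MB = {"KB": 1 / 1024, "MB": 1, "GB": 1024}


def parse_container_usage(line):
    """Return used/limit figures in MB, or None if the line does not match."""
    m = USAGE_RE.search(line)
    if m is None:
        return None

    def mb(value_key, unit_key):
        return float(m.group(value_key)) * UNIT_TO_MB[m.group(unit_key)]

    return {
        "physical_used_mb": mb("p_used", "p_used_u"),
        "physical_limit_mb": mb("p_lim", "p_lim_u"),
        "virtual_used_mb": mb("v_used", "v_used_u"),
        "virtual_limit_mb": mb("v_lim", "v_lim_u"),
    }


SAMPLE = (
    "Container [pid=2617,containerID=container_1438923434512_12103_01_000002] "
    "is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB "
    "physical memory used; 2.9 GB of 2.1 GB virtual memory used. Killing container."
)

usage = parse_container_usage(SAMPLE)
print(usage)
```

In the sample, the physical limit works out to 1024 MB, which is the container size that was actually granted, so it is worth comparing that figure against the `mapreduce.*.memory.mb` values you believe are in effect.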

This indicates what limit is being set and the threshold at which the error occurs. As a workaround, to test this from the grunt shell, you can set the following and then run the script again:

set mapreduce.map.java.opts '-Xmx1024m'
set mapreduce.reduce.java.opts '-Xmx1024m'
set mapreduce.map.memory.mb '1536'
set mapreduce.reduce.memory.mb '1536'
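The reason the `-Xmx` values above are smaller than the `memory.mb` values is that the container limit applies to the whole JVM process, not just the Java heap, so some room has to be left for non-heap memory. The sketch below simply computes that headroom for a given pair of settings; the helper names `xmx_mb` and `non_heap_headroom_mb` are made up for illustration and are not part of Pig or Hadoop.

```python
import re


def xmx_mb(java_opts):
    """Return the -Xmx value in MB from a JVM options string, or None if absent."""
    m = re.search(r"-Xmx(\d+)([kKmMgG])", java_opts)
    if m is None:
        return None
    value, unit = int(m.group(1)), m.group(2).lower()
    return {"k": value / 1024, "m": value, "g": value * 1024}[unit]


def non_heap_headroom_mb(java_opts, container_mb):
    """MB left in the container for everything that is not Java heap."""
    heap = xmx_mb(java_opts)
    if heap is None:
        raise ValueError("no -Xmx setting found in: %r" % java_opts)
    return container_mb - heap


# Values from the workaround above: 1024 MB heap inside a 1536 MB container.
print(non_heap_headroom_mb("-Xmx1024m", 1536))  # 512
```

If the headroom is zero or negative, the heap alone can fill the container, and the task is likely to be killed as soon as the JVM uses any additional memory.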

Re: Pig script runs fine on Sandbox but fails on a real cluster

Rising Star

@Sumesh

I attached the most relevant (I think) part of the log. You were right in assuming that it was going beyond the limits of the container.

If I follow your suggestion and increase some memory parameters, it may start to work, but then other processes will suffer from a lack of memory.
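To put a rough number on that trade-off, here is a back-of-the-envelope sketch of how many containers of a given size fit on one node. The figure `YARN_NODE_MB = 8192` is purely hypothetical (the memory actually handed to YARN on each node is not stated in this thread), and the scheduler may round container requests up, so treat the result as an upper bound rather than a prediction.

```python
def max_concurrent_containers(yarn_node_mb, container_mb):
    """Upper bound on same-sized containers that fit in a node's YARN memory."""
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    return yarn_node_mb // container_mb


# Hypothetical amount of memory given to YARN on one node (an assumption,
# not a value taken from the cluster discussed here).
YARN_NODE_MB = 8192

for container_mb in (250, 1536):
    count = max_concurrent_containers(YARN_NODE_MB, container_mb)
    print(container_mb, "MB containers per node:", count)
```

Under that assumption, larger containers reduce how many tasks can run in parallel on a node, which is the cost being weighed here.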

I was looking for a different solution, though. I wanted to know how you guys at Hortonworks made the Sandbox work perfectly with much less memory available. Below are the Sandbox's values for the parameters you mentioned in your post:

mapreduce.map.java.opts     -Xmx200m
mapreduce.reduce.java.opts  -Xmx200m
mapreduce.map.memory.mb     250
mapreduce.reduce.memory.mb  250

As you can see the values are way smaller than those you've suggested.

So how does Pig on the Sandbox work with these parameters? Why is it not failing or complaining about low memory? What is doing the trick on the Sandbox?

Re: Pig script runs fine on Sandbox but fails on a real cluster

Rising Star

@Timothy Spann

>are you running from command line, interactive or Ambari View?

Running from Pig View in Ambari

>Also this is just with default settings?

No, some settings were modified (to reduce the required memory).

>Does the data exist?

Of course it does

>Any logs? other error messages? can you share any more details.

I provide them in my response to Sumesh's question.

>Is the real cluster HDP 2.5?

Yes it is

>what servers?

12 GB RAM, 1 TB hard drive

Re: Pig script runs fine on Sandbox but fails on a real cluster

Super Guru

Run it from the command line.
