
How To Process Data with Apache Pig tutorial SLOW


Hello all -

Just a quick LOW PRIORITY question for anyone who has run the tutorial "How To Process Data with Apache Pig".

I created the script and am running the job as I write this. It has been running for 2 hours. Does this seem SLOW to anyone else?

I am running on a machine with an i7 processor and 16 GB of RAM, of which the Ambari Sandbox is using 8 GB. Are there other configuration options that should be set? That said, this already seems like a massive amount of resources to have in use.


Master Mentor

@Mike Vogt

Have you configured YARN queues?

There is a high probability that some other job is consuming all the resources.

Check the ResourceManager (RM) UI from Ambari.
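If the RM UI is hard to reach, the same check can probably be done from a shell inside the sandbox. The sketch below is only an illustration, assuming the standard `yarn` client is on the PATH; the function name `list_active_apps` is made up for this example.

```shell
# List applications that are either waiting for resources (ACCEPTED)
# or already holding containers (RUNNING).
# Assumption: the standard `yarn` client is installed and on the PATH.
list_active_apps() {
  yarn application -list -appStates ACCEPTED,RUNNING
}

# Usage, from a shell inside the sandbox:
#   list_active_apps
```

More than one application in this list while your Pig job makes no progress would be consistent with another job holding the resources.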




@Mike Vogt @Lester Martin @Rafael Coss

Hello, I'm facing the same issue, but while following the tutorial mentioned in:

Once I execute my Pig script, it is stuck in Running status, as shown in status.png.

In the RM UI, my application is also stuck in Running status, as shown in rm-application.png, and I attached the launched MapReduce job in mr-job.png.

From the Pig view log, I got hive-log.png.

How can I resolve my issue? I'll be really grateful if you could help me.


From looking at your RM UI, it sure looks like both of these jobs are basically fighting each other to get running. That is, the AppMaster containers are running, but they can't get any more containers from YARN. My recommendation would be to give the VM 10 GB of memory (that's how I run it on my 16 GB laptop) when you restart it. I'd also try running the script from the command line, just to take the Ambari View out of the picture. If you want to run it in Ambari, then kill any application still lingering in the RM UI should it hang again. Good luck and happy Hadooping!
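The last two suggestions could look roughly like this from a shell inside the sandbox. This is a sketch under assumptions, not the tutorial's own steps: the function names `kill_app` and `run_pig_script` are invented for illustration, the application id and script path are whatever you have locally, and `-x mapreduce` is chosen only because the job above shows up as a MapReduce job.

```shell
# Kill one YARN application by id (the id shown in the RM UI).
# Assumption: the standard `yarn` client is on the PATH.
kill_app() {
  yarn application -kill "$1"
}

# Run a Pig script directly, bypassing the Ambari Pig View.
# Assumption: the `pig` client is on the PATH; the script path
# is supplied by the caller.
run_pig_script() {
  pig -x mapreduce "$1"
}

# Usage (both values are placeholders for your own):
#   kill_app <application_id>
#   run_pig_script <path/to/script.pig>
```

Killing the leftover application first should free its AppMaster container, so the job started from the command line is not competing for the same memory.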