Support Questions

mvogt · ‎02-07-2016

Hello all -

Just a quick LOW PRIORITY question for anyone who has run the tutorial "How To Process Data with Apache Pig".

I created the script and am running the job as I write this. It has been running for 2 hours. Does this seem SLOW to anyone else?

I am running on a machine with an i7 processor and 16 GB of RAM, of which the Ambari Sandbox is using 8 GB. Are there other configuration options that should be set? That said, this already seems like a massive amount of resources in use.

nsabharwal · ‎02-07-2016

@Mike Vogt

Have you configured YARN queues?

There is a high probability that some other job is consuming all the resources.

Check the ResourceManager (RM) UI from Ambari.
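If you prefer a script to the UI, the same check can be sketched against the ResourceManager REST API (`/ws/v1/cluster/apps`). This is only a rough sketch: the host and port (`localhost:8088`) are assumptions about a default Sandbox setup, and the helper names are made up for illustration.

```python
import json
from urllib.request import Request, urlopen

# Assumption: the ResourceManager web service listens on localhost:8088,
# which is the usual default; adjust host and port for your Sandbox.
RM_APPS_URL = "http://localhost:8088/ws/v1/cluster/apps?states=RUNNING"


def summarize_running_apps(payload):
    """Return (name, queue, allocatedMB) tuples, largest memory consumer first."""
    # The RM returns {"apps": null} when no application matches the filter.
    apps = (payload.get("apps") or {}).get("app", [])
    rows = [(a["name"], a["queue"], a.get("allocatedMB", 0)) for a in apps]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def fetch_running_apps(url=RM_APPS_URL):
    """Fetch the list of running applications as decoded JSON."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `summarize_running_apps(fetch_running_apps())` while the Pig job appears stuck should show whether another application is holding most of the memory.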

nsabharwal · ‎02-07-2016

@Mike Vogt

Make sure the core components are up:

HDFS

YARN

MapReduce
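The service states can also be read from the Ambari REST API instead of the dashboard. A minimal sketch, assuming Ambari is on `localhost:8080`, the cluster is named `Sandbox`, and the service names are `HDFS`, `YARN`, and `MAPREDUCE2`; the credentials and helper names are placeholders, not something from this thread.

```python
import base64
import json
from urllib.request import Request, urlopen

# Assumptions: Ambari host/port and the cluster name "Sandbox".
AMBARI_URL = ("http://localhost:8080/api/v1/clusters/Sandbox/services"
              "?fields=ServiceInfo/state")


def services_not_started(payload, required=("HDFS", "YARN", "MAPREDUCE2")):
    """Map each required service that is not STARTED to its reported state."""
    states = {
        item["ServiceInfo"]["service_name"]: item["ServiceInfo"]["state"]
        for item in payload.get("items", [])
    }
    return {
        name: states.get(name, "MISSING")
        for name in required
        if states.get(name) != "STARTED"
    }


def fetch_service_states(user, password, url=AMBARI_URL):
    """Fetch service states from Ambari using HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = Request(url, headers={"Authorization": f"Basic {token}"})
    with urlopen(request, timeout=10) as response:
        return json.load(response)
```

An empty result from `services_not_started(fetch_service_states(user, password))` would mean all three services report `STARTED`; anything else names the service to start.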

LesterMartin · ‎02-09-2016

Yep, my History Server was down and had to be manually started.

nsabharwal · ‎02-09-2016

@Lester Martin Thanks for testing and confirming. I think you should publish an article based on your comments.

LesterMartin · ‎02-09-2016

I'm working with @Rafael Coss to make sure the instructions are extremely crisp, as I think there are a few things that could easily trip up a novice, which is who we are targeting with these tutorials.

mvogt · ‎02-09-2016

Your genius level skills shine through once again! Thanks very much!

LesterMartin · ‎02-09-2016

I just ran this tutorial on my 16GB i7 MBPro (gave the VM 8GB just as you did) and could get it to run in 100 secs with MR and about 65 secs using Tez. I then ran the same script from the CLI and got those times down to about 60 and 25 secs on MR and Tez, respectively. I'm using the 2.3.2 Sandbox and the only thing I had to do was start the History Server, which was showing up red in Ambari.

aervits · ‎02-09-2016

Tez benefits from warm containers, so consecutive executions of the same script should be faster. Didn't know MR was performing better in the CLI, can't explain that 🙂 @Lester Martin

LesterMartin · ‎02-09-2016

It ~seems~ that the Ambari Views were adding about 30 seconds to the run times. Here are some of my notes on timings; notice the actual log-reported job times are pretty consistent between CLI and View runs.

Ran From	Exec Eng	Job Time	Clock Time
Ambari View	MR	64 sec	103 sec
Ambari View	Tez	25 sec	63 sec
CLI	MR	59 sec	61 sec
CLI	Tez	25 sec	27 sec

Actual job times were consistent for each execution engine (Tez twice as fast), but Ambari View ~seemed~ to add 30+ secs overall. I'm sure the extremely constrained HDP stack on a tiny little pseudo-cluster (aka the Sandbox) is a big factor in this (understandable).
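For anyone who wants the overhead spelled out, here is a small sketch that subtracts the log-reported job time from the wall-clock time for each run in the table above. The numbers are copied from the table; the `overhead` helper is just an illustrative name.

```python
# (ran_from, engine, job_secs, clock_secs), copied from the timing table above.
RUNS = [
    ("Ambari View", "MR", 64, 103),
    ("Ambari View", "Tez", 25, 63),
    ("CLI", "MR", 59, 61),
    ("CLI", "Tez", 25, 27),
]


def overhead(runs):
    """Return clock time minus job time, keyed by (ran_from, engine)."""
    return {
        (ran_from, engine): clock - job
        for ran_from, engine, job, clock in runs
    }


for (ran_from, engine), secs in overhead(RUNS).items():
    print(f"{ran_from:12} {engine:4} overhead: {secs} sec")
```

That works out to 39 and 38 sec of overhead from the Ambari View against 2 sec from the CLI for both engines, which lines up with the "30+ secs" estimate.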

Cloudera Community

How To Process Data with Apache Pig tutorial SLOW