Created 02-07-2016 08:49 PM
Hello all -
Just a quick LOW PRIORITY question for anyone who has run the tutorial "How To Process Data with Apache Pig".
I created the script and am running the job as I write this. It has been running for 2 hours. Does this seem SLOW to anyone else?
I am running on a machine with an i7 processor and 16 GB of RAM, of which the Ambari Sandbox is using 8 GB. Are there other configuration options that should be set? That said, this already seems like a massive amount of resources in use.
Created 02-07-2016 08:51 PM
Have you configured YARN queues?
There is a high probability that some other job is consuming all the resources.
Check the ResourceManager (RM) UI from Ambari.
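If you would rather check from a script than click through the UI, here is a minimal sketch that summarizes running applications using the ResourceManager REST API (`/ws/v1/cluster/apps`). It assumes the RM web service is reachable at `http://localhost:8088`, which is only a guess at a typical Sandbox setup; the function names are illustrative and not part of any Hadoop client library.

```python
import json
from urllib.request import urlopen

# Assumed address of the ResourceManager web service; adjust for your Sandbox.
RM_URL = "http://localhost:8088"


def summarize_running_apps(payload):
    """Return (name, queue, allocatedMB) per app, largest memory user first.

    `payload` is the parsed JSON from GET /ws/v1/cluster/apps?states=RUNNING.
    The "apps" value may be null when nothing is running, so guard for that.
    """
    apps = (payload.get("apps") or {}).get("app", [])
    rows = [
        (a.get("name", "?"), a.get("queue", "?"), a.get("allocatedMB", 0))
        for a in apps
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def fetch_running_apps(rm_url=RM_URL):
    """Query a live ResourceManager (needs network access to the RM)."""
    url = f"{rm_url}/ws/v1/cluster/apps?states=RUNNING"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Calling `summarize_running_apps(fetch_running_apps())` against a running Sandbox should list whichever applications currently hold memory, which makes it easy to spot a stuck job that is starving the Pig script.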
Created 02-07-2016 08:54 PM
Created 02-09-2016 10:41 PM
Yep, my History Server was down and had to be manually started.
Created 02-09-2016 10:46 PM
@Lester Martin Thanks for testing and confirming. I think you should publish an article based on your comments.
Created 02-09-2016 10:48 PM
I'm working with @Rafael Coss to make sure the instructions are extremely crisp, as I think there are a few things that could easily trip up a novice, which is who we are targeting with these tutorials.
Created 02-09-2016 01:29 AM
Your genius level skills shine through once again! Thanks very much!
Created 02-09-2016 10:26 PM
I just ran this tutorial on my 16GB i7 MBPro (gave the VM 8GB, just as you did) and could get it to run in 100 secs with MR and about 65 secs using Tez. I then ran the same script from the CLI and got those times down to about 60 and 25 secs on MR and Tez, respectively. I'm using the 2.3.2 Sandbox, and the only thing I had to do was start the History Server, which was showing up red in Ambari.
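For anyone who wants to repeat the CLI comparison, here is a rough sketch of a timing wrapper around `pig -x <engine>`. It assumes `pig` is on the PATH inside the Sandbox; the helper names are made up for illustration, and any script path you pass in is your own.

```python
import subprocess
import time

# Execution engines compared in this thread, as accepted by `pig -x`.
ENGINES = ("mapreduce", "tez")


def build_pig_command(script_path, engine):
    """Build the argument list for running a Pig script on a given engine."""
    if engine not in ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    return ["pig", "-x", engine, script_path]


def time_pig_run(script_path, engine, runner=subprocess.run):
    """Run the script and return wall-clock seconds.

    `runner` defaults to subprocess.run; with check=True it raises
    CalledProcessError if Pig exits with a non-zero status.
    """
    start = time.perf_counter()
    runner(build_pig_command(script_path, engine), check=True)
    return time.perf_counter() - start
```

Something like `time_pig_run("my_script.pig", "tez")` (hypothetical file name) gives the wall-clock time, which can then be compared with the job time reported in the Pig log.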
Created 02-09-2016 10:28 PM
Tez benefits from warm containers, so consecutive executions of the same script should be faster. Didn't know MR was performing better in CLI, can't explain that 🙂 @Lester Martin
Created 02-09-2016 10:39 PM
It ~seems~ that the Ambari Views were adding about 30 seconds to the run times. Here are some of my notes on timings; notice that the actual log-reported job times are pretty consistent between CLI and View runs.
| Ran From    | Exec Engine | Job Time | Clock Time |
|-------------|-------------|----------|------------|
| Ambari View | MR          | 64 sec   | 103 sec    |
| Ambari View | Tez         | 25 sec   | 63 sec     |
| CLI         | MR          | 59 sec   | 61 sec     |
| CLI         | Tez         | 25 sec   | 27 sec     |
Actual job times were consistent for each execution engine (Tez twice as fast), but Ambari View ~seemed~ to add 30+ secs overall. I'm sure the extremely constrained HDP stack on a tiny little pseudo-cluster (aka the Sandbox) is a big factor in this (understandable).
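To make the overhead explicit, here is a small sketch that subtracts the log-reported job time from the wall-clock time for each row of the table. The numbers are copied from the table; the variable and function names are just for illustration.

```python
# Timings from the table: (ran_from, engine, job_sec, clock_sec)
RUNS = [
    ("Ambari View", "MR", 64, 103),
    ("Ambari View", "Tez", 25, 63),
    ("CLI", "MR", 59, 61),
    ("CLI", "Tez", 25, 27),
]


def overhead_by_run(runs):
    """Map (ran_from, engine) to clock time minus job time, in seconds."""
    return {(src, eng): clock - job for src, eng, job, clock in runs}


for (src, eng), extra in overhead_by_run(RUNS).items():
    print(f"{src:12} {eng:4} overhead = {extra} sec")
# Ambari View  MR   overhead = 39 sec
# Ambari View  Tez  overhead = 38 sec
# CLI          MR   overhead = 2 sec
# CLI          Tez  overhead = 2 sec
```

So on this one machine the View runs carried roughly 38 to 39 seconds outside the job itself, versus about 2 seconds from the CLI, which is where the "30+ secs" estimate comes from.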