Fine-tune the Pig job

Explorer

I have a Pig job with 6 joins (5 small tables and 1 large table). The job spawns 49 map tasks and 13 reducers.

The job has been running for more than 12 hours.

Is there a formula for setting the properties below?

  set default_parallel
  set mapred.max.split.size
  set mapred.min.split.size
  set mapred.task.timeout
  set mapred.task.ping.timeout
  set mapred.map.child.java.opts -Xmx4096m;
  set mapred.reduce.child.java.opts -Xmx4096m;
  set pig.exec.reducers.bytes.per.reducer

I got the leads above for making the job faster.

However, I am not able to calculate the exact values to use.

1 ACCEPTED SOLUTION

Guru

You need to take three steps, in this order:

  1. Minimize your data before the join (e.g. load only the columns needed for the join and the output, and filter before the join), then
  2. optimize your joins, then
  3. optimize settings (including compressing intermediate results).

For 1, see: https://pig.apache.org/docs/r0.7.0/cookbook.html

For 1 and 2, see: https://pig.apache.org/docs/r0.9.1/perf.html
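
As a rough illustration of steps 1 and 2, the sketch below filters and projects the large relation before joining, then uses a fragment-replicate join so the small tables are held in memory rather than shuffled to reducers. The paths, schemas, and filter condition are hypothetical (the original job's relations are not shown in this thread), and it assumes each small table fits comfortably in task memory.

```pig
-- Hypothetical paths and schemas, for illustration only.
big = LOAD '/data/big_table' USING PigStorage('\t')
      AS (id:long, cust_key:chararray, prod_key:chararray, amount:double, notes:chararray);

-- Step 1: filter rows and drop unused columns before any join.
big_filtered = FILTER big BY amount > 0.0;
big_slim     = FOREACH big_filtered GENERATE id, cust_key, prod_key, amount;

customers = LOAD '/data/customers' USING PigStorage('\t')
            AS (cust_key:chararray, cust_name:chararray);
products  = LOAD '/data/products' USING PigStorage('\t')
            AS (prod_key:chararray, prod_name:chararray);

-- Step 2: replicated join. The large relation must be listed first;
-- the relations after it are loaded into memory on each map task.
with_cust = JOIN big_slim BY cust_key, customers BY cust_key USING 'replicated';
with_prod = JOIN with_cust BY big_slim::prod_key, products BY prod_key USING 'replicated';

STORE with_prod INTO '/data/output/joined';
```

If a small table does not fit in memory, a replicated join will fail, and a default (reduce-side) join is the safer choice for that table.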

After performing these optimizations, for 3 see:

Also, be sure you are running Pig on Tez.
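
For step 3, one option is to compress the temporary files that Pig writes between the jobs of a multi-job script. A minimal sketch, assuming the gz codec is acceptable for your cluster (the codec choice is an assumption, not something stated in this thread):

```pig
-- Compress temporary files written between the jobs of a multi-job Pig script.
SET pig.tmpfilecompression true;
-- Codec choice is an assumption: gz needs no extra libraries, lzo must be installed.
SET pig.tmpfilecompression.codec gz;
```

Whether this helps depends on whether the job is bound by I/O or by CPU, so measure before and after.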

3 REPLIES

Explorer

Thank you. However, is there any way to calculate the appropriate number of reducers for a particular operation?

I observed that increasing the number of reducers can also reduce performance in some cases.

Guru

This is a good discussion on setting reducers: https://community.hortonworks.com/questions/28073/how-do-you-force-the-number-of-reducers-in-a-map-r...

As with all performance tuning, it is best to isolate one bottleneck and tune that, rather than trying many things at the same time. So yes, among other tuning ... set this and see if it works. If not, move to the next suspected bottleneck.
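
On the question of calculating reducers: when neither PARALLEL nor default_parallel is set, Pig estimates the reducer count from the input size. The sketch below spells out that heuristic in comments and shows the two usual ways to influence it. The 13 GB input size, the paths, and the relation names are hypothetical; treat the numbers as a starting point to measure against, not a recommendation.

```pig
-- Pig's estimate when no parallelism is specified:
--   reducers = min(pig.exec.reducers.max,
--                  ceil(input_bytes / pig.exec.reducers.bytes.per.reducer))
-- Defaults: 1000000000 bytes per reducer, at most 999 reducers.
-- Hypothetical example: 13 GB of input / 1 GB per reducer = 13 reducers.
-- Halving the bytes per reducer doubles that estimate to about 26.
SET pig.exec.reducers.bytes.per.reducer 500000000;

-- Hypothetical relations.
a = LOAD '/data/a' AS (k:chararray, v:long);
b = LOAD '/data/b' AS (k:chararray, w:long);

-- PARALLEL on an operator overrides both the estimate and default_parallel
-- for that operator. It only affects reduce-side operators, so it has no
-- effect on replicated (map-side) joins.
j = JOIN a BY k, b BY k PARALLEL 26;
STORE j INTO '/data/output/j';
```

More reducers means less data per reducer but more task startup and shuffle overhead, which is consistent with the observation that raising the count can slow a job down.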