Created 09-20-2016 10:57 AM
I have a Pig job with 6 joins (5 small tables and 1 large table). The job spawns 49 map tasks and 13 reducers.
The job has been running for more than 12 hours.
Is there any formula for setting the properties below?
set default_parallel
set mapred.max.split.size
set mapred.min.split.size
set mapred.task.timeout
set mapred.task.ping.timeout
set mapred.map.child.java.opts -Xmx4096m;
set mapred.reduce.child.java.opts -Xmx4096m;
set pig.exec.reducers.bytes.per.reducer
I found these settings as leads for making the job faster, but I am not able to work out the exact values to use.
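For the two split-size properties, the number of map tasks can be roughly estimated from Hadoop's FileInputFormat split rule. This is a sketch assuming plain, splittable HDFS input files; Pig can also combine small splits, so treat the result as an approximation:

```latex
\[
\text{splitSize} = \max\bigl(\texttt{mapred.min.split.size},\ \min(\texttt{mapred.max.split.size},\ \text{blockSize})\bigr)
\]
\[
\text{mapTasks} \approx \sum_{f \,\in\, \text{input files}} \left\lceil \frac{\text{size}(f)}{\text{splitSize}} \right\rceil
\]
```

With no overrides, the split size equals the HDFS block size. Lowering mapred.max.split.size below the block size yields more, smaller map tasks; raising mapred.min.split.size above it yields fewer, larger ones. For example, a single 6,400 MB file with a 128 MB block size gives about 6400 / 128 = 50 map tasks.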
Created 09-20-2016 12:37 PM
You need to take three approaches:
For 1, see: https://pig.apache.org/docs/r0.7.0/cookbook.html
For 1 and 2, see: https://pig.apache.org/docs/r0.9.1/perf.html
After performing these optimizations, for 3 see:
Also, be sure you are running Pig on Tez.
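Five of the six tables in the question are small, so one optimization covered in the Pig performance docs applies directly: the fragment-replicate join. It loads the small relations into memory on each map task and avoids a reduce phase for that join. Below is a minimal sketch. The paths, relation names, and schemas are hypothetical, and it assumes each small table fits in a map task's memory:

```pig
-- Hypothetical inputs: replace paths and schemas with your own.
sales     = LOAD '/data/sales'     USING PigStorage(',') AS (sale_id:long, cust_id:long, prod_id:long, amount:double);
customers = LOAD '/data/customers' USING PigStorage(',') AS (cust_key:long, cust_name:chararray);
products  = LOAD '/data/products'  USING PigStorage(',') AS (prod_key:long, prod_name:chararray);

-- Replicated join: the large relation must be listed first; the relation(s)
-- after it are copied to every map task and must fit in memory.
with_cust = JOIN sales BY cust_id, customers BY cust_key USING 'replicated';

-- Chain further replicated joins for the remaining small tables
-- (prod_id is unambiguous after the first join, so no sales:: prefix is needed).
with_prod = JOIN with_cust BY prod_id, products BY prod_key USING 'replicated';

STORE with_prod INTO '/output/sales_enriched' USING PigStorage(',');

-- To run on Tez (Pig 0.14 or later with Tez installed), assuming this
-- script is saved under the hypothetical name join_sketch.pig:
--   pig -x tez join_sketch.pig
```

Replicated joins support only inner and left outer joins. The job fails if a replicated relation does not fit in memory, so list the large table first and put only the genuinely small tables after it.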
Created 09-23-2016 06:08 AM
Thank you. However, is there any way to calculate the appropriate number of reducers for a particular operation?
I have observed that increasing the number of reducers can also degrade performance in some cases.
Created 09-23-2016 12:01 PM
This is a good discussion on setting reducers: https://community.hortonworks.com/questions/28073/how-do-you-force-the-number-of-reducers-in-a-map-r...
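If neither a PARALLEL clause nor default_parallel is set, Pig estimates the reducer count for each MapReduce job from that job's input size. The Pig performance guide gives the defaults as 1,000,000,000 bytes per reducer and at most 999 reducers, and the estimate is approximately (rounded up):

```latex
\[
\text{reducers} = \min\left(\texttt{pig.exec.reducers.max},\ \left\lceil \frac{\text{total input bytes}}{\texttt{pig.exec.reducers.bytes.per.reducer}} \right\rceil\right)
\]
```

Lowering pig.exec.reducers.bytes.per.reducer therefore raises the reducer count. More reducers also means more task start-up overhead and more, smaller output files, so a higher count is not automatically faster. This fits the observation that adding reducers sometimes hurts performance.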
As with all performance tuning, it is best to isolate one bottleneck and tune it rather than trying many things at once. So yes, among other tuning, set this and see whether it helps. If it does not, move on to the next suspected bottleneck.
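In that spirit, here is a minimal sketch that changes one reducer setting at a time. It shows a script-wide default via default_parallel and a per-operator override via PARALLEL, which takes precedence for that operator only. The paths, relation names, and the values 20 and 40 are hypothetical starting points to compare against measured run times, not recommendations:

```pig
-- Script-wide reducer count for every reduce-side operator (hypothetical value).
set default_parallel 20;

-- Hypothetical input.
sales = LOAD '/data/sales' USING PigStorage(',') AS (sale_id:long, cust_id:long, amount:double);

-- Per-operator override: PARALLEL applies only to this GROUP and
-- takes precedence over default_parallel.
by_cust = GROUP sales BY cust_id PARALLEL 40;

-- Aggregate per customer so the grouped result is actually used.
totals = FOREACH by_cust GENERATE group AS cust_id, SUM(sales.amount) AS total;

STORE totals INTO '/output/cust_totals' USING PigStorage(',');
```

Change only one of these settings between runs, and compare the job's run time before moving on to the next one.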