In one of my jobs, one reducer task is taking much longer than the others, which has increased the overall job execution time.
When I checked the reducer logs, I found that during the shuffle phase it took 4 hours to fetch the data from one mapper task and write it to disk (as attached). Could anyone help me identify the property that defines the path where the reducer keeps the data fetched from the map tasks?
1. Mapper output and reducer shuffle/sort output are saved to the local file system (if the data is small, the shuffle output is kept in memory).
2. mapreduce.cluster.local.dir provides a list of directories where the temporary data is saved. Search for the job ID inside those directories to find the exact location of the temporary data at runtime.
However, knowing this location will not help you much with debugging. What you should actually be looking at:
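1. Are your reducers starting only after all mappers have finished? Setting mapreduce.job.reduce.slowstart.completedmaps to 1 makes them wait for every map task, so time spent waiting on mappers is not counted as shuffle time.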
2. How many groups are being processed per reducer? Try increasing the number of reducers to increase parallelism.
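3. Is your job suffering from a skewed key problem, i.e. does one key have a very large number of values? All values for a key go to the same reducer, so one hot key can overload a single task.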
4. Are there disk failures on the nodes where the mappers and reducers are running? Run dmesg on those nodes to check for disk errors.
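For points 2 and 3, a rough way to check for skew is to take a sample of your map output keys and count how many records each reducer would receive. The following is only a sketch: the function names and the sample keys are made up for illustration, and zlib.crc32 is a stand-in for Java's hashCode(), so the partition numbers will not match what Hadoop actually assigns. It only shows whether the load is uneven.

```python
import zlib
from collections import Counter


def reducer_loads(keys, num_reducers):
    """Count how many records each reducer would receive under hash partitioning."""
    loads = Counter()
    for key in keys:
        # Assumption: crc32 stands in for Java's hashCode(); real partition
        # numbers in Hadoop will differ, but the imbalance pattern is similar.
        partition = zlib.crc32(key.encode("utf-8")) % num_reducers
        loads[partition] += 1
    return [loads.get(i, 0) for i in range(num_reducers)]


def skew_ratio(loads):
    """Largest reducer load divided by the mean load; 1.0 means perfectly even."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


# Hypothetical sample: one hot key plus many small keys.
sample_keys = ["user_42"] * 9000 + [f"user_{i}" for i in range(1000)]
loads = reducer_loads(sample_keys, num_reducers=4)
print("records per reducer:", loads)
print("skew ratio:", round(skew_ratio(loads), 2))
```

If the ratio stays far above 1 no matter how many reducers you use, the problem is a hot key rather than too few reducers, and adding reducers alone will not help.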