Created 05-05-2016 03:15 PM
Hi,
There are 250 tasks in my MapReduce job. I would like to retrieve the elapsed time and the size of the data read and written for each task.
Elapsed time: I can get this for all tasks from the Resource Manager.
Size of data: I can click each task in the Resource Manager and see how much data it read and wrote.
Is there a way or a tool to collect the data sizes quickly, without going through each task's link?
Thanks,
Ta-Ting
Created 05-05-2016 05:13 PM
For each mapper, you can log the split size (or write it to a separate file, for example with MultipleOutputs) in the setup() method, with something like this:
context.getInputSplit().getLength()
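
Once every mapper logs its split size, the lines can be collected and summarized in one pass instead of opening each task page. Below is a minimal sketch in plain Java, assuming each mapper printed one tab-separated line of the form "taskId<TAB>bytes" from setup(). The class name SplitSizeSummary, the line format, and the sample task ids are made up for illustration and are not part of Hadoop.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop. It assumes each mapper logged a line
// from setup() roughly like this (assumed format, tab-separated):
//   System.out.println(context.getTaskAttemptID() + "\t" + context.getInputSplit().getLength());
public class SplitSizeSummary {

    /** Parses "taskId<TAB>bytes" lines into a map sorted by task id. */
    public static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> sizes = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\t");
            if (parts.length != 2) {
                continue; // skip unrelated log lines
            }
            try {
                sizes.put(parts[0], Long.parseLong(parts[1]));
            } catch (NumberFormatException e) {
                // second field is not a byte count; ignore this line
            }
        }
        return sizes;
    }

    /** Sums the split sizes of all parsed tasks. */
    public static long total(Map<String, Long> sizes) {
        long sum = 0;
        for (long bytes : sizes.values()) {
            sum += bytes;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up sample lines standing in for the collected task logs.
        List<String> sample = Arrays.asList(
                "m_000000\t134217728",
                "m_000001\t67108864",
                "some unrelated log line");
        Map<String, Long> sizes = parse(sample);
        sizes.forEach((task, bytes) -> System.out.println(task + " " + bytes));
        System.out.println("total " + total(sizes));
    }
}
```

Note that this only covers the input split size per mapper; the bytes written by each task would need to be logged separately in the same way.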