Support Questions
Find answers, ask questions, and share your expertise

How to get size of read/write data for each task of mapreduce job?


New Contributor

Hi,

There are 250 tasks in my MapReduce job. I would like to retrieve the elapsed time and the size of data read/written for each task.

Elapsed time: I can get all of it from the Resource Manager.

Size of data: I can click each task in the Resource Manager and get the size of the data it reads and writes.

Is there any way or tool to collect the data-size information quickly, without having to go through each task's link?

Thanks,

Ta-Ting

1 REPLY 1

Re: How to get size of read/write data for each task of mapreduce job?

Rising Star

For each mapper you can log the input split size (or write it to a separate file using MultipleOutputs) in the setup method, with something like this:

context.getInputSplit().getLength()
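To illustrate where that call fits, here is a minimal sketch of a mapper that logs its split size in setup. It assumes the new Hadoop API (org.apache.hadoop.mapreduce); the class name SizeLoggingMapper and the key/value types are hypothetical placeholders for your own job.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: logs the byte length of the input split
// assigned to this task before any records are processed.
public class SizeLoggingMapper
        extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        // Length (in bytes) of this task's input split; it appears in
        // each task's log, so you can collect the sizes in one pass
        // instead of clicking through every task in the web UI.
        long splitBytes = context.getInputSplit().getLength();
        System.err.println("Input split size (bytes): " + splitBytes);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... your normal map logic goes here ...
        context.write(value, new LongWritable(1L));
    }
}
```

After the job finishes, grepping the task logs for "Input split size" gives the per-task input sizes in one place.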