We have a heterogeneous environment in which we would like to perform some objective accounting of the 'work' done by each node.
Can we track the number of reduce tasks performed by each node (assuming each reduce task is of equal weight)?
Appreciate any responses/feedback.
You can make a custom chart to track MapReduce activity and facet it by host. Check out the charting documentation here:
The easiest approach is probably this: click on your MapReduce service, click on a TaskTracker, and find the Slot Utilization chart. Hover over the chart, click the dropdown, and select Edit a Copy. You'll then see the following SQL-like query at the top:
select (maps_running / (map_task_slots + reduce_task_slots)) * 100, (reduces_running / (map_task_slots + reduce_task_slots)) * 100 where entityName=$ROLENAME
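To make the query's arithmetic concrete, here is a minimal Python sketch of what it computes for a single TaskTracker sample: running maps and running reduces, each as a percentage of the node's total (map + reduce) slots. The function name is hypothetical, and the zero-slot guard is an assumption added so the sketch never divides by zero; it is not taken from Cloudera Manager.

```python
def slot_utilization(maps_running, reduces_running, map_task_slots, reduce_task_slots):
    """Return (map %, reduce %) of total slots, mirroring the chart query's two expressions."""
    total_slots = map_task_slots + reduce_task_slots
    if total_slots == 0:
        # Assumption: report 0% for a node with no configured slots instead of dividing by zero.
        return 0.0, 0.0
    map_pct = maps_running / total_slots * 100
    reduce_pct = reduces_running / total_slots * 100
    return map_pct, reduce_pct
```

For example, a node with 4 map slots and 2 reduce slots that is running 2 maps and 1 reduce gives roughly (33.3, 16.7). Note that both series are divided by the combined slot count, so the two percentages together never exceed 100.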
Remove "where entityName=$ROLENAME" and you'll get a chart covering every role that reports these metrics (i.e., all TaskTrackers), not just the one you started from. Then facet by hostname (or keep everything combined in one chart if you want to compare them all at once). You may want to pick a line chart instead of a stacked area chart. Save the chart somewhere, probably in a custom view, so you can easily find it again later from the Charts dropdown at the top of the CM UI.
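Faceting by hostname amounts to splitting the combined result into one time series per host. The sketch below shows that grouping in Python for the reduce percentage only. The input row layout `(hostname, timestamp, maps_running, reduces_running, map_task_slots, reduce_task_slots)` is a hypothetical export format assumed for illustration; it is not how Cloudera Manager stores or returns chart data.

```python
from collections import defaultdict


def facet_by_host(samples):
    """Group raw samples into one sorted (timestamp, reduce %) series per hostname.

    Each sample is a hypothetical tuple:
    (hostname, timestamp, maps_running, reduces_running, map_task_slots, reduce_task_slots)
    """
    series = defaultdict(list)
    for host, ts, _maps, reduces, map_slots, reduce_slots in samples:
        total = map_slots + reduce_slots
        # Same reduce expression as the chart query, guarded against zero slots.
        reduce_pct = reduces / total * 100 if total else 0.0
        series[host].append((ts, reduce_pct))
    for points in series.values():
        points.sort()  # order each host's series by timestamp
    return dict(series)
```

Each key in the result corresponds to one faceted chart; plotting all the series on a single set of axes gives the combined view instead.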
This will give you a view with one chart per host showing slot utilization over time (or one combined chart for easy comparison, depending on what you picked). To see a different time range, use the time slider.
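If you want a single number per node for the time range you select, rather than eyeballing the charts, one option is to integrate `reduces_running` over that window to get "reduce-slot-seconds" per host and compare each host's share. This is a hypothetical accounting approach, not a Cloudera Manager feature, and it measures time spent with reduce slots occupied rather than the count of reduce tasks the question asks about; the two agree only if reduce tasks really are of roughly equal duration. The sketch assumes you already have `(unix_seconds, reduces_running)` samples per host and uses the trapezoidal rule between samples.

```python
def reduce_slot_seconds(points):
    """Trapezoidal integral of reduces_running over time.

    points: list of (unix_seconds, reduces_running) samples, in any order.
    """
    pts = sorted(points)
    total = 0.0
    for (t0, r0), (t1, r1) in zip(pts, pts[1:]):
        # Average running reduces over the interval times its length.
        total += (r0 + r1) / 2 * (t1 - t0)
    return total


def share_by_host(series):
    """Fraction of all reduce-slot-seconds attributed to each host."""
    totals = {host: reduce_slot_seconds(pts) for host, pts in series.items()}
    grand = sum(totals.values())
    return {host: (v / grand if grand else 0.0) for host, v in totals.items()}
```

For instance, a host holding 2 reduces steady for 60 seconds accrues 120 reduce-slot-seconds, while one ramping from 0 to 2 over the same minute accrues 60, giving shares of 2/3 and 1/3. Sparse sampling makes the trapezoidal estimate coarser, so a finer time range or resolution gives a better approximation.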