Expert Contributor
Posts: 126
Registered: 11-01-2013

Analyzing the job run patterns over time for performance tuning

I have an idea: capture the Hadoop logs, parse them, and store values such as CPU used per job, memory used per job, number of processes used, number of threads spawned, and job run time in HDFS/HBase, then put a web interface on top. This would provide the historical resource usage of the Hadoop cluster over time. Comparing the current run time of a job with the run times of the same job in the past gives you a performance metric.
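A minimal Python sketch of the comparison step might look like the following. It assumes the logs have already been parsed into one record per job run. The `JobRun` record and its fields, the `build_history` and `compare_to_history` helpers, the job name `daily-etl`, and the 1.5x regression threshold are all hypothetical, chosen only for illustration. They are not part of Hadoop, HBase, or Cloudera Manager.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class JobRun:
    """One parsed job run (hypothetical record; fields are illustrative)."""
    job_name: str
    run_time_s: float   # wall-clock job run time in seconds
    cpu_ms: int = 0     # CPU time used by the job, if the logs expose it
    memory_mb: int = 0  # memory used by the job, if the logs expose it


def build_history(runs):
    """Group past run times by job name."""
    history = defaultdict(list)
    for run in runs:
        history[run.job_name].append(run.run_time_s)
    return history


def compare_to_history(current, history, threshold=1.5):
    """Compare a run against the median of earlier runs of the same job.

    Returns None when there is no history for the job yet. Otherwise returns
    a dict with the baseline, the ratio current/baseline, and whether that
    ratio exceeds the (hypothetical) regression threshold.
    """
    past = history.get(current.job_name)
    if not past:
        return None
    baseline = median(past)
    ratio = current.run_time_s / baseline
    return {
        "job": current.job_name,
        "baseline_s": baseline,
        "current_s": current.run_time_s,
        "ratio": ratio,
        "regressed": ratio > threshold,
    }


if __name__ == "__main__":
    past_runs = [
        JobRun("daily-etl", 600.0),
        JobRun("daily-etl", 620.0),
        JobRun("daily-etl", 580.0),
    ]
    history = build_history(past_runs)
    print(compare_to_history(JobRun("daily-etl", 960.0), history))
```

The median is used as the baseline instead of the mean, so a single unusually slow or fast past run does not skew the comparison. The same approach could be applied to the CPU and memory fields.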

This is just an idea I am thinking about. Please let me know your thoughts on it. Thanks!

Em Jay
Cloudera Employee
Posts: 2
Registered: 11-14-2013

Re: Analyzing the job run patterns over time for performance tuning

Hi Manikumar, that's a great idea. Cloudera Manager has had that type of functionality for MR1 for a long time, and starting with the CM5 beta, we have that functionality for YARN as well.

If you reply with the version of CM that you are using, I can help you navigate to the appropriate page(s). It would be great to get your feedback on those features.

Expert Contributor

Re: Analyzing the job run patterns over time for performance tuning

The version of CM I'm using is 4.6.0.

Em Jay
Expert Contributor

Re: Analyzing the job run patterns over time for performance tuning

So you mean that all of this can be pulled out of Cloudera Manager in the form of reports?

Em Jay
Cloudera Employee

Re: Analyzing the job run patterns over time for performance tuning

I'm not sure what you mean by "reports"; if you can elaborate, that would be great. In the meantime, let me point you to what exists in CM 4.6.

If you click on the Activities drop-down, there should be a link that says "MapReduce jobs," with the name of your MapReduce service. If you click on that link, it will take you to a page that shows all the jobs that have run in the time window specified by the slider at the top, as well as any that are currently running. From here, you can view a bunch of stats about these jobs, and if you click on a job name, you can drill down and look at the job's counters and other details. You can also drill down to the tasks that constitute the job, and view details, including the logs, from each task.

I hope that helps; please let me know if anything is unclear.