Analyzing the job run patterns over time for performance tuning

Expert Contributor

I have an idea: capture the Hadoop logs, parse them, and store values such as CPU used per job, memory used per job, number of processes used, number of threads spawned, job run time, etc. in HDFS/HBase, then put a web interface on top. This would provide the historical resource usage of the Hadoop cluster over time. Comparing the current run time of a job with the run times of the same job in the past would give you performance metrics.
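To make the comparison step concrete, here is a minimal sketch in Python. It assumes the metrics have already been extracted into a simple comma-separated record; that record layout, the `JobRun` class, and the helper names are all hypothetical and not an actual Hadoop log format. It only illustrates comparing a current run against the median of past runs of the same job.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class JobRun:
    """One parsed job record (hypothetical schema, not a real Hadoop log format)."""
    job_name: str
    run_seconds: float
    cpu_seconds: float
    peak_memory_mb: float


def parse_line(line):
    """Parse one 'job_name,run_seconds,cpu_seconds,peak_memory_mb' record."""
    name, run_s, cpu_s, mem_mb = line.strip().split(",")
    return JobRun(name, float(run_s), float(cpu_s), float(mem_mb))


def build_history(lines):
    """Group parsed records by job name, skipping blank lines."""
    history = defaultdict(list)
    for line in lines:
        if line.strip():
            run = parse_line(line)
            history[run.job_name].append(run)
    return history


def slowdown_ratio(history, current):
    """Return current run time divided by the median past run time.

    Returns None when there is no history for this job name.
    """
    past = [r.run_seconds for r in history.get(current.job_name, [])]
    if not past:
        return None
    return current.run_seconds / median(past)


# Made-up sample data, for illustration only.
sample_log = """\
daily_etl,100,340,2048
daily_etl,110,350,2100
daily_etl,90,330,2000
"""

history = build_history(sample_log.splitlines())
current = JobRun("daily_etl", 150.0, 500.0, 2300.0)
ratio = slowdown_ratio(history, current)
print(f"{current.job_name}: {ratio:.2f}x the median past run time")
```

The median is used rather than the mean so that a single unusually slow past run does not skew the baseline; the same pattern would apply to the CPU or memory fields.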

 

This is just an idea I am thinking about. Please let me know your thoughts on it. Thanks.

Em Jay
4 REPLIES

Re: Analyzing the job run patterns over time for performance tuning

Cloudera Employee
Hi Manikumar, that's a great idea. Cloudera Manager has had that type of functionality for MR1 for a long time, and starting with the CM5 beta, we have that functionality for YARN as well.

If you let me know which version of CM you are using, I can help you navigate to the appropriate page(s). It would be great to get your feedback on those features.

Re: Analyzing the job run patterns over time for performance tuning

Expert Contributor

The version of CM I'm using is 4.6.0.

Em Jay

Re: Analyzing the job run patterns over time for performance tuning

Expert Contributor

So do you mean that all of this can be pulled out of Cloudera Manager in the form of reports?

Em Jay

Re: Analyzing the job run patterns over time for performance tuning

Cloudera Employee
I'm not sure what you mean by "reports"; if you can elaborate, that would be great. Let me point you, though, to what exists in CM 4.6.

If you click on the Activities drop-down, there should be a link that says "MapReduce jobs," with the name of your MapReduce service. If you click on that link, it will take you to a page that shows all the jobs that have run in the time window specified by the slider at the top, as well as any that are currently running. From here, you can view a bunch of stats about these jobs, and if you click on a job name, you can drill down and look at the job's counters and other details. You can also drill down to the tasks that constitute the job, and view details, including the logs, from each task.

I hope that helps. Please let me know if anything is unclear.