
CPU and Memory Usage per job perspective.

Explorer

When accounting for a job's resource usage in terms of CPU and memory, which metrics should we check: allocated vcore-seconds or CPU time? And likewise for RAM usage: allocated memory-seconds or physical memory?

 

The goal is to know, at any instant, the total CPU and RAM usage of the jobs running across the cluster.

4 Replies

Expert Contributor

Hi Nickk,

 

If you are looking for the features available for YARN resource accounting, there are two metrics available within the YARN API, as well as a more robust reporting capability within Cloudera Manager 5.7 onward.

 

The following are the definitions of memorySeconds and vcoreSeconds which are used to provide a very basic measurement of utilization in YARN[1]:


memorySeconds = The aggregated amount of memory (in megabytes) the application has allocated times the number of seconds the application has been running.

vcoreSeconds = The aggregated number of vcores that the application has allocated times the number of seconds the application has been running.


The memorySeconds value can be used loosely as a generic measure of the amount of resource a job consumed; for example, job 1 used X memorySeconds compared to job 2, which used Y memorySeconds. Any further calculations attempting to extrapolate deeper insight from this measure aren't recommended.
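As a rough sketch of how these two metrics accumulate per the definitions above, here is a small, hypothetical calculation (the job sizes and durations are invented for illustration, and real YARN allocations vary over a job's lifetime):

```python
# Rough illustration of how memorySeconds and vcoreSeconds accumulate
# for an application with a steady allocation. Numbers are hypothetical.

def resource_seconds(allocated_mb, allocated_vcores, elapsed_seconds):
    """Return (memorySeconds, vcoreSeconds) for a steady allocation."""
    memory_seconds = allocated_mb * elapsed_seconds
    vcore_seconds = allocated_vcores * elapsed_seconds
    return memory_seconds, vcore_seconds

# Job 1: 4096 MB and 2 vcores for 300 s
job1 = resource_seconds(4096, 2, 300)   # (1228800, 600)

# Job 2: 2048 MB and 1 vcore for 900 s
job2 = resource_seconds(2048, 1, 900)   # (1843200, 900)

# Job 2 consumed more memorySeconds despite a smaller allocation,
# simply because it ran three times as long -- which is why only
# coarse job-to-job comparisons are advisable.
```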

There are some additional reporting efforts in progress, and one is now available in CM. Starting with CM 5.7, CM offers cluster utilization reporting, which can help provide per-tenant/user cluster usage reporting. Further details regarding Cluster Utilization reporting in CM are available here [2].


References:
[1] Link to ApplicationResourceUsageReport.java (part of the YARN API) in the Apache source code for Hadoop:  
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main...

[2] Link to Cloudera Documentation regarding CM's Cluster Utilization Reporting functionality:
http://www.cloudera.com/documentation/enterprise/5-7-x/topics/admin_cluster_util_report.html
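For reference, the ResourceManager REST API exposes these two fields per application under `/ws/v1/cluster/apps`. A minimal sketch of pulling them out of a response; the JSON below is a fabricated sample shaped like that endpoint's output, not real cluster data:

```python
import json

# Fabricated sample in the shape of a ResourceManager
# /ws/v1/cluster/apps response, trimmed to the fields discussed here.
sample = json.loads("""
{"apps": {"app": [
  {"id": "application_0000000000000_0001",
   "memorySeconds": 1228800, "vcoreSeconds": 600},
  {"id": "application_0000000000000_0002",
   "memorySeconds": 1843200, "vcoreSeconds": 900}
]}}
""")

# Print a simple per-application usage summary.
for app in sample["apps"]["app"]:
    print(app["id"], app["memorySeconds"], app["vcoreSeconds"])
```

In practice you would fetch this JSON from the ResourceManager's web address rather than embedding it.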

Hope this helps!

New Contributor

Hi,

For most of the services, roles, etc., I see that the CPU utilization and other metrics are available at a minimum granularity of one minute. Can someone please let me know how to get them at second or millisecond granularity? I appreciate your response.

 

Regards

Harsha

New Contributor

Warning: vcoreSeconds is actually always reported in milliseconds (e.g., in the history server's web GUI), but this has never been documented by Cloudera or by Apache. I'm surprised to find that I'm the only one on the internet (even on StackOverflow) to note this. It would be nice if this finally got placed in the documentation, or even better, if metric output used the phrase "vcore-ms" instead of "vcore-seconds" when reporting that particular metric.
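If this observation holds for your cluster, a small defensive conversion helper can normalize the reported value (the function name is hypothetical, and the milliseconds behavior is the observation above, not documented semantics):

```python
def vcore_seconds_from_reported(value, reported_in_ms=True):
    """Convert a reported vcoreSeconds value to true vcore-seconds.

    Pass reported_in_ms=True if your history server reports the metric
    in milliseconds, as observed above; False if it really is seconds.
    """
    return value / 1000.0 if reported_in_ms else float(value)

# A reported "600000 vcore-seconds" that is really vcore-milliseconds
# normalizes to 600 true vcore-seconds.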

Contributor

You are asking how to get the per-job memory and CPU counters.

 

Please see the recent response in:

 

https://community.cloudera.com/t5/Support-Questions/How-to-get-the-YARN-jobs-metadata-directly-not-using-API/m-p/322711/highlight/false#M228910

 

In the metadata (counter) output, you will see the vcore-milliseconds and mb-milliseconds values
for all map and reduce tasks, the Task Summary, Analysis, File System Counters for the job, and other info about the specific job.
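As a sketch of working with that counter output, the per-phase millisecond counters can be summed and converted back to seconds (counter names follow `org.apache.hadoop.mapreduce.JobCounter`; the values below are fabricated for illustration):

```python
# Fabricated counter values in the shape of MapReduce JobCounter output.
counters = {
    "VCORES_MILLIS_MAPS": 120000,      # vcore-ms across all map tasks
    "VCORES_MILLIS_REDUCES": 30000,    # vcore-ms across all reduce tasks
    "MB_MILLIS_MAPS": 245760000,       # MB-ms across all map tasks
    "MB_MILLIS_REDUCES": 61440000,     # MB-ms across all reduce tasks
}

# Sum both phases and divide by 1000 to get seconds-based units.
total_vcore_seconds = (counters["VCORES_MILLIS_MAPS"]
                       + counters["VCORES_MILLIS_REDUCES"]) / 1000
total_memory_mb_seconds = (counters["MB_MILLIS_MAPS"]
                           + counters["MB_MILLIS_REDUCES"]) / 1000
```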