How to compute total CPU time taken by a job like terasort in CDH 5.12

New Contributor

Hi, how can I compute the total CPU time taken by a job like terasort in Cloudera Manager for CDH 5.12?

5 REPLIES

Re: How to compute total CPU time taken by a job like terasort in CDH 5.12

Super Collaborator

Do you need it for a single run or for several runs?

Re: How to compute total CPU time taken by a job like terasort in CDH 5.12

New Contributor

I need it for each individual job. Is there a way to get it or compute it using tsquery from Cloudera Manager?

Re: How to compute total CPU time taken by a job like terasort in CDH 5.12

Super Collaborator

The best way is to use the History Server REST API:

 

 

https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/HistoryServerRest.html

 

You can do something like this:

 

# Epoch time in milliseconds for the last 24 hours (date prints nanoseconds, keep the first 13 digits).
STARTDATE=`date -d " -1 day " +%s%N | cut -b1-13`
ENDDATE=`date +%s%N | cut -b1-13`

# Quote the URL so the & in the query string is not treated as a shell operator.
result=`curl -s "http://yourhistoryservername:8088/ws/v1/cluster/apps?finishedTimeBegin=$STARTDATE&finishedTimeEnd=$ENDDATE"`

# Pretty-print the JSON, strip quotes and commas, keep the queue and vcoreSeconds lines,
# then sum vcoreSeconds per queue.
echo "$result" | python -m json.tool | sed 's/["|,]//g' | grep -E "queue|vcoreSeconds" | awk -v DC="$DC" ' /queue/ { queue = $2 }
/vcoreSeconds/ { arr[queue] += $2 }
END { for (x in arr) { print DC ".yarn." x ".cpums=" arr[x] } } '

You can ignore the DC parameter, as I use it per data center, and you can replace the pool with the job, since I collect the metrics per pool and not per job.
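If you need the figure for one specific run rather than per pool, a minimal sketch along the same lines (assuming the application id of your terasort run is known, for example from "yarn application -list", and reusing the same endpoint as above; the id below is only a placeholder):

# Placeholder application id; substitute the id of your terasort run.
APP_ID=application_1418590271508_0442

# The per-application record includes vcoreSeconds (total CPU time in
# vcore-seconds) and memorySeconds on recent Hadoop 2.x releases.
curl -s "http://yourhistoryservername:8088/ws/v1/cluster/apps/$APP_ID" \
  | python -m json.tool | grep -E '"vcoreSeconds"|"memorySeconds"'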

Re: How to compute total CPU time taken by a job like terasort in CDH 5.12

New Contributor
Fawze, thank you so much for the detailed response. This was most helpful. Much appreciated.

Re: How to compute total CPU time taken by a job like terasort in CDH 5.12

Master Guru

See also: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_dg_yarn_applications.html

 

Via CM tsquery, you could run the below (modifying service_name to fit your cluster):

 

select cpu_milliseconds from YARN_APPLICATIONS where service_name = "yarn"

Each plotted point shown in this chart would be an application event, with all of its other filterable attributes available.
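In principle the same tsquery can also be run outside the Charts UI through the CM API /timeseries endpoint. A rough sketch, assuming API version v10 as in the example further below, admin:admin as placeholder credentials, CMHOST as a placeholder host, and that your CM version accepts YARN_APPLICATIONS work queries on this endpoint:

# Run the tsquery through the CM API timeseries endpoint; the query is URL-encoded
# (spaces as %20, '=' as %3D, double quotes as %22).
curl -s -u admin:admin \
  'https://CMHOST:7180/api/v10/timeseries?query=select%20cpu_milliseconds%20from%20YARN_APPLICATIONS%20where%20service_name%20%3D%20%22yarn%22' \
  | python -m json.tool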

 

Or, via the CM API (parsing the JSON response with jq), you could query all SUCCEEDED jobs that have complete CPU usage data (in milliseconds) by running the below (modifying the cluster and service name to fit your installation):

 

 

~> curl 'https://CMHOST:7180/api/v10/clusters/cluster/services/yarn/yarnApplications?filter=state=SUCCEEDED&from=2017-07-21T00:00:00&to=2017-07-22T00:00:00' | jq '.applications[] | .applicationId + "," + .attributes.cpu_milliseconds'
"job_1418590271508_442,123680"
"job_1418590271508_449,110850"
"job_1418590271508_451,19800"
"job_1418590271508_490,12590"
"job_1418590271508_491,18410"
"job_1418590271508_492,12620"

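To reduce that per-application listing to a single total for the time window, a small variation of the same call (assuming cpu_milliseconds comes back as a string, as in the listing above) can let jq do the summing:

# Sum cpu_milliseconds across all SUCCEEDED applications in the time window.
# Same CMHOST, cluster, service and date placeholders as the example above.
curl -s 'https://CMHOST:7180/api/v10/clusters/cluster/services/yarn/yarnApplications?filter=state=SUCCEEDED&from=2017-07-21T00:00:00&to=2017-07-22T00:00:00' \
  | jq '[.applications[].attributes.cpu_milliseconds | tonumber] | add'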
 
