New Contributor
Posts: 3
Registered: ‎07-19-2017

How to compute total CPU time taken by a job like terasort in CDH 5.12

Hi, how can I compute the total CPU time taken by a job such as TeraSort in Cloudera Manager for CDH 5.12?

Expert Contributor
Posts: 256
Registered: ‎01-25-2017

Re: How to compute total CPU time taken by a job like terasort in CDH 5.12

Do you need it for a single run or for several runs?

New Contributor
Posts: 3
Registered: ‎07-19-2017

Re: How to compute total CPU time taken by a job like terasort in CDH 5.12

I need it for each individual job. Is there a way to get it, or compute it, using tsquery in Cloudera Manager?

Expert Contributor
Posts: 256
Registered: ‎01-25-2017

Re: How to compute total CPU time taken by a job like terasort in CDH 5.12

The best way is to use the YARN REST APIs. The History Server API is documented here:

https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/HistoryServerRest.html

You can do something like this:

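# Time window: the last 24 hours, as epoch milliseconds (requires GNU date)
STARTDATE=$(date -d "-1 day" +%s%N | cut -b1-13)
ENDDATE=$(date +%s%N | cut -b1-13)

# /ws/v1/cluster/apps is served by the ResourceManager (default port 8088).
# The URL must be quoted, otherwise the shell treats "&" as a background operator.
result=$(curl -s "http://yourresourcemanagerhost:8088/ws/v1/cluster/apps?finishedTimeBegin=${STARTDATE}&finishedTimeEnd=${ENDDATE}")

# Sum vcoreSeconds per queue. The values are vcore-seconds, despite the "cpums" metric name.
echo "$result" | python -m json.tool | sed 's/["|,]//g' | grep -E "queue|vcoreSeconds" | awk -v DC="$DC" ' $1 == "queue:" { queue = $2 }
$1 == "vcoreSeconds:" { arr[queue] += $2 }
END { for (x in arr) { print DC ".yarn." x ".cpums=" arr[x] } } '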

You can ignore the DC parameter; I use it to tag each data center. You can also replace the pool (queue) with the job, since I collect the metrics per pool rather than per job.
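
For the per-job variant mentioned above, here is a minimal Python sketch rather than a tested production script. It assumes the response has the usual ResourceManager shape (`apps.app[]` with `id` and `vcoreSeconds` fields); the function name and the sample data are made up for illustration. Note that `vcoreSeconds` reports allocated vcore-seconds, which is not the same as measured CPU time.

```python
import json


def vcore_seconds_by_app(payload):
    """Map application id -> vcoreSeconds from a parsed /ws/v1/cluster/apps response."""
    # The ResourceManager returns "apps": null when nothing matches the filter.
    apps = (payload.get("apps") or {}).get("app") or []
    return {app["id"]: int(app.get("vcoreSeconds", 0)) for app in apps}


# Made-up sample shaped like a ResourceManager response (not real cluster data).
sample = json.loads("""
{"apps": {"app": [
  {"id": "application_1_0001", "name": "TeraSort", "queue": "root.default", "vcoreSeconds": 1200},
  {"id": "application_1_0002", "name": "TeraGen", "queue": "root.default", "vcoreSeconds": 300}
]}}
""")

for app_id, seconds in sorted(vcore_seconds_by_app(sample).items()):
    print(app_id, seconds)
```

In practice you would feed it the output of the curl call, for example by saving the response to a file and loading it with `json.load`.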

New Contributor
Posts: 3
Registered: ‎07-19-2017

Re: How to compute total CPU time taken by a job like terasort in CDH 5.12

Fawze, thank you so much for the detailed response. This was most helpful. Much appreciated.
Posts: 1,538
Kudos: 280
Solutions: 235
Registered: ‎07-31-2013

Re: How to compute total CPU time taken by a job like terasort in CDH 5.12

See also: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_dg_yarn_applications.html

Via CM tsquery, you could run the below (modifying service_name to fit your cluster):

select cpu_milliseconds from YARN_APPLICATIONS where service_name = "yarn"

Each plotted point shown in this chart would be an application event, with all of its other filterable attributes available.

Alternatively, you can query the CM API and parse the JSON response with jq. The example below lists all SUCCEEDED jobs that have complete CPU usage data, in milliseconds (modify the cluster and service names to fit your installation):

~> curl 'https://CMHOST:7180/api/v10/clusters/cluster/services/yarn/yarnApplications?filter=state=SUCCEEDED&from=2017-07-21T00:00:00&to=2017-07-22T00:00:00' | jq '.applications[] | .applicationId + "," + .attributes.cpu_milliseconds'
"job_1418590271508_442,123680"
"job_1418590271508_449,110850"
"job_1418590271508_451,19800"
"job_1418590271508_490,12590"
"job_1418590271508_491,18410"
"job_1418590271508_492,12620"
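
If you would rather post-process the response in Python than in jq, something along these lines should work. It is only a sketch: it assumes the same fields the jq filter above reads (`applications[].applicationId` and `attributes.cpu_milliseconds`, with the value delivered as a string), and the function name and the third sample entry are made up to show how a job without CPU data is skipped.

```python
import json


def cpu_ms_by_app(payload):
    """Map applicationId -> cpu_milliseconds, skipping apps that lack the attribute."""
    result = {}
    for app in payload.get("applications", []):
        value = (app.get("attributes") or {}).get("cpu_milliseconds")
        if value is not None:
            # Attribute values arrive as strings; go through float for safety.
            result[app["applicationId"]] = int(float(value))
    return result


# Sample shaped like the CM API response; the first two values are from the output above.
sample = json.loads("""
{"applications": [
  {"applicationId": "job_1418590271508_442", "attributes": {"cpu_milliseconds": "123680"}},
  {"applicationId": "job_1418590271508_449", "attributes": {"cpu_milliseconds": "110850"}},
  {"applicationId": "job_hypothetical_no_cpu_data", "attributes": {}}
]}
""")

per_app = cpu_ms_by_app(sample)
total_ms = sum(per_app.values())
print(per_app)
print("total CPU ms:", total_ms)
```

Summing the per-job values, as in `total_ms`, gives the total CPU time for whatever set of jobs the query window returned.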

 

Backline Customer Operations Engineer