Need additional documentation for REST API - replication status

Contributor

I'm writing a small script to monitor the status of BDR jobs with the REST APIs.

I'm having an issue with an endpoint that takes a long time to respond (from my limited testing, the response time scales linearly with the number of jobs and the depth of each job's history):

https://cloudera.github.io/cm_api/apidocs/v17/path__clusters_-clusterName-_services_-serviceName-_re...

In the linked documentation it appears that the API accepts a limits parameter, but it's not very well documented: what arguments does it accept? Is there perhaps something to limit the history size?

Master Collaborator

The link you provided will list all your replication schedules and their job result history.

If you know the replication schedule ID (e.g. id=5 in the example below), the replications/{id}/history endpoint [0] may help, since it lets you limit the history size.

http://cm-host.cloudera.com:7180/api/v17/clusters/Cluster%201/services/HDFS-1/replications/5/history?limit=1&offset=0

[0] https://cloudera.github.io/cm_api/apidocs/v17/path__clusters_-clusterName-_services_-serviceName-_re...
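A minimal Python sketch of this approach, assuming HTTP basic authentication and reusing the host, cluster and service names from the example URL (replace them with your own). It builds the history URL with limit/offset and summarises the first returned entry, assuming the history comes back newest first. The "items", "active" and "success" field names are assumptions based on CM's command objects; check them against a real response from your CM version.

```python
import base64
import json
import urllib.parse
import urllib.request

# Hypothetical values taken from the example URL; replace with your own.
CM_BASE = "http://cm-host.cloudera.com:7180/api/v17"
CLUSTER = "Cluster 1"
SERVICE = "HDFS-1"


def history_url(schedule_id: int, limit: int = 1, offset: int = 0) -> str:
    """Build the replications/{id}/history URL with limit/offset paging."""
    path = "/clusters/{}/services/{}/replications/{}/history".format(
        urllib.parse.quote(CLUSTER), urllib.parse.quote(SERVICE), schedule_id)
    query = urllib.parse.urlencode({"limit": limit, "offset": offset})
    return CM_BASE + path + "?" + query


def last_result(history: dict) -> str:
    """Summarise the first (assumed newest) command in a history response.

    Assumes an "items" array whose entries carry "active" and "success" booleans.
    """
    items = history.get("items", [])
    if not items:
        return "NO_HISTORY"
    cmd = items[0]
    if cmd.get("active"):
        return "RUNNING"
    return "SUCCEEDED" if cmd.get("success") else "FAILED"


def fetch_json(url: str, user: str, password: str) -> dict:
    """GET a URL with HTTP basic auth and decode the JSON body."""
    req = urllib.request.Request(url)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Usage (hypothetical credentials):
# print(last_result(fetch_json(history_url(5), "admin", "admin")))
```

With limit=1 only one command is transferred per schedule, which is what keeps this call cheap compared with listing every schedule's full history.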

Contributor

Thank you, Michalis.

And if I don't know the IDs of the jobs in advance? Is there any way to limit the response from the main URI /api/vXX/clusters/{cluster_name}/services/{service_name}/replications?
What I'm trying to do is simply get the list of all defined jobs and the state of each job's last execution (failed/succeeded).

Master Collaborator (accepted solution)

If your objective is to "...get the state of the last execution (failed/succeeded)", then, if I remember correctly, each replication job generates an AUDIT event [0], so a workaround would be to filter the Events [1].

In CM, go to Diagnostics > Events and filter on:

Category: AUDIT_EVENT

Event Code: EV_HDFS_DISTCP
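The same filter can presumably be sent to the events endpoint [0] directly. This small Python sketch builds the query URL; the host is a placeholder, and the query string is assembled from the filter values listed here and the pattern in link [1], so treat it as a starting point rather than a verified call.

```python
import urllib.parse

# Hypothetical host; API version and query syntax follow the example link [1].
CM_BASE = "http://cm.cloudera.com:7180/api/v12"


def audit_events_url(event_code: str = "EV_HDFS_DISTCP") -> str:
    """Build an /events URL filtered to AUDIT_EVENT entries with the given event code."""
    query = f"category==AUDIT_EVENT;attributes.eventcode=={event_code}"
    # urlencode percent-escapes the '=' and ';' characters inside the query value.
    return f"{CM_BASE}/events?" + urllib.parse.urlencode({"query": query})
```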

By parsing COMMAND_ARGS you can get the scheduleId.

Then group the results by COMMAND_ID to get the execution flow.

COMMAND_STATUS will show whether the command STARTED, FAILED, SUCCEEDED, or ABORTED.
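These grouping steps can be sketched in Python, run against the decoded JSON returned by the events endpoint. Several details are assumptions: that each event carries an "attributes" list of {"name", "values"} pairs, that "timeOccurred" is an ISO-8601 string in a single format (so it sorts as text), and especially the hypothetical SCHEDULE_ID_RE pattern, since the exact layout of COMMAND_ARGS is not shown here.

```python
import re
from collections import defaultdict

# Hypothetical pattern: adjust after inspecting real COMMAND_ARGS values.
SCHEDULE_ID_RE = re.compile(r"scheduleId\D{0,5}(\d+)")


def attrs(event: dict) -> dict:
    """Flatten [{"name": ..., "values": [...]}] attributes to name -> first value."""
    return {a["name"]: (a.get("values") or [None])[0] for a in event.get("attributes", [])}


def last_status_per_schedule(events: dict) -> dict:
    """Map scheduleId -> COMMAND_STATUS of its most recent command."""
    by_command = defaultdict(list)
    for ev in events.get("items", []):
        a = attrs(ev)
        if a.get("COMMAND_ID") is not None:
            by_command[a["COMMAND_ID"]].append((ev.get("timeOccurred", ""), a))

    latest = {}  # scheduleId -> (time of the command's latest event, status)
    for evs in by_command.values():
        evs.sort(key=lambda pair: pair[0])
        when, last = evs[-1]  # latest event gives the command's current status
        # Search every event of the command for a scheduleId in COMMAND_ARGS.
        sched = None
        for _, a in evs:
            m = SCHEDULE_ID_RE.search(a.get("COMMAND_ARGS") or "")
            if m:
                sched = m.group(1)
                break
        if sched is None:
            continue
        if sched not in latest or when > latest[sched][0]:
            latest[sched] = (when, last.get("COMMAND_STATUS"))
    return {s: status for s, (_, status) in latest.items()}
```

The result maps each scheduleId to the status of its most recent command (for example FAILED or SUCCEEDED), which answers the "last execution" question without downloading full replication histories.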

[0] https://cloudera.github.io/cm_api/apidocs/v17/path__events.html

[1] http://cm.cloudera.com:7180/api/v12/events?query=category==AUDIT_EVENT;attributes.eventcode==EV_HDFS...

Contributor

Michalis, thanks for the nice workaround!

New Contributor

I'm using curl against the REST API to extract BDR job status from the history, and I calculate the total data volume and average replication time for each job. It's taking over 9 hours to complete and produces a huge file. Is it possible to filter so that only the last 24 hours of BDR jobs are extracted, to reduce the time and file size?

Thanks,

Scott
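One possible way to bound the download, sketched under assumptions: instead of fetching the whole history at once, page through replications/{id}/history using the limit and offset parameters from the example URL in this thread, and stop at the first entry older than 24 hours. This assumes entries come back newest first and carry an ISO-8601 "startTime" field; fetch_page is a hypothetical callback you would implement with your own HTTP client.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterator, Optional


def parse_cm_time(ts: str) -> datetime:
    """Parse a timestamp such as '2018-05-01T10:00:00.000Z' (assumed format)."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def recent_commands(fetch_page: Callable[[int, int], dict],
                    hours: int = 24, page_size: int = 20,
                    now: Optional[datetime] = None) -> Iterator[dict]:
    """Yield history entries whose startTime falls within the last `hours` hours.

    fetch_page(limit, offset) is a hypothetical caller-supplied function that GETs
    .../replications/{id}/history?limit=...&offset=... and returns decoded JSON.
    Paging stops at the first entry older than the cutoff (newest-first assumed).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    offset = 0
    while True:
        items = fetch_page(page_size, offset).get("items", [])
        for cmd in items:
            if parse_cm_time(cmd["startTime"]) < cutoff:
                return  # everything after this is older, so stop fetching
            yield cmd
        if len(items) < page_size:
            return  # last page reached
        offset += page_size
```

Because paging stops as soon as the cutoff is crossed, only the last day's entries plus at most one extra page are transferred per schedule.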