Support Questions

parnigot · ‎08-16-2017

I'm writing a small script to monitor the status of BDR jobs with the REST APIs.

I'm having an issue with an endpoint that takes a long time to respond (from my limited testing, the response time scales linearly with the number of jobs and the depth of the history for each job):

https://cloudera.github.io/cm_api/apidocs/v17/path__clusters_-clusterName-_services_-serviceName-_re...

In the linked documentation it appears that the API accepts a limits parameter, but it's not very well documented. What arguments does it accept? Is there maybe something to limit the history size?

michalis · ‎08-16-2017

The link you provided will list all your replication schedules and their job result history.

If you know the replication schedule id (e.g. id=5 in the example below), the replications/{id}/history endpoint [0] may help you. It lets you limit the history size with the limit and offset parameters.

http://cm-host.cloudera.com:7180/api/v17/clusters/Cluster%201/services/HDFS-1/replications/5/history?limit=1&offset=0

[0] https://cloudera.github.io/cm_api/apidocs/v17/path__clusters_-clusterName-_services_-serviceName-_re...
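For a script, the URL above can be built and fetched along these lines. This is only a minimal sketch: `history_url` and `fetch_history` are hypothetical helper names, the host, cluster and service names are the ones from the example URL, and basic authentication plus a JSON response body are assumptions about the setup rather than something confirmed in this thread.

```python
import base64
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def history_url(base_url, cluster, service, schedule_id,
                limit=1, offset=0, api_version="v17"):
    """Build the replications/{id}/history URL with limit and offset."""
    path = "/api/{}/clusters/{}/services/{}/replications/{}/history".format(
        api_version,
        quote(cluster, safe=""),   # "Cluster 1" becomes "Cluster%201"
        quote(service, safe=""),
        schedule_id,
    )
    query = urlencode({"limit": limit, "offset": offset})
    return base_url.rstrip("/") + path + "?" + query


def fetch_history(url, user, password):
    """Fetch the URL and decode the body as JSON.

    Assumption: the CM instance accepts HTTP basic auth and returns JSON.
    """
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    request = Request(url, headers={"Authorization": "Basic " + token})
    with urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call fetch_history() against a real CM host.
    print(history_url("http://cm-host.cloudera.com:7180",
                      "Cluster 1", "HDFS-1", 5))
```

With limit=1 the request asks for a single history entry per schedule, which keeps the response small compared to listing every schedule with its full history.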

parnigot · ‎08-17-2017

Thank you Michalis

And what if I don't know the ids of the jobs in advance? Is there any way to limit the response from the main URI /api/vXX/clusters/{cluster_name}/services/{service_name}/replications?
What I'm trying to do is just get the list of all defined jobs and the state of the last execution (failed/succeeded) for each.

michalis · ‎08-17-2017

If your objective is to "get the state of the last execution (failed/succeeded)", then, if I remember correctly, each replication job generates an AUDIT event [0], so a workaround would be to filter the Events [1].

In CM, under Diagnostics > Events, filter by:

Category: AUDIT_EVENT

Event Code: EV_HDFS_DISTCP

By parsing COMMAND_ARGS you can get the scheduleId.

Then you can group the results by COMMAND_ID to get the execution flow.

COMMAND_STATUS will show whether the command STARTED, FAILED, SUCCEEDED or was ABORTED.

[0] https://cloudera.github.io/cm_api/apidocs/v17/path__events.html

[1] http://cm.cloudera.com:7180/api/v12/events?query=category==AUDIT_EVENT;attributes.eventcode==EV_HDFS...
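The grouping step described above could look roughly like the following in a script. This is a sketch under assumptions: it expects each event as a dict with a `timeOccurred` timestamp string and an `attributes` list of `{"name": ..., "values": [...]}` entries, and the sample events are made up for illustration. Check the real payload of your CM version before relying on it. Parsing the scheduleId out of COMMAND_ARGS is left out because its format is not shown in this thread.

```python
from collections import defaultdict


def attr(event, name):
    """Return the first value of the named event attribute, or None."""
    for attribute in event.get("attributes", []):
        if attribute.get("name") == name and attribute.get("values"):
            return attribute["values"][0]
    return None


def latest_status_by_command(events):
    """Group events by COMMAND_ID and keep the most recent COMMAND_STATUS."""
    by_command = defaultdict(list)
    for event in events:
        command_id = attr(event, "COMMAND_ID")
        if command_id is not None:
            by_command[command_id].append(event)

    latest = {}
    for command_id, group in by_command.items():
        # Assumes timestamps share one ISO 8601 format, so string order
        # matches chronological order.
        group.sort(key=lambda e: e.get("timeOccurred", ""))
        latest[command_id] = attr(group[-1], "COMMAND_STATUS")
    return latest


def make_event(time_occurred, command_id, status):
    """Build a hypothetical event dict for the example below."""
    return {
        "timeOccurred": time_occurred,
        "attributes": [
            {"name": "COMMAND_ID", "values": [command_id]},
            {"name": "COMMAND_STATUS", "values": [status]},
        ],
    }


# Hypothetical sample data, not real CM output.
sample_events = [
    make_event("2017-08-17T10:05:00.000Z", "101", "SUCCEEDED"),
    make_event("2017-08-17T10:00:00.000Z", "101", "STARTED"),
    make_event("2017-08-17T11:00:00.000Z", "102", "FAILED"),
]

if __name__ == "__main__":
    print(latest_status_by_command(sample_events))
```

Sorting inside each group means the events do not have to arrive in chronological order.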

parnigot · ‎08-18-2017

Michalis thanks for the nice workaround!

scottwong · ‎03-18-2020

I'm using curl against the REST API to extract BDR job status from the history and to calculate the total data volume and average replication time for each job. It's taking over 9 hours to complete and produces a huge file. Is it possible to add a filter that extracts only the last 24 hours of BDR jobs, to reduce the time and file size?

Thanks,

Scott

Need additional documentation for rest API - replication status