
Need additional documentation for rest API - replication status


Explorer

I'm writing a small script to monitor the status of BDR jobs with the REST APIs.

 

I'm having an issue with an endpoint that takes a long time to respond (from my limited testing, the response time scales linearly with the number of jobs and the depth of each job's history):

https://cloudera.github.io/cm_api/apidocs/v17/path__clusters_-clusterName-_services_-serviceName-_re...

 

In the linked documentation it appears that the API accepts a limit parameter, but it's not well documented: what arguments does it accept? Is there perhaps something to limit the history size?

 

 

 


Re: Need additional documentation for rest API - replication status

Super Collaborator

The link you provided lists all your replication schedules and their full job result history.

If you know the replication schedule id (e.g., id=5 below), the replications/{id}/history endpoint [0] may help you: it lets you limit the history size.

 

 

http://cm-host.cloudera.com:7180/api/v17/clusters/Cluster%201/services/HDFS-1/replications/5/history?limit=1&offset=0

 

[0] https://cloudera.github.io/cm_api/apidocs/v17/path__clusters_-clusterName-_services_-serviceName-_re...
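A minimal Python sketch of polling that history endpoint with limit=1 to fetch only the latest run. The hostname, cluster/service names, and the JSON field names (items, active, success) are assumptions based on my reading of the v17 ApiReplicationCommandList docs; verify them against your CM version.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen  # used in the commented real call below

def history_url(base, cluster, service, schedule_id, limit=1, offset=0):
    """Build the history URL for one replication schedule.

    limit/offset page through past runs; limit=1 returns only the
    most recent run, which keeps the response small.
    """
    return (f"{base}/api/v17/clusters/{quote(cluster)}/services/{quote(service)}"
            f"/replications/{schedule_id}/history?limit={limit}&offset={offset}")

def latest_status(history):
    """Summarize the newest command in an ApiReplicationCommandList payload.

    Field names (items/active/success) are my reading of the v17 docs --
    check them against your CM version.
    """
    items = history.get("items", [])
    if not items:
        return "NEVER_RAN"
    newest = items[0]          # history is returned newest-first
    if newest.get("active"):
        return "RUNNING"
    return "SUCCEEDED" if newest.get("success") else "FAILED"

# Example with a canned payload; a real call would look like:
#   latest_status(json.load(urlopen(history_url("http://cm-host:7180",
#                                               "Cluster 1", "HDFS-1", 5))))
sample = {"items": [{"id": 42, "active": False, "success": True}]}
print(latest_status(sample))  # SUCCEEDED
```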

 

 


Re: Need additional documentation for rest API - replication status

Explorer

Thank you Michalis.

And if I don't know the ids of the jobs in advance? Is there any way to limit the response from the main URI /api/vXX/clusters/{cluster_name}/services/{service_name}/replications?
What I'm trying to do is get the list of all defined jobs and the state of each one's last execution (failed/succeeded).

Re: Need additional documentation for rest API - replication status

Super Collaborator

If your objective is to "...get the state of the last execution (failed/succeeded)": if I remember correctly, each replication job generates an AUDIT event [0], so a workaround would be to filter the Events [1].

 

In your CM > Diagnostics > Events page, filter by:

Category: AUDIT_EVENT

Event Code: EV_HDFS_DISTCP

By parsing COMMAND_ARGS you can get the scheduleId.

Then you can group the results by COMMAND_ID to get the execution flow.

COMMAND_STATUS will show whether the command STARTED, FAILED, SUCCEEDED, or was ABORTED.

 

[0] https://cloudera.github.io/cm_api/apidocs/v17/path__events.html

[1] http://cm.cloudera.com:7180/api/v12/events?query=category==AUDIT_EVENT;attributes.eventcode==EV_HDFS...
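A sketch of the grouping step above in Python. The event JSON shape (an "attributes" list of name/values pairs) and the COMMAND_ID / COMMAND_STATUS attribute names are assumptions based on my reading of the events endpoint; verify them against the payload your CM version returns.

```python
def attr(event, name):
    """Pull one attribute value out of a CM event.

    Assumes the event JSON shape
    {"attributes": [{"name": ..., "values": [...]}]} -- verify against
    what your /api/vNN/events endpoint actually returns.
    """
    for a in event.get("attributes", []):
        if a.get("name") == name and a.get("values"):
            return a["values"][0]
    return None

def flow_by_command(events):
    """Group event statuses by COMMAND_ID to reconstruct each run's flow."""
    flows = {}
    for ev in events:
        cid = attr(ev, "COMMAND_ID")
        flows.setdefault(cid, []).append(attr(ev, "COMMAND_STATUS"))
    return flows

# Two events from the same distcp command: it started, then succeeded.
events = [
    {"attributes": [{"name": "COMMAND_ID", "values": ["77"]},
                    {"name": "COMMAND_STATUS", "values": ["STARTED"]}]},
    {"attributes": [{"name": "COMMAND_ID", "values": ["77"]},
                    {"name": "COMMAND_STATUS", "values": ["SUCCEEDED"]}]},
]
print(flow_by_command(events))  # {'77': ['STARTED', 'SUCCEEDED']}
```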

 



Re: Need additional documentation for rest API - replication status

Explorer

Michalis thanks for the nice workaround!

 

 


Re: Need additional documentation for rest API - replication status

New Contributor

I'm using curl against the REST API to extract BDR job statuses from the history, and calculating the total data volume and average replication time for each job. It's taking over 9 hours to complete and produces a huge file. Is there a filter to extract only the last 24 hours of BDR jobs, to reduce the run time and file size?

 

Thanks,

Scott
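I'm not aware of a documented server-side date filter on the history endpoint, so one option is to page with the limit/offset parameters mentioned earlier in the thread and drop old runs client-side. A sketch of that filtering step, assuming CM-style timestamps like "2019-06-01T02:30:00.000Z" in a startTime field (adjust the format string if your payload differs):

```python
from datetime import datetime, timedelta

# Assumed CM timestamp layout; adjust if your payload differs.
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

def within_last_hours(commands, now, hours=24):
    """Keep only commands whose startTime falls inside the last `hours` hours."""
    cutoff = now - timedelta(hours=hours)
    return [c for c in commands
            if datetime.strptime(c["startTime"], FMT) >= cutoff]

now = datetime(2019, 6, 2, 12, 0, 0)
cmds = [
    {"id": 1, "startTime": "2019-06-02T03:00:00.000Z"},   # within 24 h
    {"id": 2, "startTime": "2019-05-20T03:00:00.000Z"},   # too old
]
print([c["id"] for c in within_last_hours(cmds, now)])  # [1]
```

Fetching with a small limit and stopping as soon as a page contains only runs older than the cutoff keeps both the run time and the output file small.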
