Is there a way to invoke Hive/HDFS replication jobs using CM API?

Explorer

I was looking through the Python CM API documentation and see a number of classes in endpoints.types that look to be related to replicating data (ApiReplicationCommand, ApiReplicationSchedule, etc.), but I don't see anything about how to use them. Is there a way to invoke HDFS or Hive replication jobs through the Python CM API?

2 REPLIES
Re: Is there a way to invoke Hive/HDFS replication jobs using CM API?

Cloudera Employee

Sure, you can trigger a replication job immediately via the CM API.

In the Python client, the ApiService class in endpoints/services.py has a function called trigger_replication_schedule. You need to find the service (HDFS or Hive) first, then call the function with a schedule ID, like:

       hdfs_service.trigger_replication_schedule(schedule_id, dry_run)

It returns the corresponding command. If you want a dry run, set the second parameter to True; by default it is False.
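To put the call above in context, here is a minimal sketch, assuming the cm_api Python client is installed and a replication schedule already exists. The helper names (trigger_replication, get_service), the host, the credentials, the cluster and service names, and the schedule ID are all placeholders for illustration, not values from your cluster.

```python
def trigger_replication(service, schedule_id, dry_run=False):
    """Trigger an existing replication schedule and return the command it starts.

    `service` is expected to be a cm_api ApiService for HDFS or Hive.
    """
    return service.trigger_replication_schedule(schedule_id, dry_run)


def get_service(host, username, password, cluster_name, service_name):
    """Look up a service through the CM API (all arguments are placeholders)."""
    # Imported here so trigger_replication can be used or tested without cm_api.
    from cm_api.api_client import ApiResource

    api = ApiResource(host, username=username, password=password)
    return api.get_cluster(cluster_name).get_service(service_name)


# Example usage (hypothetical host, credentials, names, and schedule ID):
# hdfs_service = get_service("cm-host.example.com", "admin", "admin", "cluster1", "hdfs")
# cmd = trigger_replication(hdfs_service, 12, dry_run=True)
# cmd = cmd.wait()        # block until the command finishes
# print(cmd.success)
```

Running with dry_run=True first is a cheap way to check that the schedule ID is the one you expect before copying any data.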

 

Hope this helps.

 

-Lei

 

Re: Is there a way to invoke Hive/HDFS replication jobs using CM API?

Cloudera Employee

This is a somewhat dated example, but here is a Java tool that can do this. At one point I tested it with Oozie and it worked fine. It has also been used with schedulers such as AutoSys to kick off replication schedules at a specific time.

BDR Action

 

It does, of course, require the schedule to already exist. Right now the tool lets you list the available schedules. In an upcoming release of CM, the replication schedule ID should be exposed in the UI.
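If you are using the Python client rather than the Java tool, listing the schedules to find an ID could look roughly like this. It is only a sketch: it assumes a cm_api ApiService object for HDFS or Hive, and list_schedule_ids is a made-up helper name, not part of the client or of the tool linked above.

```python
def list_schedule_ids(service):
    """Return the IDs of all replication schedules defined on a service.

    `service` is assumed to be a cm_api ApiService (HDFS or Hive); each item
    returned by get_replication_schedules() is expected to expose an `id`.
    """
    return [schedule.id for schedule in service.get_replication_schedules()]


# Example usage (hypothetical service object obtained from the CM API):
# for schedule_id in list_schedule_ids(hdfs_service):
#     print(schedule_id)
```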

 

Thanks

 

Jeff