I was looking through the python CM API documentation and see a number of classes in endpoints.types that look to be related to replicating data (ApiReplicationCommand, ApiReplicationSchedule, etc), but I don't see anything related to their usage. Is there a way to invoke HDFS or Hive replication jobs through the python CM API?
sure, you can trigger a replication job immediately via CM API.
In Python, you can find ApiService class inside endpoints services.py, which has function trigger_replication_schedule. You need find the service (hdfs or hive) first, and call the function with a schedule id, like:
It returns the corresponding command. If you want dry run, set the 2nd parameter to 'True', by default, it is 'False'.
Hope this helps.
This is a bit of a dated example but here is a java tool that can do this. At one point I tested this with Oozie and it worked ok. This has been used with tools like autosys to kick off replication schedules at a specific time.
It does of course require the scheudle to already exist. Right now the tool lets you list the available schedules. In an upcoming release of CM, this replication ID should be exposed via the UI.