Created 12-15-2024 09:16 PM
Hi Team,
Good Day
We have a requirement to start a CDSW AI/ML job through an API call. We currently use the Cloudera Machine Learning legacy REST API (v1), as described at this link:
https://docs.cloudera.com/machine-learning/1.5.4/jobs-pipelines/topics/ml-rest-apis.html
curl -v -XPOST http://cdsw.example.com/api/v1/<path_to_job> --user "<LEGACY_API_KEY>:"
Ex: curl -XPOST https://***/api/v1/projects/bd***/s***/jobs/2/start --user "***" --header "Content-type: application/json" --data "{}"
We are working on upgrading CDSW to CML, and as part of this job rewrite exercise we would also like to use the Cloudera Machine Learning REST API v2 for the curl command that starts the job.
In the URL below, we don't see a method to start the job.
Please share your expertise on the open questions below:
1) When is V1 getting deprecated? Is it recommended to use the V1 API now?
2) Which API method is to be used in the Version 2 API call to start the job?
Created 01-03-2025 05:15 AM
1) When is V1 getting deprecated?
It already is:
https://docs.cloudera.com/machine-learning/cloud/jobs-pipelines/topics/ml-rest-apis.html
--
The Jobs API is now deprecated. See Cloudera AI API v2 and API v2 usage for the successor API.
--
Is it recommended to use V1 API now?
No, since it is already deprecated.
2) Which API method is to be used in the Version 2 API call to start the job?
https://docs.cloudera.com/machine-learning/1.5.4/rest-api-reference/index.html#api-CMLService-create...
Probably createJob and createJobRun, although the two names are easy to confuse.
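For anyone who wants to stay close to the original curl call, here is a minimal Python sketch of what the v2 equivalent might look like. The URL layout (/api/v2/projects/<project_id>/jobs/<job_id>/runs), the Bearer authorization header, and the helper name build_job_run_request are assumptions that are not confirmed in this thread, so please check them against the REST API reference in your own workspace. The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_job_run_request(base_url, project_id, job_id, api_key):
    """Build (but do not send) a POST request asking API v2 to start a job run.

    Assumptions to verify against your workspace's API reference:
    the /api/v2/.../runs path and the "Bearer" authorization scheme.
    """
    url = f"{base_url.rstrip('/')}/api/v2/projects/{project_id}/jobs/{job_id}/runs"
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),  # empty JSON body, as in the v1 curl call
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values: substitute your workspace domain, IDs, and API v2 key.
    request = build_job_run_request(
        "https://cml.example.com", "<project_id>", "<job_id>", "<API_V2_KEY>"
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Keeping the request construction separate from the network call makes it easy to inspect the URL and headers before anything is sent to the workspace.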
Created 01-06-2025 06:01 PM
You can find the docs for both the Python client and the REST API by going to User Settings -> API Keys.
From the Python CML APIv2 Docs, these are the two methods you need:
Step 1: Create Job
from __future__ import print_function
import time
import cmlapi
from cmlapi.rest import ApiException
from pprint import pprint

# create an instance of the API class
api_instance = cmlapi.CMLServiceApi()
body = cmlapi.CreateJobRequest()  # CreateJobRequest |
project_id = 'project_id_example'  # str | ID of the project containing the job.

try:
    # Create a new job.
    api_response = api_instance.create_job(body, project_id)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling CMLServiceApi->create_job: %s\n" % e)
Step 2: Run the Job
from __future__ import print_function
import time
import cmlapi
from cmlapi.rest import ApiException
from pprint import pprint

# create an instance of the API class
api_instance = cmlapi.CMLServiceApi()
body = cmlapi.CreateJobRunRequest()  # CreateJobRunRequest |
project_id = 'project_id_example'  # str | ID of the project containing the job.
job_id = 'job_id_example'  # str | The job ID to create a new job run for.

try:
    # Create and start a new job run for a job.
    api_response = api_instance.create_job_run(body, project_id, job_id)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling CMLServiceApi->create_job_run: %s\n" % e)
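If the caller also needs to know when the run has finished, a small polling wrapper around create_job_run could look like the sketch below. It assumes the client exposes get_job_run(project_id, job_id, run_id) and that the returned run object has id and status attributes; neither is shown in this thread, so verify both in the Python docs. The status names that count as finished are left to the caller for the same reason, and start_and_wait is a hypothetical helper name.

```python
import time


def start_and_wait(client, run_request, project_id, job_id,
                   terminal_statuses, poll_seconds=10.0, timeout_seconds=3600.0):
    """Start a job run, then poll until its status is in terminal_statuses.

    `client` is expected to behave like a cmlapi client. `get_job_run` and the
    `id` / `status` attributes are assumptions to check against the API docs.
    """
    job_run = client.create_job_run(run_request, project_id, job_id)
    deadline = time.monotonic() + timeout_seconds
    while job_run.status not in terminal_statuses:
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job run {job_run.id} still {job_run.status!r} "
                f"after {timeout_seconds} seconds"
            )
        time.sleep(poll_seconds)
        # Re-read the run to pick up its latest status.
        job_run = client.get_job_run(project_id, job_id, job_run.id)
    return job_run
```

Passing the client in as an argument keeps the helper independent of how the client was created, and the timeout prevents a stuck run from blocking the caller indefinitely.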
Filling in the "cmlapi.CreateJobRequest()" or "cmlapi.CreateJobRunRequest()" request objects can be tricky. Here's an advanced example: https://github.com/pdefusco/SparkGen/blob/main/autogen/cml_orchestrator.py
In particular:
sparkgen_1_job_body = cmlapi.CreateJobRequest(
    project_id = project_id,
    name = "SPARKGEN_1_"+session_id,
    script = "autogen/cml_sparkjob_1.py",
    cpu = 4.0,
    memory = 8.0,
    runtime_identifier = "docker.repository.cloudera.com/cloudera/cdsw/ml-runtime-workbench-python3.7-standard:2023.05.1-b4",
    runtime_addon_identifiers = ["spark320-18-hf4"],
    environment = {
        "x":str(x),
        "y":str(y),
        "z":str(z),
        "ROW_COUNT_car_installs":str(ROW_COUNT_car_installs),
        "UNIQUE_VALS_car_installs":str(UNIQUE_VALS_car_installs),
        "PARTITIONS_NUM_car_installs":str(PARTITIONS_NUM_car_installs),
        "ROW_COUNT_car_sales":str(ROW_COUNT_car_sales),
        "UNIQUE_VALS_car_sales":str(UNIQUE_VALS_car_sales),
        "PARTITIONS_NUM_car_sales":str(PARTITIONS_NUM_car_sales),
        "ROW_COUNT_customer_data":str(ROW_COUNT_customer_data),
        "UNIQUE_VALS_customer_data":str(UNIQUE_VALS_customer_data),
        "PARTITIONS_NUM_customer_data":str(PARTITIONS_NUM_customer_data),
        "ROW_COUNT_factory_data":str(ROW_COUNT_factory_data),
        "UNIQUE_VALS_factory_data":str(UNIQUE_VALS_factory_data),
        "PARTITIONS_NUM_factory_data":str(PARTITIONS_NUM_factory_data),
        "ROW_COUNT_geo_data":str(ROW_COUNT_geo_data),
        "UNIQUE_VALS_geo_data":str(UNIQUE_VALS_geo_data),
        "PARTITIONS_NUM_geo_data":str(PARTITIONS_NUM_geo_data)
    }
)
sparkgen_1_job = client.create_job(sparkgen_1_job_body, project_id)
And
jobrun_body = cmlapi.CreateJobRunRequest(project_id, sparkgen_1_job.id)
job_run = client.create_job_run(jobrun_body, project_id, sparkgen_1_job.id)
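One thing worth noticing in the example is that every environment value is wrapped in str(), presumably because the job environment is expected to map strings to strings. If the long literal feels error-prone, the dict can be generated instead. The sketch below is only an illustration: job_environment is a hypothetical helper name, and the KEY_table naming simply mirrors the keys used in the example.

```python
def job_environment(scalars, per_table):
    """Build a flat str -> str environment dict for a job request.

    scalars:   {"x": 1, ...}                     becomes {"x": "1", ...}
    per_table: {"car_sales": {"ROW_COUNT": 10}}  becomes {"ROW_COUNT_car_sales": "10"}
    """
    env = {name: str(value) for name, value in scalars.items()}
    for table, settings in per_table.items():
        for key, value in settings.items():
            # Same "<SETTING>_<table>" key layout as the hand-written dict.
            env[f"{key}_{table}"] = str(value)
    return env


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the result.
    env = job_environment(
        {"x": 1, "y": 2},
        {"car_sales": {"ROW_COUNT": 1000, "PARTITIONS_NUM": 4}},
    )
    print(env)
```

The resulting dict can then be passed as the environment argument of cmlapi.CreateJobRequest in place of the hand-written one.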
Hope this helps,