CML API call to start the AIML job

Rising Star

Hi Team,

Good Day

We need to start a CDSW AI/ML job through an API call. We currently use the legacy Cloudera Machine Learning REST API (v1), as described at this link:

https://docs.cloudera.com/machine-learning/1.5.4/jobs-pipelines/topics/ml-rest-apis.html

curl -v -XPOST http://cdsw.example.com/api/v1/<path_to_job> --user "<LEGACY_API_KEY>:"

Example: curl -XPOST https://***/api/v1/projects/bd***/s***/jobs/2/start --user "***" --header "Content-type: application/json" --data "{}"
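For anyone calling this from a script rather than the shell, here is a minimal Python sketch of the same legacy v1 request, using only the standard library. The function name and the `owner`/`project` parameter names are hypothetical, chosen to match the two masked path segments in the curl example. The sketch only builds the request; sending it is left to the caller.

```python
import base64
import json
import urllib.request


def build_v1_start_request(host, owner, project, job_id, legacy_api_key):
    """Build (but do not send) the legacy v1 'start job' POST request."""
    # Same path shape as the curl example: /api/v1/projects/<owner>/<project>/jobs/<id>/start
    url = f"{host}/api/v1/projects/{owner}/{project}/jobs/{job_id}/start"
    # curl's --user "<KEY>:" is HTTP basic auth: key as username, empty password.
    token = base64.b64encode(f"{legacy_api_key}:".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode(),  # same empty JSON body as --data "{}"
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values only; pass the result to urllib.request.urlopen() to send it.
    req = build_v1_start_request(
        "https://cdsw.example.com", "<owner>", "<project>", 2, "<LEGACY_API_KEY>"
    )
    print(req.get_method(), req.full_url)
```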


We are upgrading from CDSW to CML, and as part of rewriting our jobs we would like to switch the curl command that starts the job to the Cloudera Machine Learning REST API v2.

In the API reference at the URL below, we don't see a method to start a job:

https://docs.cloudera.com/machine-learning/1.5.4/rest-api-reference/index.html#examples-CMLService-r...

Please share your expertise on these open questions:

1) When is V1 being deprecated? Is it still recommended to use the V1 API?

2) Which API method should be used in a Version 2 API call to start the job?

2 REPLIES

Expert Contributor

1) When is V1 being deprecated?
It already is. From the current docs:
https://docs.cloudera.com/machine-learning/cloud/jobs-pipelines/topics/ml-rest-apis.html
--
The Jobs API is now deprecated. See Cloudera AI API v2 and API v2 usage for the successor API.
--

Is it recommended to use the V1 API now?
No. It is deprecated, and API v2 is its successor.

2) Which API method is to be used in the Version 2 API call to start the job?
https://docs.cloudera.com/machine-learning/1.5.4/rest-api-reference/index.html#api-CMLService-create...

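The names createJob and createJobRun are easy to confuse. createJob defines a new job; createJobRun creates and starts a run of an existing job, so it is most likely the one you want.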
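Since the question was about replacing a curl call, here is a rough Python sketch of what the equivalent v2 REST request might look like. The URL path (`/api/v2/projects/<project_id>/jobs/<job_id>/runs`), the bearer-token header, and the body fields are assumptions on my part, so please check them against the API reference for your CML version before relying on them. The function name is hypothetical, and the sketch only builds the request without sending it.

```python
import json
import urllib.request


def build_v2_job_run_request(host, project_id, job_id, api_v2_key):
    """Build (but do not send) a v2 'create job run' POST request."""
    # ASSUMPTION: createJobRun maps to this path; verify in your API reference.
    url = f"{host}/api/v2/projects/{project_id}/jobs/{job_id}/runs"
    # ASSUMPTION: the request body repeats the project and job IDs.
    body = {"project_id": project_id, "job_id": job_id}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={
            # ASSUMPTION: v2 uses a bearer token rather than v1's basic auth.
            "Authorization": f"Bearer {api_v2_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values only; pass the result to urllib.request.urlopen() to send it.
    req = build_v2_job_run_request(
        "https://cml.example.com", "<project_id>", "<job_id>", "<API_V2_KEY>"
    )
    print(req.get_method(), req.full_url)
```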

Rising Star

You can find the docs for both the Python client and the REST API under User Settings -> API Keys.


From the Python CML APIv2 docs, these are the two methods you need:

Step 1: Create the job (skip this step if the job already exists)

import cmlapi
from cmlapi.rest import ApiException
from pprint import pprint

# Create an API client. Assumption: this runs inside a CML session, where
# default_client() needs no arguments; elsewhere, pass the workspace URL and API v2 key.
api_instance = cmlapi.default_client()
# An empty request is only a template; fill in the job fields before sending.
body = cmlapi.CreateJobRequest()  # CreateJobRequest
project_id = 'project_id_example'  # str | ID of the project containing the job.

try:
    # Create a new job.
    api_response = api_instance.create_job(body, project_id)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling CMLServiceApi->create_job: %s\n" % e)

Step 2: Run the Job

import cmlapi
from cmlapi.rest import ApiException
from pprint import pprint

# Create an API client. Assumption: this runs inside a CML session, where
# default_client() needs no arguments; elsewhere, pass the workspace URL and API v2 key.
api_instance = cmlapi.default_client()
body = cmlapi.CreateJobRunRequest()  # CreateJobRunRequest
project_id = 'project_id_example'  # str | ID of the project containing the job.
job_id = 'job_id_example'  # str | The job ID to create a new job run for.

try:
    # Create and start a new job run for a job.
    api_response = api_instance.create_job_run(body, project_id, job_id)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling CMLServiceApi->create_job_run: %s\n" % e)
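If the job already exists, which sounds like the case in the original question, only the run step is needed. A small sketch of that, written so the client and request class are passed in: the helper name `start_existing_job` is hypothetical, and the positional `(project_id, job_id)` arguments to the request class are an assumption about the `CreateJobRunRequest` constructor.

```python
def start_existing_job(client, make_run_request, project_id, job_id):
    """Start a run of a job that already exists; no create_job call is made."""
    # Build the request body for this project/job pair.
    body = make_run_request(project_id, job_id)
    # create_job_run creates and starts a new run for the given job.
    return client.create_job_run(body, project_id, job_id)


# Possible usage inside a CML session (assumes cmlapi is installed there):
#
#   import cmlapi
#   client = cmlapi.default_client()
#   run = start_existing_job(client, cmlapi.CreateJobRunRequest,
#                            "<project_id>", "<job_id>")
#   print(run)
```

Passing the client in also makes the helper easy to exercise with a stub before pointing it at a real workspace.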

 

Filling in the "cmlapi.CreateJobRequest()" and "cmlapi.CreateJobRunRequest()" request objects can be tricky. Here's a more advanced example: https://github.com/pdefusco/SparkGen/blob/main/autogen/cml_orchestrator.py

In particular:

# client, project_id, session_id and the variables passed to str() below
# are defined earlier in the linked script.
sparkgen_1_job_body = cmlapi.CreateJobRequest(
    project_id=project_id,
    name="SPARKGEN_1_" + session_id,
    script="autogen/cml_sparkjob_1.py",
    cpu=4.0,
    memory=8.0,
    runtime_identifier="docker.repository.cloudera.com/cloudera/cdsw/ml-runtime-workbench-python3.7-standard:2023.05.1-b4",
    runtime_addon_identifiers=["spark320-18-hf4"],
    # Every environment value is converted to a string.
    environment={
        "x": str(x),
        "y": str(y),
        "z": str(z),
        "ROW_COUNT_car_installs": str(ROW_COUNT_car_installs),
        "UNIQUE_VALS_car_installs": str(UNIQUE_VALS_car_installs),
        "PARTITIONS_NUM_car_installs": str(PARTITIONS_NUM_car_installs),
        "ROW_COUNT_car_sales": str(ROW_COUNT_car_sales),
        "UNIQUE_VALS_car_sales": str(UNIQUE_VALS_car_sales),
        "PARTITIONS_NUM_car_sales": str(PARTITIONS_NUM_car_sales),
        "ROW_COUNT_customer_data": str(ROW_COUNT_customer_data),
        "UNIQUE_VALS_customer_data": str(UNIQUE_VALS_customer_data),
        "PARTITIONS_NUM_customer_data": str(PARTITIONS_NUM_customer_data),
        "ROW_COUNT_factory_data": str(ROW_COUNT_factory_data),
        "UNIQUE_VALS_factory_data": str(UNIQUE_VALS_factory_data),
        "PARTITIONS_NUM_factory_data": str(PARTITIONS_NUM_factory_data),
        "ROW_COUNT_geo_data": str(ROW_COUNT_geo_data),
        "UNIQUE_VALS_geo_data": str(UNIQUE_VALS_geo_data),
        "PARTITIONS_NUM_geo_data": str(PARTITIONS_NUM_geo_data),
    },
)
sparkgen_1_job = client.create_job(sparkgen_1_job_body, project_id)

And

jobrun_body = cmlapi.CreateJobRunRequest(project_id, sparkgen_1_job.id)
job_run = client.create_job_run(jobrun_body, project_id, sparkgen_1_job.id)
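One small note on the environment mapping in the create-job body: every value there is wrapped in str(), presumably because job environment variables are passed as strings. If you have many parameters, a helper like the following can save typing. The name `to_job_environment` is hypothetical and not part of cmlapi.

```python
def to_job_environment(params):
    """Return a new dict with every key and value converted to str."""
    return {str(key): str(value) for key, value in params.items()}


if __name__ == "__main__":
    # Illustrative values only; the result can be passed as environment=...
    # when building a cmlapi.CreateJobRequest.
    env = to_job_environment({"x": 10, "ROW_COUNT_car_sales": 100000})
    print(env)
```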

 

Hope this helps,