Member since 08-22-2018
79 Posts
11 Kudos Received
4 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 320 | 01-27-2025 07:01 AM |
| | 835 | 06-27-2024 02:58 AM |
| | 921 | 01-08-2024 02:22 AM |
| | 1724 | 06-19-2023 02:41 AM |
01-28-2025
09:19 AM
Thank you for checking it out! I just have some questions about terminology:
- What do you mean by "workspace"? Is it a Cloudera environment that we can find in the Management Console?
- Does "suspended the workspace" mean stopping the environment?
01-21-2025
04:39 PM
@jI-mi Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
01-21-2025
04:39 PM
@MID_ACN Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
01-06-2025
06:01 PM
You can find the docs for Python and the API by going to User Settings -> API Keys. From the Python CML APIv2 docs, these are the two methods you need:

Step 1: Create Job

from __future__ import print_function
import time
import cmlapi
from cmlapi.rest import ApiException
from pprint import pprint
# create an instance of the API class
api_instance = cmlapi.CMLServiceApi()
body = cmlapi.CreateJobRequest() # CreateJobRequest |
project_id = 'project_id_example' # str | ID of the project containing the job.
try:
    # Create a new job.
    api_response = api_instance.create_job(body, project_id)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling CMLServiceApi->create_job: %s\n" % e)

Step 2: Run the Job

from __future__ import print_function
import time
import cmlapi
from cmlapi.rest import ApiException
from pprint import pprint
# create an instance of the API class
api_instance = cmlapi.CMLServiceApi()
body = cmlapi.CreateJobRunRequest() # CreateJobRunRequest |
project_id = 'project_id_example' # str | ID of the project containing the job.
job_id = 'job_id_example' # str | The job ID to create a new job run for.
try:
    # Create and start a new job run for a job.
    api_response = api_instance.create_job_run(body, project_id, job_id)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling CMLServiceApi->create_job_run: %s\n" % e)

Filling in the cmlapi.CreateJobRequest() and cmlapi.CreateJobRunRequest() request bodies can be tricky. Here's an advanced example: https://github.com/pdefusco/SparkGen/blob/main/autogen/cml_orchestrator.py

In particular:

sparkgen_1_job_body = cmlapi.CreateJobRequest(
    project_id=project_id,
    name="SPARKGEN_1_" + session_id,
    script="autogen/cml_sparkjob_1.py",
    cpu=4.0,
    memory=8.0,
    runtime_identifier="docker.repository.cloudera.com/cloudera/cdsw/ml-runtime-workbench-python3.7-standard:2023.05.1-b4",
    runtime_addon_identifiers=["spark320-18-hf4"],
    environment={
        "x": str(x),
        "y": str(y),
        "z": str(z),
        "ROW_COUNT_car_installs": str(ROW_COUNT_car_installs),
        "UNIQUE_VALS_car_installs": str(UNIQUE_VALS_car_installs),
        "PARTITIONS_NUM_car_installs": str(PARTITIONS_NUM_car_installs),
        "ROW_COUNT_car_sales": str(ROW_COUNT_car_sales),
        "UNIQUE_VALS_car_sales": str(UNIQUE_VALS_car_sales),
        "PARTITIONS_NUM_car_sales": str(PARTITIONS_NUM_car_sales),
        "ROW_COUNT_customer_data": str(ROW_COUNT_customer_data),
        "UNIQUE_VALS_customer_data": str(UNIQUE_VALS_customer_data),
        "PARTITIONS_NUM_customer_data": str(PARTITIONS_NUM_customer_data),
        "ROW_COUNT_factory_data": str(ROW_COUNT_factory_data),
        "UNIQUE_VALS_factory_data": str(UNIQUE_VALS_factory_data),
        "PARTITIONS_NUM_factory_data": str(PARTITIONS_NUM_factory_data),
        "ROW_COUNT_geo_data": str(ROW_COUNT_geo_data),
        "UNIQUE_VALS_geo_data": str(UNIQUE_VALS_geo_data),
        "PARTITIONS_NUM_geo_data": str(PARTITIONS_NUM_geo_data),
    },
)
sparkgen_1_job = client.create_job(sparkgen_1_job_body, project_id)

And

jobrun_body = cmlapi.CreateJobRunRequest(project_id, sparkgen_1_job.id)
job_run = client.create_job_run(jobrun_body, project_id, sparkgen_1_job.id)

Hope this helps,
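P.S. The environment mapping in the example above wraps every value in str(), presumably because environment variables reach the job as strings. Here is a minimal sketch of a helper that does that conversion in one place. The name build_job_environment and the sample values are hypothetical, not part of cmlapi or the linked script:

```python
def build_job_environment(params):
    """Return a copy of params with every key and value converted to str."""
    return {str(key): str(value) for key, value in params.items()}


# Illustrative values only; the real script computes these elsewhere.
environment = build_job_environment({
    "x": 1,
    "ROW_COUNT_car_sales": 100000,
    "PARTITIONS_NUM_car_sales": 4,
})
print(environment)
```

The resulting dict could then be passed as the environment argument of cmlapi.CreateJobRequest, as in the example above.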
01-03-2025
05:17 AM
1 Kudo
You may consider building a custom PBJ runtime image that meets your requirements.
11-13-2024
12:26 AM
2 Kudos
Hey, in the application logs everything looks fine. Instead of being able to click on the application name as I normally would when the status is running, I have to go into the application logs and find the Gradio app link at the bottom. When I look at the container logs, the last 3 lines are:

2024-11-13 08:22:14.557 18 INFO JupyterWSGLauncher ngh4jdz5m3ua258n Finish running startup chunks: success. data = {"user":"cdsw"}
2024-11-13 08:22:14.557 18 INFO JupyterWSGLauncher ngh4jdz5m3ua258n Proxying to livelog data = {"user":"cdsw"}
2024-11-13 08:22:14.557 18 INFO JupyterWSGTimeoutHandler ngh4jdz5m3ua258n idleTimeoutInMinutes data = {"idleTimeoutInMinutes":60,"user":"cdsw"}

Suggesting everything is as expected?

Edited: I found the problem. It was how I was defining the port for the application. Instead of using os.getenv('CDSW_APP_PORT'), I was using a manually set one. Thank you!
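For anyone who runs into the same thing, here is a minimal sketch of reading the port from the environment instead of hard-coding it. The helper name resolve_app_port and the fallback value 8100 are my own assumptions for running outside the platform, and the commented Gradio call assumes an existing demo object:

```python
import os


def resolve_app_port(default=8100):
    """Return the port from CDSW_APP_PORT, or `default` when it is unset."""
    raw = os.getenv("CDSW_APP_PORT")
    return int(raw) if raw else default


port = resolve_app_port()

# With Gradio, the port would then be passed at launch, for example:
# demo.launch(server_name="127.0.0.1", server_port=port)
```

Binding to 127.0.0.1 in the commented line is also an assumption on my part; check the application docs for your version.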
08-18-2024
10:11 PM
PySpark 3.5.2 works with Python >= 3.8 and <= 3.11. Ref: https://pypi.org/project/pyspark/3.5.2/
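If you want to check a session's interpreter against that range before installing, here is a small sketch. The bounds are just the ones quoted above, and python_in_range is a hypothetical helper, not a PySpark API:

```python
import sys

# Bounds taken from the range quoted in this post, not read from PySpark itself.
MIN_PYTHON = (3, 8)
MAX_PYTHON = (3, 11)


def python_in_range(version=None):
    """Return True if the (major, minor) version lies within the quoted range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


print("Python", sys.version.split()[0], "in range:", python_in_range())
```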
07-12-2024
06:19 AM
It's a CDSW job.
06-27-2024
02:58 AM
1 Kudo
@littlecong The files need to be uploaded to each individual project. As of now, there is no documented way to share content between projects.
06-14-2024
12:28 AM
I am sceptical that the QuickStart VM is still available. AFAIK, both the Cloudera QuickStart VM and the Hortonworks Sandbox are deprecated, and their download links and documentation are no longer public. You may check whether any archived or cached pages still have the documentation.