CDP CDE Jobs API: Submitting a PySpark Job with dependencies
Created 04-13-2021 05:38 PM
I have some homegrown Python dependencies that I use in my PySpark jobs on CDE. When I create a job, I would like to have these dependencies included. I see the API accepts a `pythonEnvResourceName` parameter, but its usage is undocumented. Is this the parameter I am looking for? If so, what value is it expected to take?
Created 04-13-2021 06:22 PM
Hello,
One way is to package the job with its dependencies into a jar file and submit the jar.
Here is an example of how to submit a jar file to CDE:
https://docs.cloudera.com/data-engineering/cloud/cli-access/topics/cde-cli-submit-job.html
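For illustration, a jar submission through the CDE CLI might look like the sketch below. The jar name, main class, and arguments are placeholders, and the exact flags can vary by CDE version, so treat the linked docs as authoritative.

```bash
# Sketch only: submit a Spark jar through the CDE CLI.
# my-app.jar, com.example.MyApp, and the trailing args are placeholders.
cde spark submit my-app.jar \
  --class com.example.MyApp \
  arg1 arg2
```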
Created 04-14-2021 06:20 AM
This is for a Python application rather than Java or Scala, so we will not be building a jar. Thanks @Daming Xue.
Created 04-14-2021 06:39 PM
You have to use `--py-file` via the CDE CLI to submit your Python files. This way you can submit your custom Python scripts/packages as .py/.zip/.egg file(s). Similarly, via the API, use the `"pyFiles": [ "string" ]` configuration to submit your files.
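To make that concrete, here is a minimal sketch of both routes, assuming a main script app.py and dependency bundles deps.zip/helpers.py (all placeholder names), plus a virtual cluster endpoint and token for the API call; verify the exact flags and fields against your CDE version's docs.

```bash
# Via the CDE CLI: --py-file attaches custom scripts/packages
# (.py, .zip, or .egg); repeat the flag for each file.
cde spark submit app.py \
  --py-file deps.zip \
  --py-file helpers.py

# Via the Jobs API: the Spark job spec carries a "pyFiles" array.
# JOBS_API_URL and ACCESS_TOKEN are placeholders for your virtual
# cluster's Jobs API endpoint and auth token; "my-resource" is a
# hypothetical CDE resource holding the uploaded files.
curl -X POST "$JOBS_API_URL/jobs" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "my-pyspark-job",
        "type": "spark",
        "mounts": [{ "resourceName": "my-resource" }],
        "spark": {
          "file": "app.py",
          "pyFiles": ["deps.zip", "helpers.py"]
        }
      }'
```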
