
CDP CDE Jobs API: Submitting a pyspark Job with dependencies

New Contributor

I have some homegrown Python dependencies that I use in my PySpark jobs on CDE. When I create a job, I would like to have these dependencies included. I see the API accepts a `pythonEnvResourceName` parameter, but its usage is undocumented. Is this the parameter I am looking for? If so, what value is it expected to take?

 

[Attachment: Screen Shot 2021-04-13 at 8.35.30 PM.png]

Re: CDP CDE Jobs API: Submitting a pyspark Job with dependencies

Expert Contributor

Hello,

One way is to package the job together with its dependencies into a jar file and submit that jar.

Here is an example of how to submit a jar file to CDE:

https://docs.cloudera.com/data-engineering/cloud/cli-access/topics/cde-cli-submit-job.html
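A minimal sketch of that flow with the CDE CLI, based on the linked documentation. The resource name, job name, and file paths below are hypothetical placeholders, not values from this thread:

```shell
# Create a CDE file resource and upload the application jar into it
# (resource/job names and paths are example placeholders)
cde resource create --name my-job-files
cde resource upload --name my-job-files --local-path target/my-app.jar

# Create a Spark job that mounts the resource and points at the jar,
# then trigger a run of it
cde job create --name my-spark-job --type spark \
  --mount-1-resource my-job-files \
  --application-file my-app.jar
cde job run --name my-spark-job
```

These commands require a configured CDE CLI with access to a virtual cluster, so they are shown here as a sketch rather than something runnable in isolation.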

Re: CDP CDE Jobs API: Submitting a pyspark Job with dependencies

New Contributor

This is a Python application rather than Java or Scala, so we will not be building a jar. Thanks @Daming Xue.

Re: CDP CDE Jobs API: Submitting a pyspark Job with dependencies

Cloudera Employee

You have to use `--py-file` via the CDE CLI to submit your Python files. This way you can submit your custom Python scripts/packages as .py/.zip/.egg file(s). Similarly, via the API, use the `"pyFiles": [ "string" ]` configuration to submit your files.
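A sketch of what this could look like in both forms, assuming the dependencies have been bundled into a zip. All names, paths, and the endpoint placeholder below are hypothetical examples, and the API body follows the `pyFiles` field mentioned above:

```shell
# CLI form: upload the main script and the dependency bundle into a
# file resource, then attach the bundle with --py-file
# (resource/job names and file names are example placeholders)
cde resource create --name my-pyspark-files
cde resource upload --name my-pyspark-files --local-path main.py
cde resource upload --name my-pyspark-files --local-path deps.zip
cde job create --name my-pyspark-job --type spark \
  --mount-1-resource my-pyspark-files \
  --application-file main.py \
  --py-file deps.zip

# API form: the equivalent job definition passes "pyFiles" inside the
# Spark section of the job-creation request body
# (<jobs-api-endpoint> and the token variable are placeholders)
curl -X POST "https://<jobs-api-endpoint>/dex/api/v1/jobs" \
  -H "Authorization: Bearer ${CDE_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "my-pyspark-job",
        "type": "spark",
        "mounts": [{"resourceName": "my-pyspark-files"}],
        "spark": {
          "file": "main.py",
          "pyFiles": ["deps.zip"]
        }
      }'
```

Both variants assume an authenticated session against a CDE virtual cluster, so treat them as a template to adapt rather than commands to run as-is.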