Created 04-13-2021 05:38 PM
I have some homegrown Python dependencies I use in my PySpark jobs on CDE. When I create a job, I would like to have these dependencies included. I see the API accepts a parameter for `pythonEnvResourceName`, but its usage is undocumented. Is this the parameter I am looking for? If so, what is expected to be passed as the value?
Created 04-13-2021 06:22 PM
Hello
One way is to package the job together with its dependencies into a jar file and submit the jar file.
Here is an example of how to submit a jar file to CDE:
https://docs.cloudera.com/data-engineering/cloud/cli-access/topics/cde-cli-submit-job.html
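For reference, a minimal sketch of what a jar submission with the CDE CLI might look like, assuming you already have a built application jar; the jar name, main class, and arguments below are placeholders, not values from the docs page:

```
# Sketch: submit a Spark application jar to CDE via the CLI
# (my-app.jar, com.example.Main, and the arguments are placeholders)
cde spark submit my-app.jar \
  --class com.example.Main \
  arg1 arg2
```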
Created 04-14-2021 06:20 AM
This is for a Python application rather than Java or Scala, so we will not be building a jar. Thanks @Daming Xue.
Created 04-14-2021 06:39 PM
You have to use --py-file via the CDE CLI to submit your .py files. This way you can submit your custom Python scripts/packages as .py/.zip/.egg file(s). Similarly, via the API, use the "pyFiles": [ "string" ] configuration to submit your files.
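For example, a minimal sketch of both approaches, assuming the --py-file flag and pyFiles field mentioned above; the file names (my_job.py, deps.zip), job name, endpoint URL, token variable, and the exact JSON layout are placeholders and should be checked against the CDE API reference for your version:

```
# Sketch: submit a PySpark job plus a zipped dependency package via the CDE CLI
# (my_job.py and deps.zip are placeholder names)
cde spark submit my_job.py --py-file deps.zip

# Sketch: create the equivalent job via the CDE jobs API, passing the
# dependencies through the "pyFiles" field (URL, token, and body layout
# are assumptions for illustration)
curl -X POST "https://<jobs-api-url>/api/v1/jobs" \
  -H "Authorization: Bearer ${CDE_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "my-pyspark-job",
        "type": "spark",
        "spark": {
          "file": "my_job.py",
          "pyFiles": ["deps.zip"]
        }
      }'
```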