Packaging files in original, nested directory structure for Oozie shell action

I have a PySpark application that has local modules next to the application Python script, something like this:

.
├── foobar
│   ├── config.py
│   ├── foobar.py
│   └── __init__.py
├── application.DEV.ini
├── application.PROD.ini
├── application.py
├── requirements.txt
└── submit-application.sh

 

I am trying to use an Oozie workflow to ship all the script and local module files, but they are always delivered flattened into the root directory of the container, regardless of the configuration I use. Because the files from the sub-directory end up in the container root, the Python script cannot load the local modules and fails with ModuleNotFoundError: No module named 'foobar'.
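The effect can be reproduced outside Oozie. The following is a minimal sketch, not taken from the actual application: it builds the nested layout and a flattened copy in temporary directories (with empty placeholder files) and tries to import the sub-module foobar.config from each. The helper names make_layout and can_import_config are made up for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PACKAGE_FILES = ["__init__.py", "config.py", "foobar.py"]


def make_layout(root: Path, nested: bool) -> None:
    """Create empty module files, either under root/foobar/ or flat in root."""
    target = root / "foobar" if nested else root
    target.mkdir(parents=True, exist_ok=True)
    for name in PACKAGE_FILES:
        (target / name).write_text("")


def can_import_config(root: Path) -> bool:
    """Try `import foobar.config` with root at the front of sys.path."""
    for name in ("foobar.config", "foobar"):
        sys.modules.pop(name, None)  # forget results of earlier attempts
    importlib.invalidate_caches()
    sys.path.insert(0, str(root))
    try:
        importlib.import_module("foobar.config")
        return True
    except ModuleNotFoundError:
        return False
    finally:
        sys.path.remove(str(root))
        for name in ("foobar.config", "foobar"):
            sys.modules.pop(name, None)


with tempfile.TemporaryDirectory() as nested_dir, tempfile.TemporaryDirectory() as flat_dir:
    make_layout(Path(nested_dir), nested=True)
    make_layout(Path(flat_dir), nested=False)
    nested_ok = can_import_config(Path(nested_dir))  # package directory intact
    flat_ok = can_import_config(Path(flat_dir))      # files dumped into the root

print("nested layout importable:", nested_ok)
print("flattened layout importable:", flat_ok)
```

In the flattened copy there is no foobar directory left, so Python has no package to resolve the sub-module against, which matches what happens inside the container.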

It seems that the # notation does not work with sub-directories. This is my Oozie workflow.xml file:

 

<workflow-app name="Data-Extraction-WF" xmlns="uri:oozie:workflow:0.5">

    <global>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
    </global>

    <start to="Data-Extraction"/>

    <action name="Data-Extraction">
        <shell xmlns="uri:oozie:shell-action:1.0">
            <exec>submit-application.sh</exec>

            <file>foobar/__init__.py#foobar/__init__.py</file>
            <file>foobar/config.py#foobar/config.py</file>
            <file>foobar/foobar.py#foobar/foobar.py</file>
            <file>application.DEV.ini#application.DEV.ini</file>
            <file>application.PROD.ini#application.PROD.ini</file>
            <file>application.py#application.py</file>
            <file>submit-application.sh#submit-application.sh</file>

            <capture-output/>
        </shell>

        <ok to="success"/>
        <error to="failure"/>
    </action>


    <kill name="failure">
        <message>Workflow failed, error message: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name="success"/>

</workflow-app>

 

Is there any way to tell Oozie to place file artifacts in a sub-directory?

How could I upload all my Python project files in the original structure so that it works with Oozie?
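One possible direction, sketched here under assumptions and not verified on CDH 6: since a single file does not suffer from flattening, the package directory could be bundled into one zip file that keeps the foobar/ prefix inside the archive. Python can import packages directly from a zip file on sys.path, and spark-submit accepts zip files through its --py-files option. The helper zip_package and the placeholder content of config.py are hypothetical and only serve the illustration.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def zip_package(package_dir: Path, zip_path: Path) -> None:
    """Zip all .py files under package_dir, keeping the package folder as a prefix."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        for source in sorted(package_dir.rglob("*.py")):
            # e.g. <...>/src/foobar/config.py is stored as foobar/config.py
            archive.write(source, source.relative_to(package_dir.parent).as_posix())


with tempfile.TemporaryDirectory() as work:
    work_dir = Path(work)

    # Stand-in for the real project: a tiny foobar package with placeholder content.
    package_dir = work_dir / "src" / "foobar"
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text("")
    (package_dir / "config.py").write_text("ENVIRONMENT = 'DEV'\n")

    zip_path = work_dir / "foobar.zip"
    zip_package(package_dir, zip_path)

    with zipfile.ZipFile(zip_path) as archive:
        archived_names = sorted(archive.namelist())

    # A zip file on sys.path behaves like a directory containing the package.
    for name in ("foobar.config", "foobar"):
        sys.modules.pop(name, None)
    importlib.invalidate_caches()
    sys.path.insert(0, str(zip_path))
    try:
        environment = importlib.import_module("foobar.config").ENVIRONMENT
    finally:
        sys.path.remove(str(zip_path))
        for name in ("foobar.config", "foobar"):
            sys.modules.pop(name, None)

print(archived_names)
print(environment)
```

If this approach applies, the workflow would only need to ship foobar.zip as one flat file, and submit-application.sh could pass it to spark-submit with --py-files foobar.zip. Whether that fits the actual submit script is an assumption, since its contents are not shown here.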

I am on CDH 6, and I cannot find any documentation on this. 

 
