Oozie - Launching 100s of jobs - Deadlocks?

Hello,

I'm designing ETL for my company.

The source database is Oracle. We have HDP 2.6.

To make things simple: I have hundreds of tables to extract data from. I wrote a parameterised Sqoop sub-workflow that I call hundreds of times, each time with a different table name as a parameter. The calls are made from a fork in a parent Oozie workflow.

The problem I'm having is that all of the cluster's resources are taken up by the Oozie launcher jobs, so the Oozie actions themselves never start because there are no resources left. Currently I'm using the default queue for everything.

So the question is: is my design/understanding wrong? Any help would be appreciated.

<!-- Fork fans out one path per source table; every path calls the same parameterised sub-workflow -->
<fork name="startAllExtracts">
  <path start="sqoop_action_list"/>
  <path start="sqoop_card"/>
  ...
</fork>

<!-- The join fires only after every forked path has completed -->
<join name="endAllExtracts" to="end"/>

<action name="sqoop_action_list">
  <sub-workflow>
    <app-path>${pathWF}/pwc_05_sqoop</app-path>
    <propagate-configuration/>
    <configuration>
      <property>
        <name>table_name</name>
        <value>action_list</value>
      </property>
    </configuration>
  </sub-workflow>
  <ok to="endAllExtracts"/><error to="fail"/>
</action>

<action name="sqoop_card">
  <sub-workflow>
    <app-path>${pathWF}/pwc_05_sqoop</app-path>
    <propagate-configuration/>
    <configuration>
      <property>
        <name>table_name</name>
        <value>card</value>
      </property>
    </configuration>
  </sub-workflow>
  <ok to="endAllExtracts"/><error to="fail"/>
</action>
...
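
A minimal sketch of one common mitigation, to make the queue situation concrete: Oozie's standard launcher override can send the launcher jobs to their own queue so they cannot crowd out the actual Sqoop jobs. This assumes a dedicated queue named "launchers" exists in the scheduler config (the queue name is an assumption), and would go in the sub-workflow's <global> section:

<global>
  <configuration>
    <property>
      <!-- standard Oozie launcher override: run launcher jobs in their own queue
           ("launchers" is an assumed queue name; it must exist in the scheduler) -->
      <name>oozie.launcher.mapred.job.queue.name</name>
      <value>launchers</value>
    </property>
    <property>
      <!-- the actual Sqoop MapReduce jobs keep running in the default queue -->
      <name>mapred.job.queue.name</name>
      <value>default</value>
    </property>
  </configuration>
</global>

That way the launchers, which mostly sit idle waiting on their child jobs, are capped by their own queue's capacity instead of starving the extracts.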
1 Reply

Launching 2 jobs in parallel allows me to finish the ETL.

3 jobs in parallel creates a deadlock. There must be something wrong with my scheduler config.
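
That symptom is consistent with the classic launcher deadlock: each forked path holds an Oozie launcher (a YARN application in its own right) that waits for its child Sqoop job, and once the running launchers have used up the queue's ApplicationMaster share, no child job can obtain an AM container, so everything waits on everything. A minimal sketch of the Capacity Scheduler settings usually involved, in capacity-scheduler.xml (the values are illustrative assumptions, not recommendations):

<property>
  <!-- fraction of a queue's capacity that ApplicationMasters may occupy;
       the default of 0.1 is easy to exhaust when every launcher+child pair needs two AMs -->
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>
<property>
  <!-- cap the default queue so launchers alone cannot claim the whole cluster -->
  <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
  <value>50</value>
</property>

Raising the AM share only widens the bottleneck, though; the structural fix is to limit how many of the hundreds of paths run concurrently, for example by batching the fork or by driving the table list from an Oozie coordinator, which supports throttling.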
