
Oozie - Launching 100s of jobs - Deadlocks?



Hello,

I'm designing an ETL pipeline for my company.

The source database is Oracle. We have HDP 2.6.

To keep things simple: I have 100s of tables to extract data from. I wrote a parameterised Sqoop workflow that I call 100s of times, each time with a different table name as a parameter. The calls are made from a fork in another Oozie workflow.

The problem I'm having is that all of the cluster's resources are taken up by the Oozie launchers, so the Oozie actions never start because there are no resources left for them. Currently I'm using the default queue for everything.

So the question is: is my design/understanding wrong? Any help would be appreciated.

<!-- Fan out one sub-workflow per source table: with 100s of paths,
     100s of launcher jobs are submitted at the same time -->
<fork name="startAllExtracts">
  <path start="sqoop_action_list"/>
  <path start="sqoop_card"/>
  ...
</fork>

<join name="endAllExtracts" to="end"/>

<action name="sqoop_action_list">
  <sub-workflow>
    <app-path>${pathWF}/pwc_05_sqoop</app-path>
    <propagate-configuration/>
    <configuration>
      <property>
        <name>table_name</name>
        <value>action_list</value>
      </property>
    </configuration>
  </sub-workflow>
  <ok to="endAllExtracts"/>
  <error to="fail"/>
</action>

<action name="sqoop_card">
  <sub-workflow>
    <app-path>${pathWF}/pwc_05_sqoop</app-path>
    <propagate-configuration/>
    <configuration>
      <property>
        <name>table_name</name>
        <value>card</value>
      </property>
    </configuration>
  </sub-workflow>
  <ok to="endAllExtracts"/>
  <error to="fail"/>
</action>
...
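
Since everything currently runs in the default queue, one commonly suggested mitigation for this launcher-starvation pattern is to route the launcher jobs into their own queue so they cannot crowd out the Sqoop jobs they spawn. A minimal sketch of the extra per-action properties, assuming a dedicated queue named launchers already exists in the scheduler (the queue name is illustrative; the oozie.launcher.* prefix makes Oozie apply the setting to the launcher job rather than to the action itself):

<configuration>
  <!-- Illustrative queue name: send the launcher MR job to its own queue -->
  <property>
    <name>oozie.launcher.mapred.job.queue.name</name>
    <value>launchers</value>
  </property>
  <!-- The Sqoop job itself keeps running in the default queue -->
  <property>
    <name>mapred.job.queue.name</name>
    <value>default</value>
  </property>
</configuration>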

Re: Oozie - Launching 100s of jobs - Deadlocks?


Launching 2 jobs in parallel allows me to finish the ETL.

3 jobs in parallel creates a deadlock, so there must be something wrong with my scheduler config.
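
For later readers: with the CapacityScheduler, the usual way to break this kind of deadlock is a small, hard-capped queue for the launchers plus a cap on the ApplicationMaster share. A hedged capacity-scheduler.xml sketch, with queue names and percentages that are illustrative rather than taken from this thread:

<!-- capacity-scheduler.xml (illustrative values) -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,launchers</value>
</property>
<property>
  <!-- Keep the launcher queue small and hard-capped so launchers
       can never hold the whole cluster -->
  <name>yarn.scheduler.capacity.root.launchers.capacity</name>
  <value>10</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.launchers.maximum-capacity</name>
  <value>20</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>90</value>
</property>
<property>
  <!-- Fraction of a queue's resources that ApplicationMasters may occupy -->
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.2</value>
</property>

With the launchers confined to their own capped queue, parallelism above 2 or 3 should no longer starve the Sqoop jobs of containers.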