
Oozie Heart beat issue

Contributor

Hello,

We are running a Sqoop action through Oozie. When we run a single Sqoop job from the command line, it finishes in 5 minutes, but when we schedule it through Oozie, it takes 15 minutes. We plan to run 50 Sqoop jobs in parallel. We tried 10 jobs, and they too took around 15 minutes to finish, whereas the average time to finish a single job is around 3-4 minutes.

While inspecting the logs, we found repeated Heart beat messages. I have already allocated plenty of memory, but the issue is still there. Below are the relevant configurations from workflow.xml, yarn-site.xml, and mapred-site.xml.

Logs

4974 [uber-SubtaskRunner] WARN  org.apache.sqoop.tool.SqoopTool  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
5007 [uber-SubtaskRunner] WARN  org.apache.sqoop.tool.BaseSqoopTool  - Setting your password on the command-line is insecure. Consider using -P instead.
5016 [uber-SubtaskRunner] WARN  org.apache.sqoop.ConnFactory  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
5036 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Using default fetchSize of 1000
5036 [uber-SubtaskRunner] INFO  org.apache.sqoop.tool.CodeGenTool  - Beginning code generation
5445 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.OracleManager  - Time zone has been set to GMT
5513 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT * FROM RMS.RMS_DXH_EVENT_LOG WHERE INSERT_DATETIME >= to_timestamp('2016-05-28 11', 'YYYY-MM-DD HH24')AND INSERT_DATETIME < to_timestamp('2016-05-28 12', 'YYYY-MM-DD HH24') AND  (1 = 0) 
5524 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT * FROM RMS.RMS_DXH_EVENT_LOG WHERE INSERT_DATETIME >= to_timestamp('2016-05-28 11', 'YYYY-MM-DD HH24')AND INSERT_DATETIME < to_timestamp('2016-05-28 12', 'YYYY-MM-DD HH24') AND  (1 = 0) 
5583 [uber-SubtaskRunner] INFO  org.apache.sqoop.orm.CompilationManager  - HADOOP_MAPRED_HOME is /opt/hadoop/hadoop-2.7.2
6869 [uber-SubtaskRunner] INFO  org.apache.sqoop.orm.CompilationManager  - Writing jar file: /tmp/sqoop-hadoop/compile/cad1dab0e45211fb2421690860d98843/QueryResult.jar
6879 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Beginning query import.
Heart beat
Heart beat
Heart beat
[... "Heart beat" repeated for roughly 11 minutes ...]
Heart beat
690272 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Transferred 3.1365 MB in 683.365 seconds (4.7 KB/sec)
690277 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Retrieved 5057 records.

<<< Invocation of Sqoop command completed <<<

 Hadoop Job IDs executed by Sqoop: job_1470385625721_0038


<<< Invocation of Main class completed <<<


Workflow.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<workflow-app xmlns="uri:oozie:workflow:0.4" name="oozie_batch_type_wf">
  <start to="RMS_DXH_EVENT_LOG6"/>
  <action name="RMS_DXH_EVENT_LOG6">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>oozie.launcher.mapreduce.map.memory.mb</name>
          <value>3072</value>
        </property>
        <property>
          <name>oozie.launcher.mapreduce.reduce.memory.mb</name>
          <value>6144</value>
        </property>
        <property>
          <name>oozie.launcher.mapreduce.child.java.opts</name>
          <value>-Xmx8g</value>
        </property>
        <property>
          <name>oozie.launcher.mapred.job.queue.name</name>
          <value>default</value>
        </property>
      </configuration>
      <arg>import</arg>
      <arg>--connect</arg>
      <!-- remaining arguments and closing tags truncated in the original post -->

Yarn-site.xml

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>122880</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>55</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>5</value>
</property>


Mapred-site.xml

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>3072</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx2455m</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>6144</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx4915m</value>
</property>
<property>
  <name>mapreduce.job.maps</name>
  <value>10</value>
</property>
<property>
  <name>mapreduce.job.reduces</name>
  <value>10</value>
</property>

12 Replies

Re: Oozie Heart beat issue

I suspect each Oozie job is being submitted to the same "default" YARN queue. YARN's Capacity Scheduler is FIFO within a queue, so jobs take longer when submitted together than when submitted one at a time. You could create multiple queues and distribute the Oozie jobs across them, or switch the queue's ordering policy to fair:

<property>
  <name>yarn.scheduler.capacity.<queue-path>.ordering-policy</name>
  <value>fair</value>
</property> 
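
For the multi-queue alternative, a minimal capacity-scheduler.xml sketch could look like this (queue names and percentages are illustrative, not taken from your cluster):

<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,sqoop</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.sqoop.capacity</name>
  <value>50</value>
</property>

Each Oozie action could then target a queue via mapred.job.queue.name in its configuration.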

Re: Oozie Heart beat issue

Contributor

Hello @Ameet Paranjape

I had set up two queues in the Capacity Scheduler so that the Oozie launchers and the Oozie actions were separated, but it still took the same time. So, is the fair scheduler really required to make this work?

Re: Oozie Heart beat issue

@Rinku Singh, queues are FIFO by default, so to change that behavior you'll need to edit the ordering policy as I suggested. The fair scheduler documentation describes job behavior with examples, so you can confirm this *is* what you're looking for. Hope this helps,

Ameet

Re: Oozie Heart beat issue

Contributor

Hello @Ameet Paranjape,

The fair scheduler took the same amount of time too. I drilled down to the root cause and have created another question to address the issue.

Re: Oozie Heart beat issue

Contributor

Hello @Ameet Paranjape

I mentioned earlier that I had drilled down to the root cause. My mistake; I was wrong. The issue is still there. There is no problem with available resources, as the cluster has 600 GB of RAM and we are running only 10 jobs in parallel. Configuring the fair scheduler didn't help. Could it have anything to do with the Oozie configuration?

Re: Oozie Heart beat issue

Super Guru

@Rinku Singh

Can you please add the properties below to your workflow.xml, assigning the same values as in your YARN configs?

<property>
  <name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
  <value>Refer to your yarn-site.xml</value>
</property>
<property>
  <name>oozie.launcher.yarn.app.mapreduce.am.command-opts</name>
  <value>Refer to your yarn-site.xml</value>
</property>

Also, I can see that you have set the Oozie launcher mapper memory to 3 GB while its heap is set to 8 GB, which is too much. Heap memory should be 75-80% of the mapper container.
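
For example, keeping the 3072 MB launcher container from your posted workflow.xml, a heap in that 75-80% band would look roughly like this (the exact -Xmx value is illustrative):

<property>
  <name>oozie.launcher.mapreduce.map.memory.mb</name>
  <value>3072</value>
</property>
<property>
  <name>oozie.launcher.mapreduce.child.java.opts</name>
  <!-- ~80% of the 3072 MB container, instead of -Xmx8g -->
  <value>-Xmx2457m</value>
</property>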

Hope this information helps.

Please do let us know how it goes after adding the above properties and correcting the heap size.

Re: Oozie Heart beat issue

Contributor

Hello @Kuldeep Kulkarni

Before implementing the changes, I did a small test.

I created a normal workflow (without any memory configuration) to Sqoop 7 tables in parallel using the fork-and-join method in Oozie; the structure is sketched below. This ran for 11 minutes.
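
A rough skeleton of that workflow (action and table names here are placeholders; the real workflow has 7 parallel Sqoop actions):

<workflow-app xmlns="uri:oozie:workflow:0.4" name="parallel_sqoop_wf">
  <start to="fork_tables"/>
  <fork name="fork_tables">
    <path start="sqoop_table1"/>
    <path start="sqoop_table2"/>
    <!-- ...one path per table, 7 in total... -->
  </fork>
  <action name="sqoop_table1">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <arg>import</arg>
      <!-- remaining Sqoop arguments per table -->
    </sqoop>
    <ok to="join_tables"/>
    <error to="kill"/>
  </action>
  <!-- sqoop_table2 ... sqoop_table7 differ only in the table they import -->
  <join name="join_tables" to="end"/>
  <kill name="kill"><message>Sqoop action failed</message></kill>
  <end name="end"/>
</workflow-app>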

I then created a shell script to launch the same 7 jobs in parallel. This too ran for 11 minutes.

One major difference I observed is that with the shell script, some jobs finished in 4 minutes, some in 8 minutes, and some took 11 minutes. In Oozie, since we run them as a batch using fork and join, the jobs that finish early still wait until the other jobs in the batch finish.

Now here is the catch. If I run any of these 7 jobs individually, it doesn't take more than 4 minutes to finish. Since we are spawning the jobs in parallel with the shell script, they should all finish in about 4 minutes. Instead, 2 of the 7 jobs take 11 minutes to finish.

Below are my settings from yarn-site.xml and mapred-site.xml. I am really not sure what I am missing. Can you help me see this through?

Mapred-site.xml

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>3072</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx2455m</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx6553m</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx4096m</value>
</property>

Yarn-site.xml

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>122880</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>122880</value>
</property>

Re: Oozie Heart beat issue

Super Guru
@Rinku Singh

Regarding "If I run any of these 7 jobs individually, it doesn't take more than 4 minutes to finish, but 2 of the 7 take 11 minutes when spawned in parallel":

When you spawn jobs in parallel from a shell script, your Capacity Scheduler configuration comes into the picture. What kind of queues you have configured and how much capacity you have allocated to them all matter when you submit multiple jobs in parallel to a single queue. When you submit jobs sequentially, only one job runs in that queue at a time, so none of them runs longer than expected.
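
As an illustration, these are the kinds of capacity-scheduler.xml settings that throttle parallel submissions to a single queue (the values below are examples, not recommendations for your cluster):

<property>
  <!-- how far a single user may exceed the queue's configured capacity -->
  <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
  <value>1</value>
</property>
<property>
  <!-- cluster share usable by ApplicationMasters; a low value caps how many jobs run concurrently -->
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.2</value>
</property>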

Hope this information helps.

Re: Oozie Heart beat issue

Contributor

Hi @Kuldeep Kulkarni

There is only a single queue, configured as below. All I am running is 7 jobs, with plenty of resources available. Do you still feel anything is missing in the configuration below, or should I look at the MapReduce side?

Absolute Capacity: 100.0%
Absolute Max Capacity: 100.0%
Max Applications: 10000
Max Applications Per User: 10000
Max Application Master Resources: <memory:614400, vCores:276>
Max Application Master Resources Per User: <memory:614400, vCores:276>
Configured Capacity: 100.0%
Configured Max Capacity: 100.0%
Configured Minimum User Limit Percent: 100%
Configured User Limit Factor: 1.0
Accessible Node Labels: *
Preemption: disabled