MP
New Contributor
Posts: 1
Registered: ‎12-20-2017

Why does my Oozie job change from RUNNING to KILLED (running on a Docker cluster)?


Hello, I am trying to run a job in Oozie, but it fails. As soon as I start the job, its status is RUNNING. It stays that way for about 30 minutes, then the status changes to KILLED with the error code E0729.


The Hadoop system runs in Docker containers inside a virtual machine. There are namenode, datanode, resourcemanager, nodemanager, hue, hive-server, hive-metastore, hive-metastore-postgresql and oozie-server containers. All containers can communicate with each other. The system is started via docker-compose up. The data for the MapReduce job (an example from the book Apache Oozie by Mohammad Kamrul Islam & Aravind Srinivasan, published by O'Reilly) is uploaded via the namenode.

To start the job, I switch to the Oozie container and then to the user oozie with the command su -s /bin/bash oozie. Then I execute export OOZIE_URL=http://localhost:11000/oozie so that I can run the Oozie job in the following step. The command for that is oozie job -config target/example/job.properties -run. If I then query the status with oozie job -info JOB_ID, I see RUNNING for about 30 minutes, then it changes to KILLED with the error code E0729.
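For anyone who wants to reproduce this, the steps described above can be sketched as a small shell snippet. It is only a sketch: submit_workflow is a hypothetical helper name, and it assumes the Oozie CLI prints the new job ID as a line of the form "job: <id>", as it does in the transcript further down.

```shell
# Run inside the oozie-server container as the oozie user
# (after: su -s /bin/bash oozie).
export OOZIE_URL=http://localhost:11000/oozie

# submit_workflow PROPERTIES_FILE   (hypothetical helper)
# Submits the workflow and prints only the job ID by stripping the
# "job: " prefix from the CLI response.
submit_workflow() {
    oozie job -config "$1" -run | sed -n 's/^job: //p'
}

# Usage, with the path from this post:
#   job_id=$(submit_workflow target/example/job.properties)
#   oozie job -info "$job_id"
```

Capturing the ID this way avoids copying it by hand into the later -info and -log calls.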

What's strange is that the running job is not listed in the Oozie web interface; however, it is listed in the Hue web interface.

I cannot find any helpful information in the logs of the Oozie node and the HDFS node.

Below are the job.properties and the workflow.xml as well as the log files.

 

JOB-PROPERTIES

oozie.use.system.libpath=false
oozie.libpath=${nameNode}/user/${user.name}/share/lib
nameNode=hdfs://namenode:8020
jobTracker=resourcemanager:8032
exampleDir=${nameNode}/user/${user.name}/ch01-identity

oozie.wf.application.path=${exampleDir}/app
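To make the variable substitution explicit, here is a rough shell rendering of how these properties expand when the job is submitted as the user oozie. The shell variable user_name is only a stand-in for ${user.name}, since shell names cannot contain dots; the resulting application path matches the App Path that oozie job -info reports.

```shell
# Stand-ins for the values in job.properties.
nameNode=hdfs://namenode:8020
user_name=oozie                      # assumption: ${user.name} resolves to oozie

# Same composition as in job.properties.
exampleDir="${nameNode}/user/${user_name}/ch01-identity"
app_path="${exampleDir}/app"                          # oozie.wf.application.path
lib_path="${nameNode}/user/${user_name}/share/lib"    # oozie.libpath

printf '%s\n' "oozie.wf.application.path=${app_path}" \
              "oozie.libpath=${lib_path}"
```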

 

 


WORKFLOW-XML

<workflow-app xmlns="uri:oozie:workflow:0.4" name="identity-WF">

    <parameters>
        <property>
            <name>jobTracker</name>
        </property>
        <property>
            <name>nameNode</name>
        </property>
        <property>
            <name>exampleDir</name>
        </property>
    </parameters>

    <start to="identity-MR"/>

    <action name="identity-MR">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${exampleDir}/data/output"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.apache.hadoop.mapred.lib.IdentityMapper</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.apache.hadoop.mapred.lib.IdentityReducer</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${exampleDir}/data/input</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${exampleDir}/data/output</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="success"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>The Identity Map-Reduce job failed!</message>
    </kill>

    <end name="success"/>

</workflow-app>

 

 

 

 

oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ export OOZIE_URL=http://localhost:11000/oozie
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -config target/example/job.properties -run
job: 0000001-171220131214787-oozie-oozi-W
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -info 0000001-171220131214787-oozie-oozi-W
Job ID : 0000001-171220131214787-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : identity-WF
App Path : hdfs://namenode:8020/user/oozie/ch01-identity/app
Status : RUNNING
Run : 0
User : oozie
Group : -
Created : 2017-12-21 07:37 GMT
Started : 2017-12-21 07:37 GMT
Last Modified : 2017-12-21 07:37 GMT
Ended : -
CoordAction ID: -

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID Status Ext ID Ext Status Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000001-171220131214787-oozie-oozi-W@:start: OK - OK -
------------------------------------------------------------------------------------------------------------------------------------
0000001-171220131214787-oozie-oozi-W@identity-MR RUNNING job_1513775507967_0002 RUNNING -
------------------------------------------------------------------------------------------------------------------------------------

oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -oozie http://localhost:11000/oozie -log 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:37:17,272 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No results found
2017-12-21 07:37:17,302 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] Start action [0000001-171220131214787-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:37:17,309 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] [***0000001-171220131214787-oozie-oozi-W@:start:***]Action status=DONE
2017-12-21 07:37:17,309 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] [***0000001-171220131214787-oozie-oozi-W@:start:***]Action updated in DB!
2017-12-21 07:37:17,403 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] No results found
2017-12-21 07:37:17,445 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@:start:
2017-12-21 07:37:17,445 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:37:17,624 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] Start action [0000001-171220131214787-oozie-oozi-W@identity-MR] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:37:19,479 INFO MapReduceActionExecutor:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] checking action, hadoop job ID [job_1513775507967_0002] status [RUNNING]
2017-12-21 07:37:19,492 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] [***0000001-171220131214787-oozie-oozi-W@identity-MR***]Action status=RUNNING
2017-12-21 07:37:19,492 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] [***0000001-171220131214787-oozie-oozi-W@identity-MR***]Action updated in DB!
2017-12-21 07:37:19,506 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@identity-MR
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$
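Rather than re-running oozie job -info by hand until the status flips, the top-level Status line of the output above could be extracted with a small filter. This is a minimal sketch; workflow_status is a hypothetical helper name, and it assumes the "Status : VALUE" layout shown in the transcript.

```shell
# workflow_status   (hypothetical helper)
# Reads `oozie job -info` output on stdin and prints the value of the
# top-level "Status" line, e.g. RUNNING or KILLED.
workflow_status() {
    awk -F' *: *' '$1 == "Status" { sub(/[ \t]+$/, "", $2); print $2; exit }'
}

# Usage: poll once a minute until the workflow leaves RUNNING.
#   while [ "$(oozie job -info "$job_id" | workflow_status)" = "RUNNING" ]; do
#       sleep 60
#   done
```

Only the first matching line is used, so the Status column header of the action table is ignored.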

 

 

 

 

oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -info 0000001-171220131214787-oozie-oozi-W
Job ID : 0000001-171220131214787-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : identity-WF
App Path : hdfs://namenode:8020/user/oozie/ch01-identity/app
Status : KILLED
Run : 0
User : oozie
Group : -
Created : 2017-12-21 07:37 GMT
Started : 2017-12-21 07:37 GMT
Last Modified : 2017-12-21 07:47 GMT
Ended : 2017-12-21 07:47 GMT
CoordAction ID: -

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID Status Ext ID Ext Status Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000001-171220131214787-oozie-oozi-W@:start: OK - OK -
------------------------------------------------------------------------------------------------------------------------------------
0000001-171220131214787-oozie-oozi-W@identity-MR ERROR job_1513775507967_0002 FAILED/KILLED -
------------------------------------------------------------------------------------------------------------------------------------
0000001-171220131214787-oozie-oozi-W@fail OK - OK E0729
------------------------------------------------------------------------------------------------------------------------------------

oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -oozie http://localhost:11000/oozie -log 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:37:17,272 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No results found
2017-12-21 07:37:17,302 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] Start action [0000001-171220131214787-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:37:17,309 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] [***0000001-171220131214787-oozie-oozi-W@:start:***]Action status=DONE
2017-12-21 07:37:17,309 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] [***0000001-171220131214787-oozie-oozi-W@:start:***]Action updated in DB!
2017-12-21 07:37:17,403 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] No results found
2017-12-21 07:37:17,445 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@:start:
2017-12-21 07:37:17,445 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:37:17,624 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] Start action [0000001-171220131214787-oozie-oozi-W@identity-MR] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:37:19,479 INFO MapReduceActionExecutor:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] checking action, hadoop job ID [job_1513775507967_0002] status [RUNNING]
2017-12-21 07:37:19,492 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] [***0000001-171220131214787-oozie-oozi-W@identity-MR***]Action status=RUNNING
2017-12-21 07:37:19,492 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] [***0000001-171220131214787-oozie-oozi-W@identity-MR***]Action updated in DB!
2017-12-21 07:37:19,506 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@identity-MR
2017-12-21 07:47:31,100 INFO MapReduceActionExecutor:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] action completed, external ID [job_1513775507967_0002]
2017-12-21 07:47:31,113 WARN MapReduceActionExecutor:523 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] LauncherMapper died, check Hadoop LOG for job [resourcemanager:8032:job_1513775507967_0002]
2017-12-21 07:47:31,149 INFO ActionEndXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] ERROR is considered as FAILED for SLA
2017-12-21 07:47:31,198 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] No results found
2017-12-21 07:47:31,259 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@fail] Start action [0000001-171220131214787-oozie-oozi-W@fail] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:47:31,263 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@fail] [***0000001-171220131214787-oozie-oozi-W@fail***]Action status=DONE
2017-12-21 07:47:31,263 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@fail] [***0000001-171220131214787-oozie-oozi-W@fail***]Action updated in DB!
2017-12-21 07:47:31,492 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@fail] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@fail
2017-12-21 07:47:31,499 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:47:31,494 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@identity-MR
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$
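The WARN line in the log above says to check the Hadoop log for job_1513775507967_0002. A possible way to get at it, sketched under assumptions: the YARN application ID uses the same numeric suffix as the MapReduce job ID with the prefix application_ instead of job_, and yarn logs only returns something if the yarn client is available in the container and log aggregation is enabled on the cluster. job_to_app_id is a hypothetical helper name.

```shell
# job_to_app_id JOB_ID   (hypothetical helper)
# Swaps the "job_" prefix of a MapReduce job ID for "application_" to get
# the corresponding YARN application ID.
job_to_app_id() {
    printf 'application_%s\n' "${1#job_}"
}

# Usage, assuming the yarn client is installed and log aggregation is on:
#   yarn logs -applicationId "$(job_to_app_id job_1513775507967_0002)"
```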
