Why does my Oozie job change from RUNNING to KILLED (running on docker cluster)

Hello,

I am trying to run a job in Oozie, but it fails. As soon as I start the job, its status is RUNNING. After about 30 minutes the status changes to KILLED with the error code E0729.

The Hadoop system runs in a virtual machine in Docker containers. There are namenode, datanode, resourcemanager, nodemanager, hue, hive-server, hive-metastore, hive-metastore-postgresql and oozie-server containers. All containers can communicate with each other. The system is started via docker-compose up.

The data for the MapReduce job (an example from the book Apache Oozie by Mohammad Kamrul Islam & Aravind Srinivasan, published by O'Reilly) is uploaded via the namenode. To start the job, I switch to the Oozie container and then to the user oozie with the command su -s /bin/bash oozie. Then I execute export OOZIE_URL=http://localhost:11000/oozie and submit the job with oozie job -config target/example/job.properties -run.

If I then check the status with oozie job -info JOB_ID, I see RUNNING for about 30 minutes, then it changes to KILLED with the error code E0729. Strangely, the running job is not listed in the Oozie web interface, but it is listed in the Hue web interface. I cannot find any helpful information in the logs of the Oozie node or the HDFS node. Below are the job.properties and the workflow.xml as well as the log files.

JOB-PROPERTIES

oozie.use.system.libpath=false

oozie.libpath=${nameNode}/user/${user.name}/share/lib

nameNode=hdfs://namenode:8020

jobTracker=resourcemanager:8032

exampleDir=${nameNode}/user/${user.name}/ch01-identity

oozie.wf.application.path=${exampleDir}/app

WORKFLOW-XML

<workflow-app xmlns="uri:oozie:workflow:0.4" name="identity-WF">
  <parameters>
    <property><name>jobTracker</name></property>
    <property><name>nameNode</name></property>
    <property><name>exampleDir</name></property>
  </parameters>
  <start to="identity-MR"/>
  <action name="identity-MR">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare><delete path="${exampleDir}/data/output"/></prepare>
      <configuration>
        <property>
          <name>mapred.mapper.class</name>
          <value>org.apache.hadoop.mapred.lib.IdentityMapper</value>
        </property>
        <property>
          <name>mapred.reducer.class</name>
          <value>org.apache.hadoop.mapred.lib.IdentityReducer</value>
        </property>
        <property>
          <name>mapred.input.dir</name>
          <value>${exampleDir}/data/input</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>${exampleDir}/data/output</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="success"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>The Identity Map-Reduce job failed!</message>
  </kill>
  <end name="success"/>
</workflow-app>

oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ export OOZIE_URL=http://localhost:11000/oozie
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -config target/example/job.properties -run
job: 0000001-171220131214787-oozie-oozi-W
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -info 0000001-171220131214787-oozie-oozi-W

Job ID : 0000001-171220131214787-oozie-oozi-W

------------------------------------------------------------------------------------------------------------------------------------

Workflow Name : identity-WF

App Path : hdfs://namenode:8020/user/oozie/ch01-identity/app

Status : RUNNING

Run : 0

User : oozie

Group : -

Created : 2017-12-21 07:37 GMT

Started : 2017-12-21 07:37 GMT

Last Modified : 2017-12-21 07:37 GMT

Ended : -

CoordAction ID: -

Actions

------------------------------------------------------------------------------------------------------------------------------------

ID Status Ext ID Ext Status Err Code

------------------------------------------------------------------------------------------------------------------------------------

0000001-171220131214787-oozie-oozi-W@:start: OK - OK -

------------------------------------------------------------------------------------------------------------------------------------

0000001-171220131214787-oozie-oozi-W@identity-MR RUNNING job_1513775507967_0002 RUNNING -

------------------------------------------------------------------------------------------------------------------------------------

oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -oozie http://localhost:11000/oozie -log 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:37:17,272 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No results found
2017-12-21 07:37:17,302 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] Start action [0000001-171220131214787-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:37:17,309 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] [***0000001-171220131214787-oozie-oozi-W@:start:***]Action status=DONE
2017-12-21 07:37:17,309 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] [***0000001-171220131214787-oozie-oozi-W@:start:***]Action updated in DB!
2017-12-21 07:37:17,403 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] No results found
2017-12-21 07:37:17,445 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@:start:
2017-12-21 07:37:17,445 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:37:17,624 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] Start action [0000001-171220131214787-oozie-oozi-W@identity-MR] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:37:19,479 INFO MapReduceActionExecutor:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] checking action, hadoop job ID [job_1513775507967_0002] status [RUNNING]
2017-12-21 07:37:19,492 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] [***0000001-171220131214787-oozie-oozi-W@identity-MR***]Action status=RUNNING
2017-12-21 07:37:19,492 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] [***0000001-171220131214787-oozie-oozi-W@identity-MR***]Action updated in DB!
2017-12-21 07:37:19,506 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@identity-MR
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$

oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -info 0000001-171220131214787-oozie-oozi-W

Job ID : 0000001-171220131214787-oozie-oozi-W

------------------------------------------------------------------------------------------------------------------------------------

Workflow Name : identity-WF

App Path : hdfs://namenode:8020/user/oozie/ch01-identity/app

Status : KILLED

Run : 0

User : oozie

Group : -

Created : 2017-12-21 07:37 GMT

Started : 2017-12-21 07:37 GMT

Last Modified : 2017-12-21 07:47 GMT

Ended : 2017-12-21 07:47 GMT

CoordAction ID: -

Actions

------------------------------------------------------------------------------------------------------------------------------------

ID Status Ext ID Ext Status Err Code

------------------------------------------------------------------------------------------------------------------------------------

0000001-171220131214787-oozie-oozi-W@:start: OK - OK -

------------------------------------------------------------------------------------------------------------------------------------

0000001-171220131214787-oozie-oozi-W@identity-MR ERROR job_1513775507967_0002 FAILED/KILLED -

------------------------------------------------------------------------------------------------------------------------------------

0000001-171220131214787-oozie-oozi-W@fail OK - OK E0729

------------------------------------------------------------------------------------------------------------------------------------

oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -oozie http://localhost:11000/oozie -log 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:37:17,272 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No results found
2017-12-21 07:37:17,302 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] Start action [0000001-171220131214787-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:37:17,309 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] [***0000001-171220131214787-oozie-oozi-W@:start:***]Action status=DONE
2017-12-21 07:37:17,309 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] [***0000001-171220131214787-oozie-oozi-W@:start:***]Action updated in DB!
2017-12-21 07:37:17,403 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] No results found
2017-12-21 07:37:17,445 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@:start:
2017-12-21 07:37:17,445 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:37:17,624 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] Start action [0000001-171220131214787-oozie-oozi-W@identity-MR] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:37:19,479 INFO MapReduceActionExecutor:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] checking action, hadoop job ID [job_1513775507967_0002] status [RUNNING]
2017-12-21 07:37:19,492 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] [***0000001-171220131214787-oozie-oozi-W@identity-MR***]Action status=RUNNING
2017-12-21 07:37:19,492 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] [***0000001-171220131214787-oozie-oozi-W@identity-MR***]Action updated in DB!
2017-12-21 07:37:19,506 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@identity-MR
2017-12-21 07:47:31,100 INFO MapReduceActionExecutor:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] action completed, external ID [job_1513775507967_0002]
2017-12-21 07:47:31,113 WARN MapReduceActionExecutor:523 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] LauncherMapper died, check Hadoop LOG for job [resourcemanager:8032:job_1513775507967_0002]
2017-12-21 07:47:31,149 INFO ActionEndXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] ERROR is considered as FAILED for SLA
2017-12-21 07:47:31,198 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] No results found
2017-12-21 07:47:31,259 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@fail] Start action [0000001-171220131214787-oozie-oozi-W@fail] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:47:31,263 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@fail] [***0000001-171220131214787-oozie-oozi-W@fail***]Action status=DONE
2017-12-21 07:47:31,263 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@fail] [***0000001-171220131214787-oozie-oozi-W@fail***]Action updated in DB!
2017-12-21 07:47:31,492 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@fail] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@fail
2017-12-21 07:47:31,499 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:47:31,494 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@identity-MR
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$

Thank you in advance

Re: Why does my Oozie job change from RUNNING to KILLED (running on docker cluster)

Can you get the application logs for the failed application using yarn logs -applicationId application_1513775507967_0002 and check which error the failed application reports?
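
In case it helps, here is a rough sketch of how that could be scripted. The application ID is just the Ext ID from oozie job -info with the job_ prefix swapped for application_. The function name job_to_app_id is made up for this example, and the yarn call assumes the yarn client is on the PATH of the container you run it in and that log aggregation is enabled on your cluster, which I cannot tell from the post.

```shell
#!/usr/bin/env bash
# Sketch only: map the Hadoop job ID that Oozie reports to the matching
# YARN application ID, then fetch that application's logs.

# job_to_app_id is a hypothetical helper name, not part of Oozie or YARN.
# It turns job_<timestamp>_<seq> into application_<timestamp>_<seq>.
job_to_app_id() {
  local job_id="$1"
  case "$job_id" in
    job_*) printf 'application_%s\n' "${job_id#job_}" ;;
    *)     printf 'not a Hadoop job ID: %s\n' "$job_id" >&2; return 1 ;;
  esac
}

# Ext ID of the identity-MR action, taken from the oozie job -info output.
app_id="$(job_to_app_id job_1513775507967_0002)"
echo "$app_id"

# Assumption: the yarn client is installed here and log aggregation is on.
# If yarn is missing, the script only prints the application ID.
if command -v yarn >/dev/null 2>&1; then
  # Show some context around likely failure lines; no match is not an error.
  yarn logs -applicationId "$app_id" | grep -E -B2 -A20 'Exception|ERROR|FATAL' || true
fi
```

If yarn logs returns nothing, the container logs may still be on the nodemanager's local disk, so looking inside the nodemanager container would be the next thing to try.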