Why does my Oozie job change from RUNNING to KILLED (running on docker cluster)

New Contributor

Hello,

I am trying to run a job in Oozie, but it fails. As soon as I start the job, its status is RUNNING. This lasts for about 30 minutes, then the status changes to KILLED with the error code E0729.

The Hadoop system runs in Docker containers inside a virtual machine. There are namenode, datanode, resourcemanager, nodemanager, hue, hive-server, hive-metastore, hive-metastore-postgresql and oozie-server containers. All containers can communicate with each other. The system is started via docker-compose up. The data for a MapReduce job (an example from the book Apache Oozie by Mohammad Kamrul Islam & Aravind Srinivasan, published by O'Reilly) is uploaded via the namenode.

To start the job, I switch to the Oozie container and then to the user oozie with the command su -s /bin/bash oozie. Then I execute export OOZIE_URL=http://localhost:11000/oozie so that I can run the Oozie job in the next step with oozie job -config target/example/job.properties -run. If I then check the status with oozie job -info JOB_ID, I see RUNNING for about 30 minutes, then it changes to KILLED with the error code E0729.

What's strange is that the running job is not listed in the Oozie web interface, although it is listed in the Hue web interface. I cannot find any helpful information in the logs of the Oozie node and the HDFS node. Below are the job.properties and the workflow.xml as well as the log files.
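For anyone reproducing this, the status check described above (oozie job -info JOB_ID) can be scripted instead of re-run by hand. A minimal sketch, assuming OOZIE_URL is already exported as in the post. The function names wf_status and poll_until_done and the 30-second interval are my own choices for illustration, not part of Oozie. The polling function is only defined here, not called.

```shell
# Print the workflow-level status from `oozie job -info` output read on stdin.
# It matches the "Status : ..." header line, not the per-action table rows.
wf_status() {
  awk '/^Status[[:space:]]*:/ { print $NF; exit }'
}

# Hypothetical helper: poll a workflow until it leaves PREP/RUNNING.
# $1 = workflow job ID, e.g. 0000001-171220131214787-oozie-oozi-W
poll_until_done() {
  while :; do
    s=$(oozie job -info "$1" | wf_status)
    echo "$(date -u +%H:%M:%S) $s"
    case "$s" in
      PREP|RUNNING) sleep 30 ;;   # arbitrary interval, adjust as needed
      *) break ;;                 # SUCCEEDED, KILLED, FAILED, SUSPENDED, or empty
    esac
  done
}
```

Usage would be poll_until_done followed by the job ID printed by oozie job -config ... -run. The timestamps it prints make it easy to see how long the job really stayed in RUNNING before it was killed.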

JOB-PROPERTIES

oozie.use.system.libpath=false
oozie.libpath=${nameNode}/user/${user.name}/share/lib
nameNode=hdfs://namenode:8020
jobTracker=resourcemanager:8032
exampleDir=${nameNode}/user/${user.name}/ch01-identity
oozie.wf.application.path=${exampleDir}/app

WORKFLOW-XML

<workflow-app xmlns="uri:oozie:workflow:0.4" name="identity-WF">
    <parameters>
        <property><name>jobTracker</name></property>
        <property><name>nameNode</name></property>
        <property><name>exampleDir</name></property>
    </parameters>
    <start to="identity-MR"/>
    <action name="identity-MR">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare><delete path="${exampleDir}/data/output"/></prepare>
            <configuration>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.apache.hadoop.mapred.lib.IdentityMapper</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.apache.hadoop.mapred.lib.IdentityReducer</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${exampleDir}/data/input</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${exampleDir}/data/output</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="success"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>The Identity Map-Reduce job failed!</message>
    </kill>
    <end name="success"/>
</workflow-app>

oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ export OOZIE_URL=http://localhost:11000/oozie
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -config target/example/job.properties -run
job: 0000001-171220131214787-oozie-oozi-W
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -info 0000001-171220131214787-oozie-oozi-W

Job ID : 0000001-171220131214787-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : identity-WF
App Path      : hdfs://namenode:8020/user/oozie/ch01-identity/app
Status        : RUNNING
Run           : 0
User          : oozie
Group         : -
Created       : 2017-12-21 07:37 GMT
Started       : 2017-12-21 07:37 GMT
Last Modified : 2017-12-21 07:37 GMT
Ended         : -
CoordAction ID: -

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                Status   Ext ID                  Ext Status     Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000001-171220131214787-oozie-oozi-W@:start:      OK       -                       OK             -
------------------------------------------------------------------------------------------------------------------------------------
0000001-171220131214787-oozie-oozi-W@identity-MR  RUNNING  job_1513775507967_0002  RUNNING        -
------------------------------------------------------------------------------------------------------------------------------------

oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -oozie http://localhost:11000/oozie -log 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:37:17,272 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No results found
2017-12-21 07:37:17,302 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] Start action [0000001-171220131214787-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:37:17,309 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] [***0000001-171220131214787-oozie-oozi-W@:start:***]Action status=DONE
2017-12-21 07:37:17,309 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] [***0000001-171220131214787-oozie-oozi-W@:start:***]Action updated in DB!
2017-12-21 07:37:17,403 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] No results found
2017-12-21 07:37:17,445 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@:start:
2017-12-21 07:37:17,445 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:37:17,624 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] Start action [0000001-171220131214787-oozie-oozi-W@identity-MR] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:37:19,479 INFO MapReduceActionExecutor:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] checking action, hadoop job ID [job_1513775507967_0002] status [RUNNING]
2017-12-21 07:37:19,492 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] [***0000001-171220131214787-oozie-oozi-W@identity-MR***]Action status=RUNNING
2017-12-21 07:37:19,492 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] [***0000001-171220131214787-oozie-oozi-W@identity-MR***]Action updated in DB!
2017-12-21 07:37:19,506 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@identity-MR
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$

oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -info 0000001-171220131214787-oozie-oozi-W

Job ID : 0000001-171220131214787-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : identity-WF
App Path      : hdfs://namenode:8020/user/oozie/ch01-identity/app
Status        : KILLED
Run           : 0
User          : oozie
Group         : -
Created       : 2017-12-21 07:37 GMT
Started       : 2017-12-21 07:37 GMT
Last Modified : 2017-12-21 07:47 GMT
Ended         : 2017-12-21 07:47 GMT
CoordAction ID: -

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                Status   Ext ID                  Ext Status     Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000001-171220131214787-oozie-oozi-W@:start:      OK       -                       OK             -
------------------------------------------------------------------------------------------------------------------------------------
0000001-171220131214787-oozie-oozi-W@identity-MR  ERROR    job_1513775507967_0002  FAILED/KILLED  -
------------------------------------------------------------------------------------------------------------------------------------
0000001-171220131214787-oozie-oozi-W@fail         OK       -                       OK             E0729
------------------------------------------------------------------------------------------------------------------------------------

oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -oozie http://localhost:11000/oozie -log 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:37:17,272 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No results found
2017-12-21 07:37:17,302 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] Start action [0000001-171220131214787-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:37:17,309 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] [***0000001-171220131214787-oozie-oozi-W@:start:***]Action status=DONE
2017-12-21 07:37:17,309 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] [***0000001-171220131214787-oozie-oozi-W@:start:***]Action updated in DB!
2017-12-21 07:37:17,403 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] No results found
2017-12-21 07:37:17,445 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@:start:
2017-12-21 07:37:17,445 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:37:17,624 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] Start action [0000001-171220131214787-oozie-oozi-W@identity-MR] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:37:19,479 INFO MapReduceActionExecutor:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] checking action, hadoop job ID [job_1513775507967_0002] status [RUNNING]
2017-12-21 07:37:19,492 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] [***0000001-171220131214787-oozie-oozi-W@identity-MR***]Action status=RUNNING
2017-12-21 07:37:19,492 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] [***0000001-171220131214787-oozie-oozi-W@identity-MR***]Action updated in DB!
2017-12-21 07:37:19,506 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@identity-MR
2017-12-21 07:47:31,100 INFO MapReduceActionExecutor:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] action completed, external ID [job_1513775507967_0002]
2017-12-21 07:47:31,113 WARN MapReduceActionExecutor:523 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] LauncherMapper died, check Hadoop LOG for job [resourcemanager:8032:job_1513775507967_0002]
2017-12-21 07:47:31,149 INFO ActionEndXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] ERROR is considered as FAILED for SLA
2017-12-21 07:47:31,198 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] No results found
2017-12-21 07:47:31,259 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@fail] Start action [0000001-171220131214787-oozie-oozi-W@fail] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:47:31,263 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@fail] [***0000001-171220131214787-oozie-oozi-W@fail***]Action status=DONE
2017-12-21 07:47:31,263 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@fail] [***0000001-171220131214787-oozie-oozi-W@fail***]Action updated in DB!
2017-12-21 07:47:31,492 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@fail] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@fail
2017-12-21 07:47:31,499 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:47:31,494 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@identity-MR
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$

Thank you in advance

3 REPLIES

Contributor

Can you get the application logs for the failed application using yarn logs -applicationId application_1513775507967_0002 and check what error appears in the failed application?
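The application ID in that command is the Ext ID from the oozie job -info output (job_1513775507967_0002) with the job_ prefix swapped for application_. A minimal sketch of that mapping follows. The function name to_app_id is made up for illustration, and the last line only prints the yarn command rather than running it. As far as I know, yarn logs can only return logs for a finished application when YARN log aggregation is enabled on the cluster.

```shell
# Map a MapReduce job ID (job_<clusterTimestamp>_<sequence>) to the
# matching YARN application ID (application_<clusterTimestamp>_<sequence>).
# An ID that is already an application ID is passed through unchanged.
to_app_id() {
  case "$1" in
    job_*)         printf 'application_%s\n' "${1#job_}" ;;
    application_*) printf '%s\n' "$1" ;;
    *)             echo "unrecognized ID: $1" >&2; return 1 ;;
  esac
}

# Ext ID taken from the `oozie job -info` output in the question.
app_id=$(to_app_id job_1513775507967_0002)

# Print the command to run on a node that has the yarn client configured.
echo "yarn logs -applicationId $app_id"
```

In this Docker setup the command would have to be run inside a container that has the Hadoop client configuration, for example the resourcemanager container.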

avatar

Hi, I got the same INFO messages as well. When I fetched the logs for that job ID, this is what I got:

oozie job -log 0000041-231010100945447-oozie-oozi-W -logfilter loglevel=WARN\;limit=10 -oozie http://localhost:11000/oozie/

Last login: Thu Dec 28 03:31:59 2023 from l05017024
[bh24224@localhost ~]$ oozie job -log 0000041-231010100945447-oozie-oozi-W -logfilter loglevel=WARN\;limit=10 -oozie http://localhost:11000/oozie/

2023-12-27 02:39:01,589 WARN THREAD[CallableQueue-25] org.apache.oozie.action.hadoop.SparkActionExecutor: SERVER[localhost] USER[fsgapp] GROUP[-] TOKEN[] APP[${INDEXER_PARAMS}-ReindexJob-${ADMIN}-on-${MY_APP}] JOB[0000041-231010100945447-oozie-oozi-W] ACTION[0000041-231010100945447-oozie-oozi-W@spark-node] No credential properties found for action : 0000041-231010100945447-oozie-oozi-W@spark-node, cred : null

2023-12-27 02:39:01,594 WARN THREAD[CallableQueue-25] org.apache.oozie.action.hadoop.SparkActionExecutor: SERVER[localhost] USER[fsgapp] GROUP[-] TOKEN[] APP[${INDEXER_PARAMS}-ReindexJob-${ADMIN}-on-${MY_APP}] JOB[0000041-231010100945447-oozie-oozi-W] ACTION[0000041-231010100945447-oozie-oozi-W@spark-node] Invalid configuration value [null] defined for launcher max attempts count, using default [2].


2023-12-27 02:49:08,017 WARN THREAD[CallableQueue-26] org.apache.oozie.action.hadoop.SparkActionExecutor: SERVER[localhost] USER[fsgapp] GROUP[-] TOKEN[] APP["UPC"-ReindexJob-ASYNC-on-http://localhost/UI] JOB[0000041-231010100945447-oozie-oozi-W] ACTION[0000041-231010100945447-oozie-oozi-W@spark-node] Launcher AM died, check Hadoop LOG for job [localhost:8032:application_1696946942137_0020]
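The last WARN line names the YARN application whose logs hold the actual error. A small sketch for pulling that ID out of Oozie log text so it can be passed to yarn logs -applicationId; the function name extract_hadoop_id is hypothetical, and the sample line is shortened from the log above.

```shell
# Read Oozie log text on stdin and print the last Hadoop ID it mentions,
# either application_<ts>_<seq> or job_<ts>_<seq>.
extract_hadoop_id() {
  grep -oE '(application|job)_[0-9]+_[0-9]+' | tail -n 1
}

# Shortened copy of the WARN line from the log above.
line='WARN SparkActionExecutor: JOB[0000041-231010100945447-oozie-oozi-W] Launcher AM died, check Hadoop LOG for job [localhost:8032:application_1696946942137_0020]'

printf '%s\n' "$line" | extract_hadoop_id
# prints application_1696946942137_0020
```

The same function can be fed the full output of oozie job -log, since the Oozie workflow ID itself does not match the pattern.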

 

Please suggest whether this is an environment issue or whether changes need to be made to the job.

CC: Cloudera_Support.

Community Manager

@Raghunath_CDH As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks.


Regards,

Diana Torres,
Community Moderator

