Created 12-21-2017 06:15 PM
Hello,

I am currently trying to run a job in Oozie, but it fails. As soon as I start the job, its status is RUNNING. This lasts for about 30 minutes, then the status changes to KILLED with the error code E0729.

The Hadoop system runs in Docker containers inside a virtual machine. There are namenode, datanode, resourcemanager, nodemanager, hue, hive-server, hive-metastore, hive-metastore-postgresql and oozie-server containers, and all of them can communicate with each other. The system is started with docker-compose up.

The data for the MapReduce job is uploaded via the namenode. (This MapReduce job is an example from the book Apache Oozie by Mohammad Kamrul Islam & Aravind Srinivasan, published by O'Reilly.) To start the job, I switch to the Oozie container and then switch to the oozie user with the command su -s /bin/bash oozie. Then I run export OOZIE_URL=http://localhost:11000/oozie so that the Oozie client knows which server to use in the next step, and submit the job with oozie job -config target/example/job.properties -run.

If I then check the job status (oozie job -info JOB_ID), I see RUNNING for about 30 minutes, after which it changes to KILLED with the error code E0729. What's strange is that the running job is not listed in the Oozie web interface, although it is listed in the Hue web interface. I cannot find any helpful information in the logs of the Oozie node or the HDFS node. Below are the job.properties and the workflow.xml, as well as the log files.
JOB-PROPERTIES
oozie.use.system.libpath=false
oozie.libpath=${nameNode}/user/${user.name}/share/lib
nameNode=hdfs://namenode:8020
jobTracker=resourcemanager:8032
exampleDir=${nameNode}/user/${user.name}/ch01-identity
oozie.wf.application.path=${exampleDir}/app
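The App Path shown later in the job status (hdfs://namenode:8020/user/oozie/ch01-identity/app) is what these properties expand to when the job is submitted as the oozie user. The Python sketch below approximates that ${...} substitution so resolved paths can be checked before submitting. It is not Oozie's own resolver (the real expansion happens server-side and also supports EL functions, which this ignores), and supplying user.name=oozie is an assumption matching the su to the oozie user.

```python
import re

# Matches ${name} references such as ${nameNode} or ${user.name}.
VAR = re.compile(r"\$\{([^}]+)\}")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(props, extra=None, max_depth=10):
    """Expand ${name} references using props, falling back to extra.

    `extra` stands in for values Oozie supplies itself, such as user.name
    (assumed here to be the submitting user). Unknown names are left as-is.
    """
    env = dict(extra or {})
    env.update(props)

    def expand(value, depth):
        if depth > max_depth:
            raise ValueError("variable nesting too deep in: " + value)
        return VAR.sub(
            lambda m: expand(env[m.group(1)], depth + 1)
            if m.group(1) in env
            else m.group(0),
            value,
        )

    return {key: expand(value, 0) for key, value in props.items()}


JOB_PROPERTIES = """
oozie.use.system.libpath=false
oozie.libpath=${nameNode}/user/${user.name}/share/lib
nameNode=hdfs://namenode:8020
jobTracker=resourcemanager:8032
exampleDir=${nameNode}/user/${user.name}/ch01-identity
oozie.wf.application.path=${exampleDir}/app
"""

resolved = resolve(parse_properties(JOB_PROPERTIES), {"user.name": "oozie"})
print(resolved["oozie.wf.application.path"])
# hdfs://namenode:8020/user/oozie/ch01-identity/app
```

Self-referencing or circular variables raise a ValueError instead of recursing forever.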
WORKFLOW-XML
<workflow-app xmlns="uri:oozie:workflow:0.4" name="identity-WF">
    <parameters>
        <property><name>jobTracker</name></property>
        <property><name>nameNode</name></property>
        <property><name>exampleDir</name></property>
    </parameters>
    <start to="identity-MR"/>
    <action name="identity-MR">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${exampleDir}/data/output"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.apache.hadoop.mapred.lib.IdentityMapper</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.apache.hadoop.mapred.lib.IdentityReducer</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${exampleDir}/data/input</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${exampleDir}/data/output</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="success"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>The Identity Map-Reduce job failed!</message>
    </kill>
    <end name="success"/>
</workflow-app>
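The workflow is a small graph: start leads to identity-MR, which moves to success on ok and to the fail kill node on error. Reaching that kill node is what ends the run as KILLED in the output further down. Below is a sketch using Python's standard xml.etree.ElementTree to list these transitions and flag any `to` target that does not name a node, which can catch typos before uploading. It only follows start/ok/error transitions and ignores decision, fork and join nodes.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the {namespace} prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def workflow_edges(workflow_xml):
    """Return (from_node, transition, to_node) tuples for start/ok/error."""
    root = ET.fromstring(workflow_xml)
    edges = []
    for node in root:
        kind = _local(node.tag)
        if kind == "start":
            edges.append(("start", "to", node.get("to")))
        elif kind == "action":
            for child in node:
                label = _local(child.tag)
                if label in ("ok", "error"):
                    edges.append((node.get("name"), label, child.get("to")))
    return edges


def dangling_transitions(workflow_xml):
    """List transitions whose target is not a named node in the workflow."""
    root = ET.fromstring(workflow_xml)
    names = {node.get("name") for node in root if node.get("name")}
    return [edge for edge in workflow_edges(workflow_xml) if edge[2] not in names]


# Trimmed copy of the identity-WF workflow (configuration omitted).
SAMPLE = """<workflow-app xmlns="uri:oozie:workflow:0.4" name="identity-WF">
  <start to="identity-MR"/>
  <action name="identity-MR">
    <map-reduce/>
    <ok to="success"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>The Identity Map-Reduce job failed!</message></kill>
  <end name="success"/>
</workflow-app>"""

for edge in workflow_edges(SAMPLE):
    print(edge)
print("dangling:", dangling_transitions(SAMPLE))
```

Because the namespace is stripped from each tag, the same code works for other `uri:oozie:workflow:*` schema versions.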
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ export OOZIE_URL=http://localhost:11000/oozie
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -config target/example/job.properties -run
job: 0000001-171220131214787-oozie-oozi-W
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -info 0000001-171220131214787-oozie-oozi-W
Job ID : 0000001-171220131214787-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : identity-WF
App Path : hdfs://namenode:8020/user/oozie/ch01-identity/app
Status : RUNNING
Run : 0
User : oozie
Group : -
Created : 2017-12-21 07:37 GMT
Started : 2017-12-21 07:37 GMT
Last Modified : 2017-12-21 07:37 GMT
Ended : -
CoordAction ID: -
Actions
------------------------------------------------------------------------------------------------------------------------------------
ID Status Ext ID Ext Status Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000001-171220131214787-oozie-oozi-W@:start: OK - OK -
------------------------------------------------------------------------------------------------------------------------------------
0000001-171220131214787-oozie-oozi-W@identity-MR RUNNING job_1513775507967_0002 RUNNING -
------------------------------------------------------------------------------------------------------------------------------------
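The Ext ID column (job_1513775507967_0002) is the Hadoop job Oozie submitted for the identity-MR action, and it is the handle for digging into the YARN side. When polling the CLI from a script, a Python sketch like the following can pull the workflow status and the action rows out of the `oozie job -info` text. It assumes the whitespace-separated column layout shown here, which may differ between Oozie versions.

```python
import re

STATUS_RE = re.compile(r"^Status\s*:\s*(\S+)")
# Columns: ID, Status, Ext ID, Ext Status, Err Code; "-" marks an empty column.
ACTION_RE = re.compile(r"^(\S+-W@\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)$")


def parse_job_info(output):
    """Extract the workflow status and action rows from `oozie job -info` text."""
    status, actions = None, []
    for raw in output.splitlines():
        line = raw.strip()
        m = STATUS_RE.match(line)
        if m:
            status = m.group(1)
            continue
        m = ACTION_RE.match(line)
        if m:
            action_id, st, ext_id, ext_status, err = m.groups()
            actions.append({"id": action_id, "status": st, "ext_id": ext_id,
                            "ext_status": ext_status, "err_code": err})
    return status, actions


SAMPLE_INFO = """Job ID : 0000001-171220131214787-oozie-oozi-W
Status : RUNNING
ID Status Ext ID Ext Status Err Code
0000001-171220131214787-oozie-oozi-W@:start: OK - OK -
0000001-171220131214787-oozie-oozi-W@identity-MR RUNNING job_1513775507967_0002 RUNNING -
"""

status, actions = parse_job_info(SAMPLE_INFO)
print(status)
for action in actions:
    if action["ext_id"] != "-":
        print(action["id"], action["ext_id"], action["ext_status"])
```

Header and separator lines do not match either pattern, so they are skipped.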
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -oozie http://localhost:11000/oozie -log 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:37:17,272 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No results found
2017-12-21 07:37:17,302 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] Start action [0000001-171220131214787-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:37:17,309 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] [***0000001-171220131214787-oozie-oozi-W@:start:***]Action status=DONE
2017-12-21 07:37:17,309 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] [***0000001-171220131214787-oozie-oozi-W@:start:***]Action updated in DB!
2017-12-21 07:37:17,403 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] No results found
2017-12-21 07:37:17,445 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@:start:
2017-12-21 07:37:17,445 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:37:17,624 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] Start action [0000001-171220131214787-oozie-oozi-W@identity-MR] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:37:19,479 INFO MapReduceActionExecutor:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] checking action, hadoop job ID [job_1513775507967_0002] status [RUNNING]
2017-12-21 07:37:19,492 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] [***0000001-171220131214787-oozie-oozi-W@identity-MR***]Action status=RUNNING
2017-12-21 07:37:19,492 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] [***0000001-171220131214787-oozie-oozi-W@identity-MR***]Action updated in DB!
2017-12-21 07:37:19,506 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@identity-MR
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -info 0000001-171220131214787-oozie-oozi-W
Job ID : 0000001-171220131214787-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : identity-WF
App Path : hdfs://namenode:8020/user/oozie/ch01-identity/app
Status : KILLED
Run : 0
User : oozie
Group : -
Created : 2017-12-21 07:37 GMT
Started : 2017-12-21 07:37 GMT
Last Modified : 2017-12-21 07:47 GMT
Ended : 2017-12-21 07:47 GMT
CoordAction ID: -
Actions
------------------------------------------------------------------------------------------------------------------------------------
ID Status Ext ID Ext Status Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000001-171220131214787-oozie-oozi-W@:start: OK - OK -
------------------------------------------------------------------------------------------------------------------------------------
0000001-171220131214787-oozie-oozi-W@identity-MR ERROR job_1513775507967_0002 FAILED/KILLED -
------------------------------------------------------------------------------------------------------------------------------------
0000001-171220131214787-oozie-oozi-W@fail OK - OK E0729
------------------------------------------------------------------------------------------------------------------------------------
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$ oozie job -oozie http://localhost:11000/oozie -log 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:37:17,272 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No results found
2017-12-21 07:37:17,302 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] Start action [0000001-171220131214787-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:37:17,309 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] [***0000001-171220131214787-oozie-oozi-W@:start:***]Action status=DONE
2017-12-21 07:37:17,309 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] [***0000001-171220131214787-oozie-oozi-W@:start:***]Action updated in DB!
2017-12-21 07:37:17,403 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] No results found
2017-12-21 07:37:17,445 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@:start:
2017-12-21 07:37:17,445 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:37:17,624 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] Start action [0000001-171220131214787-oozie-oozi-W@identity-MR] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:37:19,479 INFO MapReduceActionExecutor:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] checking action, hadoop job ID [job_1513775507967_0002] status [RUNNING]
2017-12-21 07:37:19,492 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] [***0000001-171220131214787-oozie-oozi-W@identity-MR***]Action status=RUNNING
2017-12-21 07:37:19,492 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] [***0000001-171220131214787-oozie-oozi-W@identity-MR***]Action updated in DB!
2017-12-21 07:37:19,506 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@identity-MR
2017-12-21 07:47:31,100 INFO MapReduceActionExecutor:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] action completed, external ID [job_1513775507967_0002]
2017-12-21 07:47:31,113 WARN MapReduceActionExecutor:523 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] LauncherMapper died, check Hadoop LOG for job [resourcemanager:8032:job_1513775507967_0002]
2017-12-21 07:47:31,149 INFO ActionEndXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] ERROR is considered as FAILED for SLA
2017-12-21 07:47:31,198 INFO JPAService:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] No results found
2017-12-21 07:47:31,259 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@fail] Start action [0000001-171220131214787-oozie-oozi-W@fail] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-12-21 07:47:31,263 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@fail] [***0000001-171220131214787-oozie-oozi-W@fail***]Action status=DONE
2017-12-21 07:47:31,263 INFO ActionStartXCommand:520 - SERVER[oozie-server] USER[oozie] GROUP[-] TOKEN[] APP[identity-WF] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@fail] [***0000001-171220131214787-oozie-oozi-W@fail***]Action updated in DB!
2017-12-21 07:47:31,492 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@fail] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@fail
2017-12-21 07:47:31,499 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W
2017-12-21 07:47:31,494 INFO WorkflowNotificationXCommand:520 - SERVER[oozie-server] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-171220131214787-oozie-oozi-W] ACTION[0000001-171220131214787-oozie-oozi-W@identity-MR] No Notification URL is defined. Therefore nothing to notify for job 0000001-171220131214787-oozie-oozi-W@identity-MR
oozie@oozie-server:/hadoop/oozie/crashTESTdummyOOZIEjob/examples/chapter-01/identity-wf$
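The only non-INFO line in this log is the WARN at 07:47:31 saying the LauncherMapper died. Oozie reports no cause itself and points to the Hadoop logs for job_1513775507967_0002 instead, so the actual error has to be read on the YARN side. A small Python sketch for pulling the ResourceManager address and job ID out of such warnings when scanning a long log; the regular expression matches only the message wording that appears in this thread, and other Oozie versions may phrase it differently.

```python
import re

# Matches the launcher warnings seen in this thread, e.g.
# "LauncherMapper died, check Hadoop LOG for job [resourcemanager:8032:job_..._0002]"
LAUNCHER_DIED = re.compile(
    r"(LauncherMapper|Launcher AM) died, check Hadoop LOG for job \[([^\]]+)\]"
)


def launcher_failures(log_text):
    """Return (resource_manager, external_id) pairs from launcher-died warnings."""
    failures = []
    for m in LAUNCHER_DIED.finditer(log_text):
        # The bracketed value is host:port:id, so split on the last colon.
        rm_address, _, external_id = m.group(2).rpartition(":")
        failures.append((rm_address, external_id))
    return failures


# Shortened copy of the WARN line above.
LOG_LINE = (
    "2017-12-21 07:47:31,113 WARN MapReduceActionExecutor:523 - ... "
    "LauncherMapper died, check Hadoop LOG for job "
    "[resourcemanager:8032:job_1513775507967_0002]"
)
print(launcher_failures(LOG_LINE))
# [('resourcemanager:8032', 'job_1513775507967_0002')]
```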
Thank you in advance
Created 12-21-2017 10:27 PM
Can you get the application logs for the failed application using yarn logs -applicationId application_1513775507967_0002 and see what error is reported there?
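The application ID in that command is the Ext ID from the Oozie output with the job_ prefix replaced by application_: on YARN, a MapReduce job and the application that runs it share the cluster timestamp and sequence number. A Python sketch of that mapping follows; yarn_logs_command is a hypothetical helper that only builds the argument list. Note that `yarn logs` typically needs YARN log aggregation (yarn.log-aggregation-enable) to be turned on to return logs for a finished application.

```python
def to_application_id(external_id):
    """Map a MapReduce job ID (job_<clusterTimestamp>_<seq>) to its YARN application ID."""
    if external_id.startswith("application_"):
        return external_id  # already a YARN application ID
    prefix, sep, rest = external_id.partition("_")
    if prefix != "job" or not sep or not rest:
        raise ValueError("not a MapReduce job or YARN application ID: " + external_id)
    return "application_" + rest


def yarn_logs_command(external_id):
    """Build the argument list for `yarn logs -applicationId <id>` (hypothetical helper)."""
    return ["yarn", "logs", "-applicationId", to_application_id(external_id)]


print(" ".join(yarn_logs_command("job_1513775507967_0002")))
# yarn logs -applicationId application_1513775507967_0002
```

On a host with the Hadoop client installed, the list can be passed to subprocess.run(..., capture_output=True, text=True) to capture the container logs.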
Created 12-28-2023 09:36 PM
Hi, I also get the same INFO messages. When I fetched the WARN-level logs for that job, this is what I got:
[bh24224@localhost ~]$ oozie job -log 0000041-231010100945447-oozie-oozi-W -logfilter loglevel=WARN\;limit=10 -oozie http://localhost:11000/oozie/
2023-12-27 02:39:01,589 WARN THREAD[CallableQueue-25] org.apache.oozie.action.hadoop.SparkActionExecutor: SERVER[localhost] USER[fsgapp] GROUP[-] TOKEN[] APP[${INDEXER_PARAMS}-ReindexJob-${ADMIN}-on-${MY_APP}] JOB[0000041-231010100945447-oozie-oozi-W] ACTION[0000041-231010100945447-oozie-oozi-W@spark-node] No credential properties found for action : 0000041-231010100945447-oozie-oozi-W@spark-node, cred : null
2023-12-27 02:39:01,594 WARN THREAD[CallableQueue-25] org.apache.oozie.action.hadoop.SparkActionExecutor: SERVER[localhost] USER[fsgapp] GROUP[-] TOKEN[] APP[${INDEXER_PARAMS}-ReindexJob-${ADMIN}-on-${MY_APP}] JOB[0000041-231010100945447-oozie-oozi-W] ACTION[0000041-231010100945447-oozie-oozi-W@spark-node] Invalid configuration value [null] defined for launcher max attempts count, using default [2].
2023-12-27 02:49:08,017 WARN THREAD[CallableQueue-26] org.apache.oozie.action.hadoop.SparkActionExecutor: SERVER[localhost] USER[fsgapp] GROUP[-] TOKEN[] APP["UPC"-ReindexJob-ASYNC-on-http://localhost/UI] JOB[0000041-231010100945447-oozie-oozi-W] ACTION[0000041-231010100945447-oozie-oozi-W@spark-node] Launcher AM died, check Hadoop LOG for job [localhost:8032:application_1696946942137_0020]
Could you suggest whether this is an environment issue, or whether some change needs to be made to the job?
CC: Cloudera_Support.
Created 12-29-2023 08:49 AM
@Raghunath_CDH As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks.
Regards,
Diana Torres,