oozie job stuck in - 'Running ' state always

Solved

New Contributor

I followed the official Oozie documentation to run the examples it provides. I can see the Oozie server status as 'Normal'. I tried running the Oozie map-reduce example job after modifying the NameNode and JobTracker addresses as shown below.

The job gets created, but when I check the -info and log output of the job, the status is always RUNNING. I'm using Hadoop 2.7.3.2.6.4.0-91, installed through Hortonworks Ambari, and Oozie version 4.2.0. I could not identify what exactly is going wrong; please suggest a way forward.

job.properties:

nameNode=hdfs://hdn01.abc.com:8020
jobTracker=hdn04.abc.com:8021
queueName=default
examplesRoot=oozie_test/examples
oozie.wf.application.path=${nameNode}/user/${examplesRoot}/apps/map-reduce/workflow.xml
outputDir=map-reduce

JOB info:
 oozie job -oozie http://hdn03.abc.com:11000/oozie -info 0000002-180416122730802-oozie-oozi-W
Job ID : 0000002-180416122730802-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : map-reduce-wf
App Path      : hdfs://hdn01.abc.com:8020/user/oozie_test/examples/apps/map-reduce/workflow.xml
Status        : RUNNING
Run           : 0
User          : root
Group         : -
Created       : 2018-04-16 08:39 GMT
Started       : 2018-04-16 08:39 GMT
Last Modified : 2018-04-16 08:39 GMT
Ended         : -
CoordAction ID: -


Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                                            Status    Ext ID                 Ext Status Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000002-180416122730802-oozie-oozi-W@:start:                                  OK        -                      OK         -  
------------------------------------------------------------------------------------------------------------------------------------
0000002-180416122730802-oozie-oozi-W@mr-node                                  PREP      -                      -          -  
------------------------------------------------------------------------------------------------------------------------------------



LOG:
oozie job -oozie http://hdn03.abc.com:11000/oozie -log 0000002-180416122730802-oozie-oozi-W
2018-04-16 14:09:02,222  INFO ActionStartXCommand:520 - SERVER[hdn03.abc.com] USER[root] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000002-180416122730802-oozie-oozi-W] ACTION[0000002-180416122730802-oozie-oozi-W@:start:] Start action [0000002-180416122730802-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-04-16 14:09:02,223  INFO ActionStartXCommand:520 - SERVER[hdn03.abc.com] USER[root] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000002-180416122730802-oozie-oozi-W] ACTION[0000002-180416122730802-oozie-oozi-W@:start:] [***0000002-180416122730802-oozie-oozi-W@:start:***]Action status=DONE
2018-04-16 14:09:02,223  INFO ActionStartXCommand:520 - SERVER[hdn03.abc.com] USER[root] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000002-180416122730802-oozie-oozi-W] ACTION[0000002-180416122730802-oozie-oozi-W@:start:] [***0000002-180416122730802-oozie-oozi-W@:start:***]Action updated in DB!
2018-04-16 14:09:02,311  INFO WorkflowNotificationXCommand:520 - SERVER[hdn03.abc.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000002-180416122730802-oozie-oozi-W] ACTION[0000002-180416122730802-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000002-180416122730802-oozie-oozi-W@:start:
2018-04-16 14:09:02,312  INFO WorkflowNotificationXCommand:520 - SERVER[hdn03.abc.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000002-180416122730802-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000002-180416122730802-oozie-oozi-W
2018-04-16 14:09:02,335  INFO ActionStartXCommand:520 - SERVER[hdn03.abc.com] USER[root] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000002-180416122730802-oozie-oozi-W] ACTION[0000002-180416122730802-oozie-oozi-W@mr-node] Start action [0000002-180416122730802-oozie-oozi-W@mr-node] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]


1 ACCEPTED SOLUTION


Re: oozie job stuck in - 'Running ' state always

New Contributor

Hi, I found the answer myself: I was giving the wrong port for the jobTracker.

Instead of the scheduler address below:

<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hdn04.abc.com:8030</value>
</property>


we need to use:


<property>
<name>yarn.resourcemanager.address</name>
<value>hdn04.abc.com:8050</value>
</property>

as the jobTracker.
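Putting it together, the corrected job.properties would look like this (a sketch based on the hostnames used in this thread; adjust to your own cluster):

```properties
nameNode=hdfs://hdn01.abc.com:8020
# jobTracker must point to yarn.resourcemanager.address (the RM IPC port),
# NOT yarn.resourcemanager.scheduler.address
jobTracker=hdn04.abc.com:8050
queueName=default
examplesRoot=oozie_test/examples
oozie.wf.application.path=${nameNode}/user/${examplesRoot}/apps/map-reduce/workflow.xml
outputDir=map-reduce
```

Then resubmit with `oozie job -oozie http://hdn03.abc.com:11000/oozie -config job.properties -run`.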

4 REPLIES

Re: oozie job stuck in - 'Running ' state always

Super Guru
@Pandu123 P

Please change jobTracker=hdn04.abc.com:8021 to jobTracker=hdn04.abc.com:8032 and try again.
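Rather than guessing the port, you can read it straight out of yarn-site.xml. A minimal sketch using only the Python standard library (the `/etc/hadoop/conf` path is an assumption typical of HDP clusters; the inline XML here is just sample data from this thread):

```python
import xml.etree.ElementTree as ET

def get_yarn_property(xml_text: str, key: str):
    """Return the value of a named property from yarn-site.xml content, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == key:
            return prop.findtext("value")
    return None

# On a cluster node you would read the real file, e.g.:
#   xml_text = open("/etc/hadoop/conf/yarn-site.xml").read()
xml_text = """<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hdn04.abc.com:8050</value>
  </property>
</configuration>"""

# The value of yarn.resourcemanager.address is what jobTracker should be set to.
print(get_yarn_property(xml_text, "yarn.resourcemanager.address"))  # hdn04.abc.com:8050
```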

Re: oozie job stuck in - 'Running ' state always

New Contributor

@Kuldeep Kulkarni thank you for the response. I changed the jobTracker value as you suggested, but still no luck; the job status stays in 'Running' forever.

nameNode=hdfs://hdn01.abc.com:8020
jobTracker=hdn04.abc.com:8032
queueName=default
examplesRoot=oozie_test/examples
oozie.wf.application.path=${nameNode}/user/${examplesRoot}/apps/map-reduce/workflow.xml

outputDir=map-reduce
Job log:
$ oozie job -oozie http://hdn03.abc.com:11000/oozie -log 0000001-180416171621790-oozie-oozi-W
2018-04-17 10:13:54,369  INFO ActionStartXCommand:520 - SERVER[hdn03.abc.com] USER[hdfs] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000001-180416171621790-oozie-oozi-W] ACTION[0000001-180416171621790-oozie-oozi-W@:start:] Start action [0000001-180416171621790-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-04-17 10:13:54,370  INFO ActionStartXCommand:520 - SERVER[hdn03.abc.com] USER[hdfs] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000001-180416171621790-oozie-oozi-W] ACTION[0000001-180416171621790-oozie-oozi-W@:start:] [***0000001-180416171621790-oozie-oozi-W@:start:***]Action status=DONE
2018-04-17 10:13:54,370  INFO ActionStartXCommand:520 - SERVER[hdn03.abc.com] USER[hdfs] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000001-180416171621790-oozie-oozi-W] ACTION[0000001-180416171621790-oozie-oozi-W@:start:] [***0000001-180416171621790-oozie-oozi-W@:start:***]Action updated in DB!
2018-04-17 10:13:54,424  INFO WorkflowNotificationXCommand:520 - SERVER[hdn03.abc.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-180416171621790-oozie-oozi-W] ACTION[0000001-180416171621790-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000001-180416171621790-oozie-oozi-W@:start:
2018-04-17 10:13:54,428  INFO WorkflowNotificationXCommand:520 - SERVER[hdn03.abc.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-180416171621790-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000001-180416171621790-oozie-oozi-W
2018-04-17 10:13:54,447  INFO ActionStartXCommand:520 - SERVER[hdn03.abc.com] USER[hdfs] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000001-180416171621790-oozie-oozi-W] ACTION[0000001-180416171621790-oozie-oozi-W@mr-node] Start action [0000001-180416171621790-oozie-oozi-W@mr-node] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]


My yarn-site.xml:

<configuration>

<property>
<name>hadoop.registry.rm.enabled</name>
<value>true</value>
</property>

<property>
<name>hadoop.registry.zk.quorum</name>
<value>hdn03.abc.com:2181,hdn04.abc.com:2181,hdn05.abc.com:2181,hdn02.abc.com:2181</value>
</property>

<property>
<name>manage.include.files</name>
<value>false</value>
</property>

<property>
<name>yarn.acl.enable</name>
<value>false</value>
</property>

<property>
<name>yarn.admin.acl</name>
<value>activity_analyzer,yarn</value>
</property>

<property>
<name>yarn.application.classpath</name>
<value>/usr/hdp/2.6.4.0-91/hadoop/conf,/usr/hdp/2.6.4.0-91/hadoop/*,/usr/hdp/2.6.4.0-91/hadoop/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*,/usr/hdp/current/ext/hadoop/*</value>
</property>

<property>
<name>yarn.client.failover-proxy-provider</name>
<value>org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider</value>
</property>

<property>
<name>yarn.client.nodemanager-connect.max-wait-ms</name>
<value>60000</value>
</property>

<property>
<name>yarn.client.nodemanager-connect.retry-interval-ms</name>
<value>10000</value>
</property>

<property>
<name>yarn.http.policy</name>
<value>HTTP_ONLY</value>
</property>

<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>

<property>
<name>yarn.log-aggregation.file-controller.IndexedFormat.class</name>
<value>org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController</value>
</property>

<property>
<name>yarn.log-aggregation.file-controller.TFile.class</name>
<value>org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController</value>
</property>

<property>
<name>yarn.log-aggregation.file-formats</name>
<value>IndexedFormat,TFile</value>
</property>

<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>2592000</value>
</property>

<property>
<name>yarn.log.server.url</name>
<value>http://hdn04.abc.com:19888/jobhistory/logs</value>
</property>

<property>
<name>yarn.log.server.web-service.url</name>
<value>http://hdn04.abc.com:8188/ws/v1/applicationhistory</value>
</property>

<property>
<name>yarn.node-labels.enabled</name>
<value>false</value>
</property>

<property>
<name>yarn.node-labels.fs-store.retry-policy-spec</name>
<value>2000, 500</value>
</property>

<property>
<name>yarn.node-labels.fs-store.root-dir</name>
<value>/system/yarn/node-labels</value>
</property>

<property>
<name>yarn.nodemanager.address</name>
<value>0.0.0.0:45454</value>
</property>

<property>
<name>yarn.nodemanager.admin-env</name>
<value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX</value>
</property>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,spark_shuffle,spark2_shuffle</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.spark2_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.spark2_shuffle.classpath</name>
<value>/usr/hdp/${hdp.version}/spark2/aux/*</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name>
<value>/usr/hdp/${hdp.version}/spark/aux/*</value>
</property>

<property>
<name>yarn.nodemanager.bind-host</name>
<value>0.0.0.0</value>
</property>

<property>
<name>yarn.nodemanager.container-executor.class</name>
<value>org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor</value>
</property>

<property>
<name>yarn.nodemanager.container-metrics.unregister-delay-ms</name>
<value>60000</value>
</property>

<property>
<name>yarn.nodemanager.container-monitor.interval-ms</name>
<value>3000</value>
</property>

<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>0</value>
</property>

<property>
<name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
<value>90</value>
</property>

<property>
<name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
<value>1000</value>
</property>

<property>
<name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
<value>0.25</value>
</property>

<property>
<name>yarn.nodemanager.health-checker.interval-ms</name>
<value>135000</value>
</property>

<property>
<name>yarn.nodemanager.health-checker.script.timeout-ms</name>
<value>60000</value>
</property>

<property>
<name>yarn.nodemanager.kill-escape.launch-command-line</name>
<value>slider-agent,LLAP</value>
</property>

<property>
<name>yarn.nodemanager.kill-escape.user</name>
<value>hive</value>
</property>

<property>
<name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
<value>false</value>
</property>

<property>
<name>yarn.nodemanager.linux-container-executor.group</name>
<value>hadoop</value>
</property>

<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/hadoop/yarn/local,/data/hadoop/yarn/local</value>
</property>

<property>
<name>yarn.nodemanager.log-aggregation.compression-type</name>
<value>gz</value>
</property>

<property>
<name>yarn.nodemanager.log-aggregation.debug-enabled</name>
<value>false</value>
</property>

<property>
<name>yarn.nodemanager.log-aggregation.num-log-files-per-app</name>
<value>336</value>
</property>

<property>
<name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
<value>3600</value>
</property>

<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/hadoop/yarn/log,/data/hadoop/yarn/log</value>
</property>

<property>
<name>yarn.nodemanager.log.retain-seconds</name>
<value>604800</value>
</property>

<property>
<name>yarn.nodemanager.recovery.dir</name>
<value>/var/log/hadoop-yarn/nodemanager/recovery-state</value>
</property>

<property>
<name>yarn.nodemanager.recovery.enabled</name>
<value>true</value>
</property>

<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/app-logs</value>
</property>

<property>
<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
<value>logs</value>
</property>

<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>3</value>
</property>

<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>

<property>
<name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
<value>80</value>
</property>

<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>

<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>

<property>
<name>yarn.resourcemanager.address</name>
<value>hdn04.abc.com:8050</value>
</property>

<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hdn04.abc.com:8141</value>
</property>

<property>
<name>yarn.resourcemanager.am.max-attempts</name>
<value>2</value>
</property>

<property>
<name>yarn.resourcemanager.bind-host</name>
<value>0.0.0.0</value>
</property>

<property>
<name>yarn.resourcemanager.connect.max-wait.ms</name>
<value>-1</value>
</property>

<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>15000</value>
</property>

<property>
<name>yarn.resourcemanager.fs.state-store.retry-policy-spec</name>
<value>2000, 500</value>
</property>

<property>
<name>yarn.resourcemanager.fs.state-store.uri</name>
<value> </value>
</property>

<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>false</value>
</property>

<property>
<name>yarn.resourcemanager.hostname</name>
<value>hdn04.abc.com</value>
</property>

<property>
<name>yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor</name>
<value>1</value>
</property>

<property>
<name>yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round</name>
<value>0.13</value>
</property>

<property>
<name>yarn.resourcemanager.nodes.exclude-path</name>
<value>/etc/hadoop/conf/yarn.exclude</value>
</property>

<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>

<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hdn04.abc.com:8025</value>
</property>

<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hdn04.abc.com:8030</value>
</property>

<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

<property>
<name>yarn.resourcemanager.scheduler.monitor.enable</name>
<value>false</value>
</property>

<property>
<name>yarn.resourcemanager.state-store.max-completed-applications</name>
<value>${yarn.resourcemanager.max-completed-applications}</value>
</property>

<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>

<property>
<name>yarn.resourcemanager.system-metrics-publisher.dispatcher.pool-size</name>
<value>10</value>
</property>

<property>
<name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
<value>true</value>
</property>

<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hdn04.abc.com:8088</value>
</property>

<property>
<name>yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled</name>
<value>false</value>
</property>

<property>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>hdn04.abc.com:8090</value>
</property>

<property>
<name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
<value>true</value>
</property>

<property>
<name>yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms</name>
<value>10000</value>
</property>

<property>
<name>yarn.resourcemanager.zk-acl</name>
<value>world:anyone:rwcda</value>
</property>

<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hdn03.abc.com:2181,hdn04.abc.com:2181,hdn05.abc.com:2181,hdn02.abc.com:2181</value>
</property>

<property>
<name>yarn.resourcemanager.zk-num-retries</name>
<value>1000</value>
</property>

<property>
<name>yarn.resourcemanager.zk-retry-interval-ms</name>
<value>1000</value>
</property>

<property>
<name>yarn.resourcemanager.zk-state-store.parent-path</name>
<value>/rmstore</value>
</property>

<property>
<name>yarn.resourcemanager.zk-timeout-ms</name>
<value>10000</value>
</property>

<property>
<name>yarn.scheduler.capacity.ordering-policy.priority-utilization.underutilized-preemption.enabled</name>
<value>false</value>
</property>

<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>4096</value>
</property>

<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>3</value>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
</property>

<property>
<name>yarn.timeline-service.address</name>
<value>hdn04.abc.com:10200</value>
</property>

<property>
<name>yarn.timeline-service.bind-host</name>
<value>0.0.0.0</value>
</property>

<property>
<name>yarn.timeline-service.client.fd-flush-interval-secs</name>
<value>5</value>
</property>

<property>
<name>yarn.timeline-service.client.max-retries</name>
<value>30</value>
</property>

<property>
<name>yarn.timeline-service.client.retry-interval-ms</name>
<value>1000</value>
</property>

<property>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property>

<property>
<name>yarn.timeline-service.entity-group-fs-store.active-dir</name>
<value>/ats/active/</value>
</property>

<property>
<name>yarn.timeline-service.entity-group-fs-store.app-cache-size</name>
<value>10</value>
</property>

<property>
<name>yarn.timeline-service.entity-group-fs-store.cleaner-interval-seconds</name>
<value>3600</value>
</property>

<property>
<name>yarn.timeline-service.entity-group-fs-store.done-dir</name>
<value>/ats/done/</value>
</property>

<property>
<name>yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes</name>
<value>org.apache.tez.dag.history.logging.ats.TimelineCachePluginImpl</value>
</property>

<property>
<name>yarn.timeline-service.entity-group-fs-store.group-id-plugin-classpath</name>
<value>/usr/hdp/2.6.4.0-91/spark/hdpLib/*</value>
</property>

<property>
<name>yarn.timeline-service.entity-group-fs-store.retain-seconds</name>
<value>604800</value>
</property>

<property>
<name>yarn.timeline-service.entity-group-fs-store.scan-interval-seconds</name>
<value>15</value>
</property>

<property>
<name>yarn.timeline-service.entity-group-fs-store.summary-store</name>
<value>org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore</value>
</property>

<property>
<name>yarn.timeline-service.generic-application-history.store-class</name>
<value>org.apache.hadoop.yarn.server.applicationhistoryservice.NullApplicationHistoryStore</value>
</property>

<property>
<name>yarn.timeline-service.http-authentication.proxyuser.root.groups</name>
<value>*</value>
</property>

<property>
<name>yarn.timeline-service.http-authentication.proxyuser.root.hosts</name>
<value>chmgr.abc.com</value>
</property>

<property>
<name>yarn.timeline-service.http-authentication.simple.anonymous.allowed</name>
<value>true</value>
</property>

<property>
<name>yarn.timeline-service.http-authentication.type</name>
<value>simple</value>
</property>

<property>
<name>yarn.timeline-service.leveldb-state-store.path</name>
<value>/hadoop/yarn/timeline</value>
</property>

<property>
<name>yarn.timeline-service.leveldb-timeline-store.path</name>
<value>/hadoop/yarn/timeline</value>
</property>

<property>
<name>yarn.timeline-service.leveldb-timeline-store.read-cache-size</name>
<value>104857600</value>
</property>

<property>
<name>yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size</name>
<value>10000</value>
</property>

<property>
<name>yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size</name>
<value>10000</value>
</property>

<property>
<name>yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms</name>
<value>300000</value>
</property>

<property>
<name>yarn.timeline-service.recovery.enabled</name>
<value>true</value>
</property>

<property>
<name>yarn.timeline-service.state-store-class</name>
<value>org.apache.hadoop.yarn.server.timeline.recovery.LeveldbTimelineStateStore</value>
</property>

<property>
<name>yarn.timeline-service.store-class</name>
<value>org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore</value>
</property>

<property>
<name>yarn.timeline-service.ttl-enable</name>
<value>true</value>
</property>

<property>
<name>yarn.timeline-service.ttl-ms</name>
<value>2678400000</value>
</property>

<property>
<name>yarn.timeline-service.version</name>
<value>1.5</value>
</property>

<property>
<name>yarn.timeline-service.webapp.address</name>
<value>hdn04.abc.com:8188</value>
</property>

<property>
<name>yarn.timeline-service.webapp.https.address</name>
<value>hdn04.abc.com:8190</value>
</property>

</configuration>


The YARN job scheduler is the default Capacity Scheduler. Am I using the right port for YARN? I am unable to see what is going wrong; kindly suggest based on my yarn-site.xml. I can provide other config details if necessary.


Re: oozie job stuck in - 'Running ' state always

New Contributor

I have a similar issue. I tried using ports 8032 and 8050 for the jobTracker; neither worked.

Has anyone resolved this issue? How did you resolve it?

YARN keeps stating: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.

Diagnostics:[Sat Sep 15 02:49:30 +0000 2018] Application is Activated, waiting for resources to be assigned for AM. Details : AM Partition = <DEFAULT_PARTITION> ; Partition Resource = <memory:3000, vCores:8> ; Queue's Absolute capacity = 14.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 14.0 % ;

The job is always in the "RUNNING" state and never finishes.
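For what it's worth, the diagnostics above hint at a capacity problem rather than a port problem: 14% absolute capacity of a 3000 MB partition gives the queue only about 420 MB, which is less than a typical minimum container allocation of 1024 MB (the value shown in the yarn-site.xml earlier in this thread), so the AM container may never be allocated. A rough back-of-the-envelope check, using those assumed numbers:

```python
def queue_fits_am(partition_mb: int, abs_capacity_pct: float, min_alloc_mb: int) -> bool:
    """Rough check: can the queue's guaranteed memory share hold one AM container?"""
    queue_mb = partition_mb * abs_capacity_pct / 100.0
    return queue_mb >= min_alloc_mb

# Numbers taken from the diagnostics above; min allocation assumed from
# yarn.scheduler.minimum-allocation-mb earlier in this thread.
print(queue_fits_am(3000, 14.0, 1024))  # False: 420 MB guaranteed < 1024 MB AM container
```

If this is the case for your cluster, raising the queue's capacity (or the cluster's NodeManager memory) should let the AM launch.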


(Attachments: yarn.jpg, oozie-job-config.jpg)