Oozie job always stuck in 'RUNNING' state

Explorer

I followed the official Oozie documentation to run the examples it provides. The Oozie server status shows as 'Normal'. I tried running the Oozie map-reduce example job after modifying the NameNode and JobTracker addresses as shown below.

The job is created, but when I check its -info and -log output, the job status is always RUNNING.

I'm using Hadoop 2.7.3.2.6.4.0-91, installed through Hortonworks Ambari, and Oozie 4.2.0. I could not identify what exactly is going wrong; please suggest a way forward.

job.properties:

nameNode=hdfs://hdn01.abc.com:8020
jobTracker=hdn04.abc.com:8021
queueName=default
examplesRoot=oozie_test/examples
oozie.wf.application.path=${nameNode}/user/${examplesRoot}/apps/map-reduce/workflow.xml
outputDir=map-reduce

JOB info:
 oozie job -oozie http://hdn03.abc.com:11000/oozie -info 0000002-180416122730802-oozie-oozi-W
Job ID : 0000002-180416122730802-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : map-reduce-wf
App Path      : hdfs://hdn01.abc.com:8020/user/oozie_test/examples/apps/map-reduce/workflow.xml
Status        : RUNNING
Run           : 0
User          : root
Group         : -
Created       : 2018-04-16 08:39 GMT
Started       : 2018-04-16 08:39 GMT
Last Modified : 2018-04-16 08:39 GMT
Ended         : -
CoordAction ID: -


Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                                            Status    Ext ID                 Ext Status Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000002-180416122730802-oozie-oozi-W@:start:                                  OK        -                      OK         -  
------------------------------------------------------------------------------------------------------------------------------------
0000002-180416122730802-oozie-oozi-W@mr-node                                  PREP      -                      -          -  
------------------------------------------------------------------------------------------------------------------------------------
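In the table above, mr-node is in PREP and has no Ext ID, which suggests that no Hadoop job ID had been recorded for the action at that point. The following is a minimal Python sketch for spotting such rows in saved `-info` output. The function names `parse_actions` and `not_yet_submitted` are hypothetical helpers, not part of Oozie, and the column layout is assumed to match the table shown here.

```python
# Two action rows copied from the `oozie job -info` output above.
INFO_ROWS = """\
0000002-180416122730802-oozie-oozi-W@:start:                                  OK        -                      OK         -
0000002-180416122730802-oozie-oozi-W@mr-node                                  PREP      -                      -          -
"""


def parse_actions(info_text):
    """Parse action rows with columns: ID, Status, Ext ID, Ext Status, Err Code."""
    actions = []
    for line in info_text.splitlines():
        parts = line.split()
        # Assumption: action rows have exactly five columns and an ID containing '@'.
        if len(parts) == 5 and "@" in parts[0]:
            action_id, status, ext_id, ext_status, err_code = parts
            actions.append({
                "name": action_id.split("@", 1)[1],
                "status": status,
                "ext_id": ext_id,
                "ext_status": ext_status,
                "err_code": err_code,
            })
    return actions


def not_yet_submitted(actions):
    """Names of actions in PREP that have no external (Hadoop) ID yet."""
    return [a["name"] for a in actions
            if a["status"] == "PREP" and a["ext_id"] == "-"]


print(not_yet_submitted(parse_actions(INFO_ROWS)))  # ['mr-node']
```

An action that stays in this state is usually a hint to look at the address Oozie uses to reach YARN rather than at the workflow definition itself.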



LOG:
oozie job -oozie http://hdn03.abc.com:11000/oozie -log 0000002-180416122730802-oozie-oozi-W
2018-04-16 14:09:02,222  INFO ActionStartXCommand:520 - SERVER[hdn03.abc.com] USER[root] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000002-180416122730802-oozie-oozi-W] ACTION[0000002-180416122730802-oozie-oozi-W@:start:] Start action [0000002-180416122730802-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-04-16 14:09:02,223  INFO ActionStartXCommand:520 - SERVER[hdn03.abc.com] USER[root] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000002-180416122730802-oozie-oozi-W] ACTION[0000002-180416122730802-oozie-oozi-W@:start:] [***0000002-180416122730802-oozie-oozi-W@:start:***]Action status=DONE
2018-04-16 14:09:02,223  INFO ActionStartXCommand:520 - SERVER[hdn03.abc.com] USER[root] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000002-180416122730802-oozie-oozi-W] ACTION[0000002-180416122730802-oozie-oozi-W@:start:] [***0000002-180416122730802-oozie-oozi-W@:start:***]Action updated in DB!
2018-04-16 14:09:02,311  INFO WorkflowNotificationXCommand:520 - SERVER[hdn03.abc.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000002-180416122730802-oozie-oozi-W] ACTION[0000002-180416122730802-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000002-180416122730802-oozie-oozi-W@:start:
2018-04-16 14:09:02,312  INFO WorkflowNotificationXCommand:520 - SERVER[hdn03.abc.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000002-180416122730802-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000002-180416122730802-oozie-oozi-W
2018-04-16 14:09:02,335  INFO ActionStartXCommand:520 - SERVER[hdn03.abc.com] USER[root] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000002-180416122730802-oozie-oozi-W] ACTION[0000002-180416122730802-oozie-oozi-W@mr-node] Start action [0000002-180416122730802-oozie-oozi-W@mr-node] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]

1 ACCEPTED SOLUTION

Explorer

Hi, I found the answer myself: I was giving the wrong port for the job tracker.

Instead of the address below

<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hdn04.abc.com:8030</value>
</property>

we need to give

<property>
<name>yarn.resourcemanager.address</name>
<value>hdn04.abc.com:8050</value>
</property>

as the job tracker, that is, jobTracker=hdn04.abc.com:8050 in the job properties file.
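As a rough illustration of this fix, the Python sketch below reads a Hadoop-style configuration and builds the jobTracker line from yarn.resourcemanager.address. The embedded XML holds only the two properties quoted in this thread; the helper names `read_hadoop_properties` and `job_tracker_line` are hypothetical. On a real cluster you would pass in the contents of your own yarn-site.xml instead.

```python
import xml.etree.ElementTree as ET

# Two properties copied from the yarn-site.xml quoted in this thread.
YARN_SITE_SNIPPET = """
<configuration>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hdn04.abc.com:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hdn04.abc.com:8050</value>
  </property>
</configuration>
"""


def read_hadoop_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def job_tracker_line(props):
    """Build the jobTracker line from the ResourceManager client address."""
    # The scheduler address (port 8030 here) is deliberately not used.
    return "jobTracker=" + props["yarn.resourcemanager.address"]


props = read_hadoop_properties(YARN_SITE_SNIPPET)
print(job_tracker_line(props))  # jobTracker=hdn04.abc.com:8050
```

Reading the value from the cluster's own configuration avoids guessing between 8032, the Apache Hadoop default for this property, and 8050, the value this Ambari-managed cluster uses.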

7 REPLIES

Master Guru
@Pandu123 P

Please change jobTracker=hdn04.abc.com:8021 to jobTracker=hdn04.abc.com:8032 and try again.

Explorer

@Kuldeep Kulkarni Thank you for the response. I changed the jobTracker value as you suggested, but still no luck: the job status stays in 'RUNNING' forever.

nameNode=hdfs://hdn01.abc.com:8020
jobTracker=hdn04.abc.com:8032
queueName=default
examplesRoot=oozie_test/examples
oozie.wf.application.path=${nameNode}/user/${examplesRoot}/apps/map-reduce/workflow.xml
outputDir=map-reduce

Job log:
$ oozie job -oozie http://hdn03.abc.com:11000/oozie -log 0000001-180416171621790-oozie-oozi-W
2018-04-17 10:13:54,369  INFO ActionStartXCommand:520 - SERVER[hdn03.abc.com] USER[hdfs] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000001-180416171621790-oozie-oozi-W] ACTION[0000001-180416171621790-oozie-oozi-W@:start:] Start action [0000001-180416171621790-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-04-17 10:13:54,370  INFO ActionStartXCommand:520 - SERVER[hdn03.abc.com] USER[hdfs] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000001-180416171621790-oozie-oozi-W] ACTION[0000001-180416171621790-oozie-oozi-W@:start:] [***0000001-180416171621790-oozie-oozi-W@:start:***]Action status=DONE
2018-04-17 10:13:54,370  INFO ActionStartXCommand:520 - SERVER[hdn03.abc.com] USER[hdfs] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000001-180416171621790-oozie-oozi-W] ACTION[0000001-180416171621790-oozie-oozi-W@:start:] [***0000001-180416171621790-oozie-oozi-W@:start:***]Action updated in DB!
2018-04-17 10:13:54,424  INFO WorkflowNotificationXCommand:520 - SERVER[hdn03.abc.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-180416171621790-oozie-oozi-W] ACTION[0000001-180416171621790-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000001-180416171621790-oozie-oozi-W@:start:
2018-04-17 10:13:54,428  INFO WorkflowNotificationXCommand:520 - SERVER[hdn03.abc.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-180416171621790-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000001-180416171621790-oozie-oozi-W
2018-04-17 10:13:54,447  INFO ActionStartXCommand:520 - SERVER[hdn03.abc.com] USER[hdfs] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000001-180416171621790-oozie-oozi-W] ACTION[0000001-180416171621790-oozie-oozi-W@mr-node] Start action [0000001-180416171621790-oozie-oozi-W@mr-node] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]


My yarn-site.xml in Hadoop:

<configuration>

<property>
<name>hadoop.registry.rm.enabled</name>
<value>true</value>
</property>

<property>
<name>hadoop.registry.zk.quorum</name>
<value>hdn03.abc.com:2181,hdn04.abc.com:2181,hdn05.abc.com:2181,hdn02.abc.com:2181</value>
</property>

<property>
<name>manage.include.files</name>
<value>false</value>
</property>

<property>
<name>yarn.acl.enable</name>
<value>false</value>
</property>

<property>
<name>yarn.admin.acl</name>
<value>activity_analyzer,yarn</value>
</property>

<property>
<name>yarn.application.classpath</name>
<value>/usr/hdp/2.6.4.0-91/hadoop/conf,/usr/hdp/2.6.4.0-91/hadoop/*,/usr/hdp/2.6.4.0-91/hadoop/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*,/usr/hdp/current/ext/hadoop/*</value>
</property>

<property>
<name>yarn.client.failover-proxy-provider</name>
<value>org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider</value>
</property>

<property>
<name>yarn.client.nodemanager-connect.max-wait-ms</name>
<value>60000</value>
</property>

<property>
<name>yarn.client.nodemanager-connect.retry-interval-ms</name>
<value>10000</value>
</property>

<property>
<name>yarn.http.policy</name>
<value>HTTP_ONLY</value>
</property>

<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>

<property>
<name>yarn.log-aggregation.file-controller.IndexedFormat.class</name>
<value>org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController</value>
</property>

<property>
<name>yarn.log-aggregation.file-controller.TFile.class</name>
<value>org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController</value>
</property>

<property>
<name>yarn.log-aggregation.file-formats</name>
<value>IndexedFormat,TFile</value>
</property>

<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>2592000</value>
</property>

<property>
<name>yarn.log.server.url</name>
<value>http://hdn04.abc.com:19888/jobhistory/logs</value>
</property>

<property>
<name>yarn.log.server.web-service.url</name>
<value>http://hdn04.abc.com:8188/ws/v1/applicationhistory</value>
</property>

<property>
<name>yarn.node-labels.enabled</name>
<value>false</value>
</property>

<property>
<name>yarn.node-labels.fs-store.retry-policy-spec</name>
<value>2000, 500</value>
</property>

<property>
<name>yarn.node-labels.fs-store.root-dir</name>
<value>/system/yarn/node-labels</value>
</property>

<property>
<name>yarn.nodemanager.address</name>
<value>0.0.0.0:45454</value>
</property>

<property>
<name>yarn.nodemanager.admin-env</name>
<value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX</value>
</property>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,spark_shuffle,spark2_shuffle</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.spark2_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.spark2_shuffle.classpath</name>
<value>/usr/hdp/${hdp.version}/spark2/aux/*</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name>
<value>/usr/hdp/${hdp.version}/spark/aux/*</value>
</property>

<property>
<name>yarn.nodemanager.bind-host</name>
<value>0.0.0.0</value>
</property>

<property>
<name>yarn.nodemanager.container-executor.class</name>
<value>org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor</value>
</property>

<property>
<name>yarn.nodemanager.container-metrics.unregister-delay-ms</name>
<value>60000</value>
</property>

<property>
<name>yarn.nodemanager.container-monitor.interval-ms</name>
<value>3000</value>
</property>

<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>0</value>
</property>

<property>
<name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
<value>90</value>
</property>

<property>
<name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
<value>1000</value>
</property>

<property>
<name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
<value>0.25</value>
</property>

<property>
<name>yarn.nodemanager.health-checker.interval-ms</name>
<value>135000</value>
</property>

<property>
<name>yarn.nodemanager.health-checker.script.timeout-ms</name>
<value>60000</value>
</property>

<property>
<name>yarn.nodemanager.kill-escape.launch-command-line</name>
<value>slider-agent,LLAP</value>
</property>

<property>
<name>yarn.nodemanager.kill-escape.user</name>
<value>hive</value>
</property>

<property>
<name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
<value>false</value>
</property>

<property>
<name>yarn.nodemanager.linux-container-executor.group</name>
<value>hadoop</value>
</property>

<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/hadoop/yarn/local,/data/hadoop/yarn/local</value>
</property>

<property>
<name>yarn.nodemanager.log-aggregation.compression-type</name>
<value>gz</value>
</property>

<property>
<name>yarn.nodemanager.log-aggregation.debug-enabled</name>
<value>false</value>
</property>

<property>
<name>yarn.nodemanager.log-aggregation.num-log-files-per-app</name>
<value>336</value>
</property>

<property>
<name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
<value>3600</value>
</property>

<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/hadoop/yarn/log,/data/hadoop/yarn/log</value>
</property>

<property>
<name>yarn.nodemanager.log.retain-seconds</name>
<value>604800</value>
</property>

<property>
<name>yarn.nodemanager.recovery.dir</name>
<value>/var/log/hadoop-yarn/nodemanager/recovery-state</value>
</property>

<property>
<name>yarn.nodemanager.recovery.enabled</name>
<value>true</value>
</property>

<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/app-logs</value>
</property>

<property>
<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
<value>logs</value>
</property>

<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>3</value>
</property>

<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>

<property>
<name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
<value>80</value>
</property>

<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>

<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>

<property>
<name>yarn.resourcemanager.address</name>
<value>hdn04.abc.com:8050</value>
</property>

<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hdn04.abc.com:8141</value>
</property>

<property>
<name>yarn.resourcemanager.am.max-attempts</name>
<value>2</value>
</property>

<property>
<name>yarn.resourcemanager.bind-host</name>
<value>0.0.0.0</value>
</property>

<property>
<name>yarn.resourcemanager.connect.max-wait.ms</name>
<value>-1</value>
</property>

<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>15000</value>
</property>

<property>
<name>yarn.resourcemanager.fs.state-store.retry-policy-spec</name>
<value>2000, 500</value>
</property>

<property>
<name>yarn.resourcemanager.fs.state-store.uri</name>
<value> </value>
</property>

<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>false</value>
</property>

<property>
<name>yarn.resourcemanager.hostname</name>
<value>hdn04.abc.com</value>
</property>

<property>
<name>yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor</name>
<value>1</value>
</property>

<property>
<name>yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round</name>
<value>0.13</value>
</property>

<property>
<name>yarn.resourcemanager.nodes.exclude-path</name>
<value>/etc/hadoop/conf/yarn.exclude</value>
</property>

<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>

<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hdn04.abc.com:8025</value>
</property>

<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hdn04.abc.com:8030</value>
</property>

<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

<property>
<name>yarn.resourcemanager.scheduler.monitor.enable</name>
<value>false</value>
</property>

<property>
<name>yarn.resourcemanager.state-store.max-completed-applications</name>
<value>${yarn.resourcemanager.max-completed-applications}</value>
</property>

<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>

<property>
<name>yarn.resourcemanager.system-metrics-publisher.dispatcher.pool-size</name>
<value>10</value>
</property>

<property>
<name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
<value>true</value>
</property>

<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hdn04.abc.com:8088</value>
</property>

<property>
<name>yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled</name>
<value>false</value>
</property>

<property>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>hdn04.abc.com:8090</value>
</property>

<property>
<name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
<value>true</value>
</property>

<property>
<name>yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms</name>
<value>10000</value>
</property>

<property>
<name>yarn.resourcemanager.zk-acl</name>
<value>world:anyone:rwcda</value>
</property>

<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hdn03.abc.com:2181,hdn04.abc.com:2181,hdn05.abc.com:2181,hdn02.abc.com:2181</value>
</property>

<property>
<name>yarn.resourcemanager.zk-num-retries</name>
<value>1000</value>
</property>

<property>
<name>yarn.resourcemanager.zk-retry-interval-ms</name>
<value>1000</value>
</property>

<property>
<name>yarn.resourcemanager.zk-state-store.parent-path</name>
<value>/rmstore</value>
</property>

<property>
<name>yarn.resourcemanager.zk-timeout-ms</name>
<value>10000</value>
</property>

<property>
<name>yarn.scheduler.capacity.ordering-policy.priority-utilization.underutilized-preemption.enabled</name>
<value>false</value>
</property>

<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>4096</value>
</property>

<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>3</value>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
</property>

<property>
<name>yarn.timeline-service.address</name>
<value>hdn04.abc.com:10200</value>
</property>

<property>
<name>yarn.timeline-service.bind-host</name>
<value>0.0.0.0</value>
</property>

<property>
<name>yarn.timeline-service.client.fd-flush-interval-secs</name>
<value>5</value>
</property>

<property>
<name>yarn.timeline-service.client.max-retries</name>
<value>30</value>
</property>

<property>
<name>yarn.timeline-service.client.retry-interval-ms</name>
<value>1000</value>
</property>

<property>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property>

<property>
<name>yarn.timeline-service.entity-group-fs-store.active-dir</name>
<value>/ats/active/</value>
</property>

<property>
<name>yarn.timeline-service.entity-group-fs-store.app-cache-size</name>
<value>10</value>
</property>

<property>
<name>yarn.timeline-service.entity-group-fs-store.cleaner-interval-seconds</name>
<value>3600</value>
</property>

<property>
<name>yarn.timeline-service.entity-group-fs-store.done-dir</name>
<value>/ats/done/</value>
</property>

<property>
<name>yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes</name>
<value>org.apache.tez.dag.history.logging.ats.TimelineCachePluginImpl</value>
</property>

<property>
<name>yarn.timeline-service.entity-group-fs-store.group-id-plugin-classpath</name>
<value>/usr/hdp/2.6.4.0-91/spark/hdpLib/*</value>
</property>

<property>
<name>yarn.timeline-service.entity-group-fs-store.retain-seconds</name>
<value>604800</value>
</property>

<property>
<name>yarn.timeline-service.entity-group-fs-store.scan-interval-seconds</name>
<value>15</value>
</property>

<property>
<name>yarn.timeline-service.entity-group-fs-store.summary-store</name>
<value>org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore</value>
</property>

<property>
<name>yarn.timeline-service.generic-application-history.store-class</name>
<value>org.apache.hadoop.yarn.server.applicationhistoryservice.NullApplicationHistoryStore</value>
</property>

<property>
<name>yarn.timeline-service.http-authentication.proxyuser.root.groups</name>
<value>*</value>
</property>

<property>
<name>yarn.timeline-service.http-authentication.proxyuser.root.hosts</name>
<value>chmgr.abc.com</value>
</property>

<property>
<name>yarn.timeline-service.http-authentication.simple.anonymous.allowed</name>
<value>true</value>
</property>

<property>
<name>yarn.timeline-service.http-authentication.type</name>
<value>simple</value>
</property>

<property>
<name>yarn.timeline-service.leveldb-state-store.path</name>
<value>/hadoop/yarn/timeline</value>
</property>

<property>
<name>yarn.timeline-service.leveldb-timeline-store.path</name>
<value>/hadoop/yarn/timeline</value>
</property>

<property>
<name>yarn.timeline-service.leveldb-timeline-store.read-cache-size</name>
<value>104857600</value>
</property>

<property>
<name>yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size</name>
<value>10000</value>
</property>

<property>
<name>yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size</name>
<value>10000</value>
</property>

<property>
<name>yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms</name>
<value>300000</value>
</property>

<property>
<name>yarn.timeline-service.recovery.enabled</name>
<value>true</value>
</property>

<property>
<name>yarn.timeline-service.state-store-class</name>
<value>org.apache.hadoop.yarn.server.timeline.recovery.LeveldbTimelineStateStore</value>
</property>

<property>
<name>yarn.timeline-service.store-class</name>
<value>org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore</value>
</property>

<property>
<name>yarn.timeline-service.ttl-enable</name>
<value>true</value>
</property>

<property>
<name>yarn.timeline-service.ttl-ms</name>
<value>2678400000</value>
</property>

<property>
<name>yarn.timeline-service.version</name>
<value>1.5</value>
</property>

<property>
<name>yarn.timeline-service.webapp.address</name>
<value>hdn04.abc.com:8188</value>
</property>

<property>
<name>yarn.timeline-service.webapp.https.address</name>
<value>hdn04.abc.com:8190</value>
</property>

</configuration>


The job scheduler in YARN is the default Capacity Scheduler. Am I using the right port for YARN? I am unable to see what is going wrong. Kindly suggest a fix based on my yarn-site.xml. I can provide other configuration details if necessary.
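One quick way to narrow down the port question is to check which candidate ports accept TCP connections at all. The Python sketch below is only a reachability probe with hypothetical helper names (`is_port_open`, `open_ports`). A successful connection does not prove that the port belongs to the service Oozie expects, since the scheduler port 8030 would also accept connections.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def open_ports(host, ports, timeout=2.0):
    """Return the subset of ports on host that accept TCP connections."""
    return [port for port in ports if is_port_open(host, port, timeout)]
```

For the cluster in this thread, a call such as `open_ports("hdn04.abc.com", [8021, 8032, 8050])` would show which of the ports tried so far has a listener. The result still needs to be matched against yarn.resourcemanager.address in yarn-site.xml.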

Explorer

Hi,

I'm already using port 8050 for the YARN ResourceManager, but my jobs are still stuck in the RUNNING state. Can you please help?

 

Thanks,

Gazal

Explorer

I have a similar issue. I tried using ports 8032 and 8050 for jobTracker, but neither worked.

Has anyone resolved this issue? How did you resolve it?

YARN keeps stating: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.

Diagnostics:[Sat Sep 15 02:49:30 +0000 2018] Application is Activated, waiting for resources to be assigned for AM. Details : AM Partition = <DEFAULT_PARTITION> ; Partition Resource = <memory:3000, vCores:8> ; Queue's Absolute capacity = 14.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 14.0 % ;

The job is always in the "RUNNING" state and never finishes.
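The diagnostics above can be turned into a rough memory figure for the queue. The Python sketch below does only that arithmetic. The 1024 MB minimum container size is an assumption borrowed from the yarn-site.xml posted earlier in this thread, and this poster's cluster may use a different value, so treat the comparison as a prompt for what to check, not as a diagnosis.

```python
# Values copied from the YARN diagnostics message above.
partition_memory_mb = 3000
queue_absolute_capacity_pct = 14.0

# Assumption: 1024 MB minimum container size, as in the yarn-site.xml
# posted earlier in this thread. This poster's cluster may differ.
assumed_min_allocation_mb = 1024


def queue_memory_mb(partition_mb, absolute_capacity_pct):
    """Memory share of a queue given its absolute capacity percentage."""
    return partition_mb * absolute_capacity_pct / 100.0


queue_mb = queue_memory_mb(partition_memory_mb, queue_absolute_capacity_pct)
fits = queue_mb >= assumed_min_allocation_mb
print(f"queue share: {queue_mb:.0f} MB, one minimum container fits: {fits}")
# queue share: 420 MB, one minimum container fits: False
```

Because an Oozie action typically needs a launcher container in addition to the containers of the job it starts, a queue this small may be worth reviewing alongside the port settings.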


Attachments: yarn.jpg, oozie-job-config.jpg

New Contributor

Hi Ramesh,

Did your issue get resolved? I'm getting the same issue when trying to execute an Oozie workflow: it is always in the RUNNING state. I tried changing the port numbers, but the job still does not succeed, and I was not able to check the job logs to find the reason. Can you please help?
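When the Oozie log shows nothing useful, the YARN side can be inspected through the ResourceManager REST API at /ws/v1/cluster/apps. The Python sketch below lists applications in a given state along with their diagnostics. The helper names are hypothetical, and the host hdn04.abc.com:8088 in the usage note is the webapp address from the yarn-site.xml earlier in this thread, so substitute your own ResourceManager address.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def apps_url(rm_webapp, states):
    """Build the ResourceManager REST URL for applications in given states."""
    query = urlencode({"states": ",".join(states)})
    return f"http://{rm_webapp}/ws/v1/cluster/apps?{query}"


def summarize_apps(payload):
    """Extract (id, state, queue, diagnostics) from a cluster/apps response."""
    # The "apps" field can be null when no application matches the filter.
    apps = (payload.get("apps") or {}).get("app", [])
    return [(a.get("id"), a.get("state"), a.get("queue"),
             a.get("diagnostics", "")) for a in apps]


def fetch_apps(rm_webapp, states=("ACCEPTED",)):
    """Query the ResourceManager and summarize the matching applications."""
    with urlopen(apps_url(rm_webapp, states), timeout=10) as resp:
        return summarize_apps(json.load(resp))
```

For example, `fetch_apps("hdn04.abc.com:8088")` would return the applications still waiting in ACCEPTED. The diagnostics text for each one often explains why no container has been allocated.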

New Contributor

I use Oozie to run a workflow, but the simple official example shell-wf (echo hello oozie) gets stuck in the RUNNING state and never ends. The workflow can be submitted, and there is no error in the job log in the Oozie UI.

When I submit a shell action with spark-submit inside, the job is never submitted and cannot be seen in the Spark UI. I suspect the shell didn't run at all.