YARN cluster stalls

New Contributor

Hi,

We have a 10-node cluster with memory and CPU evenly distributed across the nodes.

Per node:

Memory: YARN 34 GB, Impala 30 GB

CPU: YARN 18 cores, Impala 2 cores
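
As a rough capacity check, the sketch below counts how many map containers fit on one node and across the cluster. It assumes the figures above are per node and uses the mapResourceRequest of <memory:4096, vCores:1> reported in the AM log further down the thread. containers_per_node is a hypothetical helper, not a YARN API. Under these assumptions memory, not CPU, is the binding limit.

```python
def containers_per_node(node_mem_mb, node_vcores, req_mem_mb, req_vcores):
    """How many identical containers fit on one NodeManager.

    A container needs both its memory and its vcores, so the answer is the
    smaller of the two per-resource limits.
    """
    return min(node_mem_mb // req_mem_mb, node_vcores // req_vcores)


NODES = 10
YARN_MEM_MB = 34 * 1024          # "YARN 34GB" from the post, assumed per node
YARN_VCORES = 18                 # "YARN 18cores", assumed per node
MAP_MEM_MB, MAP_VCORES = 4096, 1 # mapResourceRequest from the AM log

per_node = containers_per_node(YARN_MEM_MB, YARN_VCORES, MAP_MEM_MB, MAP_VCORES)
print(per_node, per_node * NODES)  # prints: 8 80 (memory-bound, not vcore-bound)
```

If the NodeManagers actually advertise 36864 MB (the maxContainerCapability in the AM log), the same arithmetic gives 9 containers per node.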

We run hourly Oozie batch-processing workflows with Hive and Impala (shell) actions, and these workflows run in parallel. The jobs are small and consume few resources. We use Dominant Resource Fairness (DRF) scheduling.
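
For reference, DRF compares applications by their dominant share, the largest fraction of any single resource type they hold, and offers the next container to the application with the smallest dominant share. The sketch below is a simplified illustration, not the Fair Scheduler's own code: it ignores weights, minimum shares and the queue hierarchy. The cluster totals (10 nodes with 34 GB and 18 vcores of YARN capacity each) and the two applications' usage are assumed values, and the app names are hypothetical.

```python
def dominant_share(usage, capacity):
    """Largest fraction of any single resource type that an app currently holds."""
    return max(usage[r] / capacity[r] for r in capacity)


def next_app_drf(apps, capacity):
    """Simplified DRF choice: the app with the smallest dominant share goes next."""
    return min(apps, key=lambda name: dominant_share(apps[name], capacity))


# Assumed totals: 10 nodes x (34 GB, 18 vcores) of YARN capacity.
CLUSTER = {"memory_mb": 10 * 34 * 1024, "vcores": 10 * 18}

# Hypothetical usage: one app holding eight 4 GB / 1-vcore containers,
# another holding two.
apps = {
    "hive_job": {"memory_mb": 8 * 4096, "vcores": 8},
    "small_app": {"memory_mb": 2 * 4096, "vcores": 2},
}

for name, usage in apps.items():
    print(name, round(dominant_share(usage, CLUSTER), 4))
print("next container goes to:", next_app_drf(apps, CLUSTER))
```

With 4 GB / 1-vcore containers on this cluster, each container takes a larger fraction of memory than of vcores, so both dominant shares here are memory shares.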

The whole cluster stalls once or twice a day. During a stall every job hangs, even though resource usage is not at its maximum: memory and CPU are still available on the cluster.

Once we kill one of the jobs manually, the other workflows start progressing again.
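
One pattern that produces this symptom (an assumption here; the thread does not confirm it) is a circular wait between Oozie launcher jobs and the jobs they start. An Oozie Hive action runs inside a launcher job whose map task stays alive until the Hive queries it submitted have finished. If launchers occupy every slot a queue allows, whether that limit comes from memory, AM share or a running-app cap, the child jobs can never start, nothing finishes, and the cluster looks partly idle. The toy model below is hypothetical: simulate and its slot_limit parameter are illustrative names, not YARN or Oozie settings. It admits apps first-in, first-out and shows how killing one launcher lets the rest drain.

```python
from collections import deque


def simulate(num_workflows, slot_limit, kill_on_stall=False):
    """Toy admission model of launcher/child jobs sharing one queue.

    Hypothetical assumptions: each workflow first runs a launcher app that
    holds one slot until the child app it submits has finished; the child
    needs a slot of its own; apps are admitted FIFO whenever a slot is free;
    an admitted child finishes within the same round.
    Returns (completed_workflows, launchers_killed, stalled).
    """
    if slot_limit < 1:
        raise ValueError("slot_limit must be at least 1")
    pending = deque(("launcher", wf) for wf in range(num_workflows))
    holding = set()  # workflows whose launcher currently holds a slot
    completed = killed = 0
    while pending:
        free = slot_limit - len(holding)
        admitted = 0
        finished = []
        while pending and free > 0:
            kind, wf = pending.popleft()
            free -= 1
            admitted += 1
            if kind == "launcher":
                holding.add(wf)
                pending.append(("child", wf))  # launcher submits its child job
            else:
                finished.append(wf)
        for wf in finished:  # child done, so its launcher exits and frees its slot
            holding.discard(wf)
            completed += 1
        if admitted == 0:  # every slot held by a waiting launcher: circular wait
            if not kill_on_stall:
                return completed, killed, True
            victim = min(holding)  # "kill" one launcher by hand
            holding.discard(victim)
            pending = deque(app for app in pending if app[1] != victim)
            killed += 1
    return completed, killed, False
```

In this model simulate(3, 3) stalls with nothing completed, while simulate(3, 3, kill_on_stall=True) completes the other two workflows after a single kill, much like the behaviour described above. With one spare slot, simulate(3, 4) finishes all three.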

We configured the whole cluster as recommended in the YARN tuning guide:

http://www.cloudera.com/documentation/enterprise/5-5-x/topics/cdh_ig_yarn_tuning.html

We have seen this issue since CDH 4.7.x. We have upgraded several times since then and are currently on CDH 5.5.2.

We don't understand why the cluster would stall while resources are still available to allocate.

Please help me understand the possible reasons for these stalls.

Thanks

3 Replies

Re: YARN cluster stalls

Rising Star

Can you upload the log of one of the stalled jobs?

Re: YARN cluster stalls

New Contributor

2016-06-28 13:07:18,891 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1466792160799_175423_000001
2016-06-28 13:07:19,316 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
2016-06-28 13:07:19,316 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@4e7edcb7)
2016-06-28 13:07:19,345 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: RM_DELEGATION_TOKEN, Service: 10.4.2.102:8032, Ident: (owner=cdhadmin, renewer=oozie mr token, realUser=oozie, issueDate=1467119237375, maxDate=1467724037375, sequenceNumber=278734, masterKeyId=5)
2016-06-28 13:07:19,966 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-06-28 13:07:20,093 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config null
2016-06-28 13:07:20,095 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
2016-06-28 13:07:20,141 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.jobhistory.EventType for class org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler
2016-06-28 13:07:20,142 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher
2016-06-28 13:07:20,143 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher
2016-06-28 13:07:20,144 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher
2016-06-28 13:07:20,144 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
2016-06-28 13:07:20,145 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.speculate.Speculator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$SpeculatorEventDispatcher
2016-06-28 13:07:20,146 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
2016-06-28 13:07:20,147 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncher$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter
2016-06-28 13:07:20,190 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://node-address:8020]
2016-06-28 13:07:20,213 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://node-address:8020]
2016-06-28 13:07:20,260 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://node-address:8020]
2016-06-28 13:07:20,271 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Emitting job history data to the timeline server is not enabled
2016-06-28 13:07:20,315 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler
2016-06-28 13:07:20,534 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2016-06-28 13:07:20,593 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-06-28 13:07:20,594 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MRAppMaster metrics system started
2016-06-28 13:07:20,605 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Adding job token for job_1466792160799_175423 to jobTokenSecretManager
2016-06-28 13:07:20,720 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Not uberizing job_1466792160799_175423 because: not enabled; too much RAM;
2016-06-28 13:07:20,744 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Input size for job job_1466792160799_175423 = 0. Number of splits = 1
2016-06-28 13:07:20,744 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Number of reduces for job job_1466792160799_175423 = 0
2016-06-28 13:07:20,744 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1466792160799_175423Job Transitioned from NEW to INITED
2016-06-28 13:07:20,746 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster launching normal, non-uberized, multi-container job job_1466792160799_175423.
2016-06-28 13:07:20,776 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2016-06-28 13:07:20,785 INFO [Socket Reader #1 for port 45791] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 45791
2016-06-28 13:07:20,807 INFO [main] org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB to the server
2016-06-28 13:07:20,807 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2016-06-28 13:07:20,807 INFO [IPC Server listener on 45791] org.apache.hadoop.ipc.Server: IPC Server listener on 45791: starting
2016-06-28 13:07:20,809 INFO [main] org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Instantiated MRClientService at node-address/10.4.2.110:45791
2016-06-28 13:07:20,876 INFO [main] org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2016-06-28 13:07:20,885 INFO [main] org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2016-06-28 13:07:20,890 INFO [main] org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.mapreduce is not defined
2016-06-28 13:07:20,901 INFO [main] org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2016-06-28 13:07:20,907 INFO [main] org.apache.hadoop.http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context mapreduce
2016-06-28 13:07:20,907 INFO [main] org.apache.hadoop.http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context static
2016-06-28 13:07:20,910 INFO [main] org.apache.hadoop.http.HttpServer2: adding path spec: /mapreduce/*
2016-06-28 13:07:20,910 INFO [main] org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
2016-06-28 13:07:20,919 INFO [main] org.apache.hadoop.http.HttpServer2: Jetty bound to port 44097
2016-06-28 13:07:20,919 INFO [main] org.mortbay.log: jetty-6.1.26.cloudera.4
2016-06-28 13:07:20,949 INFO [main] org.mortbay.log: Extract jar:file:/vol1/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/hadoop-yarn-common-2.6.0-cdh5.5.1.jar!/webapps/mapreduce to ./tmp/Jetty_0_0_0_0_44097_mapreduce____ipuvmz/webapp
2016-06-28 13:07:21,290 INFO [main] org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:44097
2016-06-28 13:07:21,291 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Web app /mapreduce started at 44097
2016-06-28 13:07:21,598 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2016-06-28 13:07:21,603 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2016-06-28 13:07:21,604 INFO [Socket Reader #1 for port 47500] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 47500
2016-06-28 13:07:21,608 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2016-06-28 13:07:21,608 INFO [IPC Server listener on 47500] org.apache.hadoop.ipc.Server: IPC Server listener on 47500: starting
2016-06-28 13:07:21,651 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: nodeBlacklistingEnabled:true
2016-06-28 13:07:21,652 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: maxTaskFailuresPerNode is 3
2016-06-28 13:07:21,652 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: blacklistDisablePercent is 33
2016-06-28 13:07:21,705 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at node-address/10.4.2.102:8030
2016-06-28 13:07:21,761 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: maxContainerCapability: <memory:36864, vCores:18>
2016-06-28 13:07:21,761 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: queue: root.cdhadmin
2016-06-28 13:07:21,764 INFO [main] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Upper limit on the thread pool size is 500
2016-06-28 13:07:21,766 INFO [main] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
2016-06-28 13:07:21,771 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1466792160799_175423Job Transitioned from INITED to SETUP
2016-06-28 13:07:21,773 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_SETUP
2016-06-28 13:07:21,776 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2016-06-28 13:07:21,783 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1466792160799_175423Job Transitioned from SETUP to RUNNING
2016-06-28 13:07:21,809 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1466792160799_175423_m_000000 Task Transitioned from NEW to SCHEDULED
2016-06-28 13:07:21,811 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1466792160799_175423_m_000000_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2016-06-28 13:07:21,811 INFO [Thread-51] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: mapResourceRequest:<memory:4096, vCores:1>
2016-06-28 13:07:21,840 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1466792160799_175423, File: hdfs://node-address:8020/user/cdhadmin/.staging/job_1466792160799_175423/job_1466792160799_175423_1.jhist
2016-06-28 13:07:22,121 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://node-address:8020]
2016-06-28 13:07:22,763 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 HostLocal:0 RackLocal:0
2016-06-28 13:07:22,792 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1466792160799_175423: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:291840, vCores:153> knownNMs=10
2016-06-28 13:07:23,801 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1
2016-06-28 13:07:23,824 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1466792160799_175423_01_000002 to attempt_1466792160799_175423_m_000000_0
2016-06-28 13:07:23,825 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2016-06-28 13:07:23,865 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Job jar is not present. Not adding any jar to the list of resources.
2016-06-28 13:07:23,881 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: The job-conf file on the remote FS is /user/cdhadmin/.staging/job_1466792160799_175423/job.xml
2016-06-28 13:07:24,031 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Adding #1 tokens and #1 secret keys for NM use for launching container
2016-06-28 13:07:24,032 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Size of containertokens_dob is 2
2016-06-28 13:07:24,032 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Putting shuffle token in serviceData
2016-06-28 13:07:24,259 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1466792160799_175423_m_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
2016-06-28 13:07:24,263 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_1466792160799_175423_01_000002 taskAttempt attempt_1466792160799_175423_m_000000_0
2016-06-28 13:07:24,265 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1466792160799_175423_m_000000_0
2016-06-28 13:07:24,266 INFO [ContainerLauncher #0] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : node-address:8041
2016-06-28 13:07:24,319 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1466792160799_175423_m_000000_0 : 13562
2016-06-28 13:07:24,320 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1466792160799_175423_m_000000_0] using containerId: [container_1466792160799_175423_01_000002 on NM: [node-address:8041]
2016-06-28 13:07:24,323 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1466792160799_175423_m_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
2016-06-28 13:07:24,323 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1466792160799_175423_m_000000 Task Transitioned from SCHEDULED to RUNNING
2016-06-28 13:07:24,827 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1466792160799_175423: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:287744, vCores:152> knownNMs=10
2016-06-28 13:07:25,868 INFO [Socket Reader #1 for port 47500] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1466792160799_175423 (auth:SIMPLE)
2016-06-28 13:07:25,888 INFO [IPC Server handler 0 on 47500] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1466792160799_175423_m_000002 asked for a task
2016-06-28 13:07:25,889 INFO [IPC Server handler 0 on 47500] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1466792160799_175423_m_000002 given task: attempt_1466792160799_175423_m_000000_0
2016-06-28 13:07:33,137 INFO [IPC Server handler 1 on 47500] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1466792160799_175423_m_000000_0 is : 1.0
2016-06-28 13:08:03,247 INFO [IPC Server handler 14 on 47500] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1466792160799_175423_m_000000_0 is : 1.0
2016-06-28 13:08:33,354 INFO [IPC Server handler 25 on 47500] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1466792160799_175423_m_000000_0 is : 1.0
2016-06-28 13:09:03,453 INFO [IPC Server handler 1 on 47500] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1466792160799_175423_m_000000_0 is : 1.0
2016-06-28 13:09:33,550 INFO [IPC Server handler 14 on 47500] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1466792160799_175423_m_000000_0 is : 1.0
2016-06-28 13:10:00,642 INFO [IPC Server handler 26 on 47500] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1466792160799_175423_m_000000_0 is : 1.0
2016-06-28 13:10:30,736 INFO [IPC Server handler 2 on 47500] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1466792160799_175423_m_000000_0 is : 1.0
2016-06-28 13:11:00,826 INFO [IPC Server handler 21 on 47500] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1466792160799_175423_m_000000_0 is : 1.0
2016-06-28 13:11:30,914 INFO [IPC Server handler 28 on 47500] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1466792160799_175423_m_000000_0 is : 1.0
2016-06-28 13:12:01,003 INFO [IPC Server handler 2 on 47500] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1466792160799_175423_m_000000_0 is : 1.0
2016-06-28 13:12:31,096 INFO [IPC Server handler 14 on 47500] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1466792160799_175423_m_000000_0 is : 1.0
2016-06-28 13:13:01,188 INFO [IPC Server handler 25 on 47500] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1466792160799_175423_m_000000_0 is : 1.0
2016-06-28 13:13:31,277 INFO [IPC Server handler 2 on 47500] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1466792160799_175423_m_000000_0 is : 1.0
2016-06-28 13:14:01,368 INFO [IPC Server handler 14 on 47500] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1466792160799_175423_m_000000_0 is : 1.0
2016-06-28 13:14:31,461 INFO [IPC Server handler 28 on 47500] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1466792160799_175423_m_000000_0 is : 1.0
2016-06-28 13:15:01,561 INFO [IPC Server handler 2 on 47500] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1466792160799_175423_m_000000_0 is : 1.0

...........................
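
In the log above, the job's only map attempt (attempt_1466792160799_175423_m_000000_0) reports progress 1.0 every 30 seconds from 13:07:33 to at least 13:15:01 without completing. A map-only job with input size 0, a single split and realUser=oozie is consistent with an Oozie launcher whose map is waiting on work it started, although the thread does not confirm this. The sketch below uses a hypothetical helper, stuck_at_complete, to scan AM log lines for attempts that have sat at progress 1.0 for longer than a threshold. It assumes the TaskAttemptListenerImpl line format shown above.

```python
import re
from datetime import datetime

# Matches TaskAttemptListenerImpl progress lines in the format shown in the log.
PROGRESS_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} .*"
    r"Progress of TaskAttempt (attempt_\S+) is : ([0-9.]+)$"
)


def stuck_at_complete(lines, min_seconds=300):
    """Return {attempt_id: seconds} for attempts that kept reporting progress
    1.0 for at least min_seconds (first to last 1.0 report)."""
    first_seen, last_seen = {}, {}
    for line in lines:
        m = PROGRESS_RE.match(line.strip())
        if not m or float(m.group(3)) < 1.0:
            continue  # not a progress line, or not yet at 100%
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        first_seen.setdefault(m.group(2), ts)
        last_seen[m.group(2)] = ts
    result = {}
    for attempt, start in first_seen.items():
        held = (last_seen[attempt] - start).total_seconds()
        if held >= min_seconds:
            result[attempt] = held
    return result
```

For example, with open("syslog") as f: print(stuck_at_complete(f)), where "syslog" is a placeholder for the AM container's log file. For the excerpt above it would report the map attempt at 448 seconds.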

Re: YARN cluster stalls

New Contributor

Hi Haibochen,

I believe this is more of a configuration issue.

When the stall occurred, there were more pending containers than allocated containers, and the earlier jobs did not free up their resources.
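
To capture the pending-versus-allocated picture at the moment of a stall, the ResourceManager REST API endpoint /ws/v1/cluster/metrics reports containersPending, containersAllocated and availableMB, among other fields. The sketch below fetches those metrics and applies a hypothetical heuristic, stall_signature, which flags more pending than allocated containers while the cluster still reports room for one more 4096 MB request. The threshold and all helper names are assumptions. availableMB is a cluster-wide total, so free memory may still be too fragmented across nodes to place a container.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url, timeout=10):
    """Return the 'clusterMetrics' object from the ResourceManager REST API."""
    url = rm_url.rstrip("/") + "/ws/v1/cluster/metrics"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["clusterMetrics"]


def stall_signature(metrics, request_mb=4096):
    """Hypothetical heuristic: more containers pending than allocated while the
    cluster still reports enough free memory for one more request."""
    return (metrics["containersPending"] > metrics["containersAllocated"]
            and metrics["availableMB"] >= request_mb)


def summarize(metrics):
    """Pick out the fields most relevant when a stall is suspected."""
    keys = ("appsRunning", "appsPending", "containersAllocated",
            "containersPending", "allocatedMB", "availableMB")
    return {k: metrics[k] for k in keys}
```

For example, stall_signature(fetch_cluster_metrics("http://rm-host:8088")), where rm-host:8088 is a placeholder for the ResourceManager web address. Logging summarize(...) every minute would show whether the stalls line up with this signature.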

I'd appreciate your help with this.

Thanks