Created 04-23-2018 11:34 AM
I have a Pig script running in MapReduce mode that's been receiving a persistent error which I've been unable to fix. The script spawns multiple applications; after running for several hours, one of the applications registers as SUCCEEDED but always returns the following diagnostic message:
We crashed after successfully committing. Recovering.
The step that causes the failure is trying to perform a RANK over a dataset that's around 100GB, split across roughly 1000 MapReduce output files from a previous script.
Digging into the logs, I find the following, which also seems to indicate that the job succeeded but then hit an error while winding down:
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1523471594178_0475_m_001006_0 TaskAttempt Transitioned from COMMIT_PENDING to SUCCESS_CONTAINER_CLEANUP
INFO [ContainerLauncher #6] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_e15_1523471594178_0475_01_001013 taskAttempt attempt_1523471594178_0475_m_001006_0
INFO [ContainerLauncher #6] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1523471594178_0475_m_001006_0
INFO [ContainerLauncher #6] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : my.server.name:45454
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1523471594178_0475_m_001006_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1523471594178_0475_m_001006_0
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1523471594178_0475_m_001006 Task Transitioned from RUNNING to SUCCEEDED
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1011
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1523471594178_0475Job Transitioned from RUNNING to COMMITTING
INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:2 AssignedReds:0 CompletedMaps:1011 CompletedReds:0 ContAlloc:1011 ContRel:0 HostLocal:1010 RackLocal:1
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e15_1523471594178_0475_01_001014
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e15_1523471594178_0475_01_001013
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:1011 CompletedReds:0 ContAlloc:1011 ContRel:0 HostLocal:1010 RackLocal:1
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1523471594178_0475_m_001007_0: Container killed by the ApplicationMaster. Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143.
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1523471594178_0475_m_001006_0: Container killed by the ApplicationMaster. Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143.
FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 121 max=120
    at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:101)
    at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:108)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:106)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:203)
    at org.apache.hadoop.mapreduce.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:348)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.constructFinalFullcounters(JobImpl.java:1766)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.mayBeConstructFinalFullCounters(JobImpl.java:1752)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.createJobFinishedEvent(JobImpl.java:1733)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.logJobHistoryFinishedEvent(JobImpl.java:1092)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2064)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2060)
    at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:999)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:139)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1385)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1381)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
    at java.lang.Thread.run(Thread.java:745)
INFO [AsyncDispatcher ShutDown handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.
I've tried several methods of resolving the mapreduce.counters.LimitExceededException. I've modified the MapReduce configs to set mapreduce.job.counters.max to 20000 (only to test a resolution to this issue, not with the intent of leaving it there). I also tried starting my Pig script with the line "set mapreduce.job.counters.max 10000;" in an effort to override the max counters. Neither change appears to have any impact.
I'm confused about why changing the max counters configuration doesn't seem to have any effect. Is there some related configuration I could be missing? Or is this error message possibly inaccurate, or a symptom of a different issue?
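A quick way to confirm which limit was actually in effect is to read it back from the exception message: the log above reports max=120, so neither the 20000 nor the 10000 setting reached the ApplicationMaster. The following is only a minimal sketch of that check. The function name effective_counter_limit and the sample string are illustrative, and the pattern assumes the message format seen in the log above.

```python
import re

# Matches the message format seen in the log above, e.g.
# "LimitExceededException: Too many counters: 121 max=120"
COUNTER_LIMIT_RE = re.compile(r"Too many counters: (\d+) max=(\d+)")


def effective_counter_limit(log_text):
    """Return (counters_seen, limit_in_effect) for the first match, or None."""
    match = COUNTER_LIMIT_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Sample line copied from the ApplicationMaster log in this post.
sample = (
    "org.apache.hadoop.mapreduce.counters.LimitExceededException: "
    "Too many counters: 121 max=120"
)

result = effective_counter_limit(sample)
if result is not None:
    seen, limit = result
    print(f"counters seen: {seen}, limit in effect: {limit}")
```

If the reported limit does not match the value you configured, the setting is probably not being picked up by the process that throws the exception.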
I'm at my wits' end trying to resolve this; any help would be appreciated!
Created 05-22-2018 07:47 PM
Can you check whether the classpath parameter below contains the Hadoop conf directory (/etc/hadoop/conf)?
mapreduce.application.classpath
If not, append /etc/hadoop/conf to the mapreduce.application.classpath parameter value and restart the services. Then try running the job again.
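If you would rather script the check than inspect the value by eye, something along these lines may help. It is only a rough sketch: the helper names get_property and classpath_contains are made up for illustration, the sample XML value is hypothetical, and it assumes the classpath entries are separated by commas or colons. In practice you would read the XML from your cluster's mapred-site.xml instead of the inline sample.

```python
import re
import xml.etree.ElementTree as ET


def get_property(config_xml, name):
    """Return the value of a named property from Hadoop-style config XML, or None."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="")
    return None


def classpath_contains(classpath, directory):
    """Check for an exact entry match, ignoring trailing slashes.

    Assumption: entries are separated by commas or colons.
    """
    entries = [e.strip().rstrip("/") for e in re.split(r"[,:]", classpath)]
    return directory.rstrip("/") in entries


# Hypothetical config fragment; the value is illustrative only.
sample_xml = """
<configuration>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:/etc/hadoop/conf/secure</value>
  </property>
</configuration>
"""

classpath = get_property(sample_xml, "mapreduce.application.classpath")
if classpath is None:
    print("mapreduce.application.classpath is not set in this file")
elif classpath_contains(classpath, "/etc/hadoop/conf"):
    print("/etc/hadoop/conf is on the classpath")
else:
    print("/etc/hadoop/conf is missing from the classpath")
```

The match is deliberately exact, so an entry such as /etc/hadoop/conf/secure does not count as /etc/hadoop/conf.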
Created 04-15-2020 05:56 PM
You really helped me! Ambari had mapreduce.application.classpath set incorrectly, and I never thought to check it.
Thank you!