Member since: 03-01-2018
Posts: 9
Kudos Received: 0
Solutions: 0
04-25-2018
04:07 PM
@Kuldeep Kulkarni if you post the tez-site solution as a separate answer, I'll mark it as accepted so folks can easily find it. 🙂
04-25-2018
03:42 PM
Passing tez-site.xml to the Pig action resolved my issue. I was able to do that by adding a <file> element to my workflow: <file>/path/to/tez-site.xml</file>
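For anyone who finds this later, here's a minimal sketch of what my Pig action ended up looking like with the extra <file> element. The action name, script name, and paths are placeholders rather than my real workflow:

<action name="pig-node">
    <pig>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <script>myscript.pig</script>
        <!-- Ships tez-site.xml into the action's working directory so the
             Tez client can pick up tez.tez-ui.history-url.base -->
        <file>/path/to/tez-site.xml</file>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
</action>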
04-25-2018
01:59 AM
No, I didn't try that. I did pass in the tez.tez-ui.history-url.base property, but not the whole tez-site.xml. How do I manage that from within the Oozie Pig Action?
04-25-2018
01:50 AM
Thanks, but I'm already doing this. I can confirm that my script runs in Tez mode, but it doesn't appear in the Tez UI.
04-23-2018
12:08 PM
When I run Pig scripts in Tez execution mode, they appear in the Tez UI as expected. But when I call those same scripts from a Pig action, they don't appear in the Tez UI. I can see them running from the Resource Manager screens, but clicking on the "Tracking UI" link displays the following message:
"Tez UI Url is not defined. To enable tracking url pointing to Tez UI, set the config tez.tez-ui.history-url.base in the tez-site.xml."
I have defined tez.tez-ui.history-url.base in the tez-site config in Ambari, so I'm guessing there's some sort of Oozie configuration needed? I tried adding a tez.tez-ui.history-url.base property to the global configuration of my Oozie workflows, set to the same value as found in the tez-site config, but that didn't resolve the issue.
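For reference, the global configuration I added looked roughly like this; the value shown is just a placeholder for whatever is defined in my tez-site config:

<global>
    <configuration>
        <property>
            <!-- Same property and value as defined in tez-site via Ambari;
                 the URL below is a placeholder -->
            <name>tez.tez-ui.history-url.base</name>
            <value>http://ambari-host:8080/#/main/view/TEZ/tez_cluster_instance</value>
        </property>
    </configuration>
</global>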
Labels:
- Apache Oozie
- Apache Pig
- Apache Tez
04-23-2018
11:34 AM
I have a Pig script running in MapReduce mode that has been hitting a persistent error I've been unable to fix. The script spawns multiple applications; after running for several hours, one of the applications registers as SUCCEEDED but always returns the following diagnostic message: "We crashed after successfully committing. Recovering." The step that causes the failure performs a RANK over a dataset of around 100 GB, split across roughly 1000 MapReduce output files from a previous script. Digging into the logs, I find the following, which also seems to indicate that the job succeeded but then hit an error while winding down:
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1523471594178_0475_m_001006_0 TaskAttempt Transitioned from COMMIT_PENDING to SUCCESS_CONTAINER_CLEANUP
INFO [ContainerLauncher #6] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_e15_1523471594178_0475_01_001013 taskAttempt attempt_1523471594178_0475_m_001006_0
INFO [ContainerLauncher #6] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1523471594178_0475_m_001006_0
INFO [ContainerLauncher #6] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : my.server.name:45454
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1523471594178_0475_m_001006_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1523471594178_0475_m_001006_0
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1523471594178_0475_m_001006 Task Transitioned from RUNNING to SUCCEEDED
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1011
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1523471594178_0475Job Transitioned from RUNNING to COMMITTING
INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:2 AssignedReds:0 CompletedMaps:1011 CompletedReds:0 ContAlloc:1011 ContRel:0 HostLocal:1010 RackLocal:1
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e15_1523471594178_0475_01_001014
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e15_1523471594178_0475_01_001013
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:1011 CompletedReds:0 ContAlloc:1011 ContRel:0 HostLocal:1010 RackLocal:1
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1523471594178_0475_m_001007_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143.
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1523471594178_0475_m_001006_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143.
FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 121 max=120
at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:101)
at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:108)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:106)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:203)
at org.apache.hadoop.mapreduce.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:348)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.constructFinalFullcounters(JobImpl.java:1766)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.mayBeConstructFinalFullCounters(JobImpl.java:1752)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.createJobFinishedEvent(JobImpl.java:1733)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.logJobHistoryFinishedEvent(JobImpl.java:1092)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2064)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2060)
at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:999)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:139)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1385)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1381)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
INFO [AsyncDispatcher ShutDown handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.
I've tried several ways of resolving the mapreduce.counters.LimitExceededException. I modified the MapReduce configs to set mapreduce.job.counters.max to 20000 (purely to test a resolution, not with the intent of leaving it there). I also tried starting my Pig script with the line set mapreduce.job.counters.max 10000; to override the maximum. Neither change appears to have any effect, and I'm confused about why raising the max-counters limit doesn't help. Is there some related configuration I could be missing? Or is this error message inaccurate, or a symptom of a different underlying issue? I'm at my wits' end trying to resolve this; any help would be appreciated!
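For reference, the mapred-site change I made through the MapReduce configs looked roughly like this (the 20000 value was only for testing; the active limit appears to be 120 per the error above):

<property>
    <!-- Raised only as a test to see whether the counters limit was the real cause -->
    <name>mapreduce.job.counters.max</name>
    <value>20000</value>
</property>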
Labels:
- Apache Hadoop
- Apache Pig
- Apache YARN
03-04-2018
04:34 AM
This seems to have resolved the problem, thanks again!
03-04-2018
01:49 AM
Thanks so much, Pierre. The jobs that stop making progress are the ones kicked off by the oozie-launcher jobs, which in turn causes the oozie-launcher jobs to hang as well. I had been wondering whether those oozie-launcher jobs were the issue; I'll try creating a separate queue for them and report back.
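Here's a minimal sketch of what I'm planning to try, assuming oozie.launcher.mapred.job.queue.name is the right property for routing launcher jobs on my Oozie version (the "launchers" queue name and action details are placeholders):

<action name="staging-step">
    <pig>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- Send the small oozie-launcher job to its own queue so it
                 doesn't compete with the real work; property name assumed -->
            <property>
                <name>oozie.launcher.mapred.job.queue.name</name>
                <value>launchers</value>
            </property>
            <!-- The actual Pig/MapReduce jobs stay in the staging queue -->
            <property>
                <name>mapred.job.queue.name</name>
                <value>staging</value>
            </property>
        </configuration>
        <script>stage.pig</script>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
</action>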
03-01-2018
02:24 PM
I have an Oozie workflow with multiple steps that is used for staging data in HDFS. The workflow is called multiple times -- once for each file I want to stage -- often in quick succession. When that happens, the designated YARN queue for the workflow reaches capacity and new instantiations of the workflow go to ACCEPTED but never reach RUNNING status. That makes sense, but once the queue is at capacity, the RUNNING jobs also stop making progress and are unable to move to the next step. It seems like YARN won't release each step's resources until it moves to the next step, but there aren't enough resources available to allocate to the next step, resulting in a deadlock.
I've tried a number of different configurations, but while there are plenty of options for distributing the workload across multiple queues, I haven't come across any settings that help me manage deadlocks within a single queue. What can I do, either from a YARN configuration standpoint or an application design standpoint, to avoid these deadlocks? I'd like to avoid modifying the code that kicks off these processes to make it monitor cluster resources; is there some way to make this work so I can simply throw all of my executions onto the YARN queue and have them process successfully in FIFO order?
I've attached a snapshot of my current queue settings. One other point to note: in my Oozie workflows, the entire job is allocated to the "staging" queue; it doesn't vary by action. Is that possibly a problem?
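To illustrate that last point: the queue assignment sits in the workflow's global configuration and every action inherits it, roughly like this (simplified; the property name may differ slightly across Hadoop versions):

<global>
    <configuration>
        <!-- Every action, including each step's oozie-launcher job,
             lands in the same "staging" queue -->
        <property>
            <name>mapred.job.queue.name</name>
            <value>staging</value>
        </property>
    </configuration>
</global>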
Labels:
- Apache Oozie
- Apache YARN
- Cloudera Manager