Member since: 03-01-2018
Posts: 9
Kudos Received: 0
Solutions: 0
04-25-2018
04:07 PM
@Kuldeep Kulkarni if you post the tez-site solution as a separate answer, I'll mark it as accepted so folks can easily find it. 🙂
04-25-2018
03:42 PM
Passing tez-site.xml to the Pig action resolved my issue. I was able to do that by adding a <file> element to my workflow: <file>/path/to/tez-site.xml</file>
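For anyone who finds this later, here's a minimal sketch of what my Pig action ended up looking like with the extra <file> element. The action name, script name, and paths are placeholders rather than my real workflow:

<action name="pig-node">
    <pig>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <script>myscript.pig</script>
        <!-- Ships tez-site.xml into the action's working directory so the
             Tez client can pick up tez.tez-ui.history-url.base -->
        <file>/path/to/tez-site.xml</file>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
</action>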
04-25-2018
01:59 AM
No, I didn't try that. I did pass in the tez.tez-ui.history-url.base property, but not the whole tez-site.xml. How do I manage that from within the Oozie Pig Action?
04-25-2018
01:50 AM
Thanks, but I'm already doing this. I can confirm that my script runs in Tez mode, but it doesn't appear in the Tez UI.
04-23-2018
12:08 PM
When I run Pig scripts in Tez execution mode, they appear in the Tez UI as expected. But when I call those same scripts from a Pig action, they don't appear in the Tez UI. I can see them running from the Resource Manager screens, but clicking on the "Tracking UI" link displays the following message:
"Tez UI Url is not defined. To enable tracking url pointing to Tez UI, set the config tez.tez-ui.history-url.base in the tez-site.xml."
I have defined tez.tez-ui.history-url.base in the tez-site config in Ambari, so I'm guessing there's some sort of Oozie configuration needed? I tried adding a tez.tez-ui.history-url.base property to the global configuration of my Oozie workflows, set to the same value as found in the tez-site config, but that didn't resolve the issue.
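For reference, the global configuration I added looked roughly like this; the value shown is just a placeholder for whatever is defined in my tez-site config:

<global>
    <configuration>
        <property>
            <!-- Same property and value as defined in tez-site via Ambari;
                 the URL below is a placeholder -->
            <name>tez.tez-ui.history-url.base</name>
            <value>http://ambari-host:8080/#/main/view/TEZ/tez_cluster_instance</value>
        </property>
    </configuration>
</global>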
Labels:
- Apache Oozie
- Apache Pig
- Apache Tez
04-23-2018
11:34 AM
I have a Pig script running in MapReduce mode that has been hitting a persistent error I've been unable to fix. The script spawns multiple applications; after running for several hours, one of the applications registers as SUCCEEDED but always returns the following diagnostic message: "We crashed after successfully committing. Recovering." The step that causes the failure performs a RANK over a dataset of around 100 GB, split across roughly 1000 MapReduce output files from a previous script. Digging into the logs, I find the following, which also seems to indicate that the job succeeded but then hit an error while winding down:
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1523471594178_0475_m_001006_0 TaskAttempt Transitioned from COMMIT_PENDING to SUCCESS_CONTAINER_CLEANUP
INFO [ContainerLauncher #6] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_e15_1523471594178_0475_01_001013 taskAttempt attempt_1523471594178_0475_m_001006_0
INFO [ContainerLauncher #6] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1523471594178_0475_m_001006_0
INFO [ContainerLauncher #6] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : my.server.name:45454
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1523471594178_0475_m_001006_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1523471594178_0475_m_001006_0
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1523471594178_0475_m_001006 Task Transitioned from RUNNING to SUCCEEDED
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1011
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1523471594178_0475Job Transitioned from RUNNING to COMMITTING
INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:2 AssignedReds:0 CompletedMaps:1011 CompletedReds:0 ContAlloc:1011 ContRel:0 HostLocal:1010 RackLocal:1
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e15_1523471594178_0475_01_001014
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e15_1523471594178_0475_01_001013
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:1011 CompletedReds:0 ContAlloc:1011 ContRel:0 HostLocal:1010 RackLocal:1
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1523471594178_0475_m_001007_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143.
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1523471594178_0475_m_001006_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143.
FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 121 max=120
at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:101)
at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:108)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:106)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:203)
at org.apache.hadoop.mapreduce.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:348)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.constructFinalFullcounters(JobImpl.java:1766)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.mayBeConstructFinalFullCounters(JobImpl.java:1752)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.createJobFinishedEvent(JobImpl.java:1733)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.logJobHistoryFinishedEvent(JobImpl.java:1092)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2064)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2060)
at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:999)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:139)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1385)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1381)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
INFO [AsyncDispatcher ShutDown handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.
I've tried several ways of resolving the mapreduce.counters.LimitExceededException. I modified the MapReduce configs to set mapreduce.job.counters.max to 20000 (purely to test a resolution, not with the intent of leaving it there). I also tried starting my Pig script with the line set mapreduce.job.counters.max 10000; to override the maximum. Neither change appears to have any effect, and I'm confused about why raising the max-counters limit doesn't help. Is there some related configuration I could be missing? Or is this error message inaccurate, or a symptom of a different underlying issue? I'm at my wits' end trying to resolve this; any help would be appreciated!
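For reference, the mapred-site change I made through the MapReduce configs looked roughly like this (the 20000 value was only for testing; the active limit appears to be 120 per the error above):

<property>
    <!-- Raised only as a test to see whether the counters limit was the real cause -->
    <name>mapreduce.job.counters.max</name>
    <value>20000</value>
</property>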
Labels:
- Apache Hadoop
- Apache Pig
- Apache YARN
03-04-2018
04:34 AM
This seems to have resolved the problem, thanks again!
03-04-2018
01:49 AM
Thanks so much, Pierre. The jobs that stop making progress are the ones kicked off by the oozie-launcher jobs, which in turn causes the oozie-launcher jobs to hang as well. I had been wondering whether those oozie-launcher jobs were the issue; I'll try creating a separate queue for them and report back.
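Here's a minimal sketch of what I'm planning to try, assuming oozie.launcher.mapred.job.queue.name is the right property for routing launcher jobs on my Oozie version (the "launchers" queue name and action details are placeholders):

<action name="staging-step">
    <pig>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- Send the small oozie-launcher job to its own queue so it
                 doesn't compete with the real work; property name assumed -->
            <property>
                <name>oozie.launcher.mapred.job.queue.name</name>
                <value>launchers</value>
            </property>
            <!-- The actual Pig/MapReduce jobs stay in the staging queue -->
            <property>
                <name>mapred.job.queue.name</name>
                <value>staging</value>
            </property>
        </configuration>
        <script>stage.pig</script>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
</action>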
03-01-2018
02:24 PM
I have an Oozie workflow with multiple steps that is used for staging data in HDFS. The workflow is called multiple times -- once for each file I want to stage -- often in quick succession. When that happens, the designated YARN queue for the workflow reaches capacity and new instantiations of the workflow go to ACCEPTED but never reach RUNNING status. That makes sense, but once the queue is at capacity, the RUNNING jobs also stop making progress and are unable to move to the next step. It seems like YARN won't release each step's resources until it moves to the next step, but there aren't enough resources available to allocate to the next step, resulting in a deadlock.
I've tried a number of different configurations, but while there are plenty of options for distributing the workload across multiple queues, I haven't come across any settings that help me manage deadlocks within a single queue. What can I do, either from a YARN configuration standpoint or an application design standpoint, to avoid these deadlocks? I'd like to avoid modifying the code that kicks off these processes to make it monitor cluster resources; is there some way to make this work so I can simply throw all of my executions onto the YARN queue and have them process successfully in FIFO order?
I've attached a snapshot of my current queue settings. One other point to note: in my Oozie workflows, the entire job is allocated to the "staging" queue; it doesn't vary by action. Is that possibly a problem?
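To illustrate that last point: the queue assignment sits in the workflow's global configuration and every action inherits it, roughly like this (simplified; the property name may differ slightly across Hadoop versions):

<global>
    <configuration>
        <!-- Every action, including each step's oozie-launcher job,
             lands in the same "staging" queue -->
        <property>
            <name>mapred.job.queue.name</name>
            <value>staging</value>
        </property>
    </configuration>
</global>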
Labels:
- Apache Oozie
- Apache YARN
- Cloudera Manager