Got HTTP 500 error when trying to view the status of a running MapReduce job

Contributor

We just launched our cluster with CDH 5.3.3 on AWS, and we are currently testing our process on the new cluster.

Our job ran fine, but we ran into issues when we tried to view the actual status detail page of a running job.

We got an HTTP 500 error when we clicked either the 'Application Master' link on the Resource Manager UI page or the link under the 'Child Job Urls' tab while the job was running. We were able to get to the job status page after the job completed, but not while it was still running.

Does anyone know how to fix this issue? This is a real problem for us.

The detailed error is listed below.

Thanks

 

HTTP ERROR 500

Problem accessing /. Reason:

    Guice configuration errors:

1) Could not find a suitable constructor in com.sun.jersey.guice.spi.container.servlet.GuiceContainer. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
  at com.sun.jersey.guice.spi.container.servlet.GuiceContainer.class(GuiceContainer.java:108)
  while locating com.sun.jersey.guice.spi.container.servlet.GuiceContainer

1 error

 

Caused by:

com.google.inject.ConfigurationException: Guice configuration errors:

1) Could not find a suitable constructor in com.sun.jersey.guice.spi.container.servlet.GuiceContainer. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
  at com.sun.jersey.guice.spi.container.servlet.GuiceContainer.class(GuiceContainer.java:108)
  while locating com.sun.jersey.guice.spi.container.servlet.GuiceContainer

1 error
	at com.google.inject.InjectorImpl.getBinding(InjectorImpl.java:113)
	at com.google.inject.InjectorImpl.getBinding(InjectorImpl.java:63)
	at com.google.inject.servlet.FilterDefinition.init(FilterDefinition.java:99)
	at com.google.inject.servlet.ManagedFilterPipeline.initPipeline(ManagedFilterPipeline.java:98)
	at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:114)
	at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1224)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

 

23 REPLIES

Super Collaborator

Those symptoms sound like YARN-3351, but the stack trace is not the same, and that issue should also have been fixed in the release you are running.

 

Can you give a bit more detail on where you saw the issue? Is this logged in the RM? Do you have HA for the RM, etc.? I cannot really tell from this short snippet what is happening, but this is not a known issue.

 

Wilfred

Contributor

Hi Wilfred,

Yes, it is logged in the RM, and I have HA set up for the RM. I have HA set up for HDFS as well.

The job did show up on the RM.

Below is a screenshot of the RM UI page, which lists all running jobs. When I clicked on the ApplicationMaster link for a job under the Tracking UI column, the error page was shown instead of the actual status page.

Screen Shot 2015-05-13 at 8.00.56 PM.png

 

I also got the same error when I clicked on the search icon for Child Job 1 under the Child Job Urls on the Oozie UI.

Here is a screenshot of the Oozie page.

Screen Shot 2015-05-13 at 8.04.58 PM.png

 

Here is a screenshot of the error page:

 

Screen Shot 2015-05-13 at 8.13.00 PM.png

 

I could get to the status page fine after the job completed, but not while it was still running.

Thank you very much for your help.

Super Collaborator

Please provide the log from the RM, showing the error and information before and after it.

This works without an issue in all the test clusters I have. If there is private information in the log, please send it through a private message.

 

Wilfred

Contributor

There was not much error information in the Resource Manager log.

Whenever I click on the child URL, there are messages like this in the RM log. I ran the job as a non-oozie user, but it still happened when I ran it as the oozie user.

The link tries to access: http://i-802bd856.prod-dis11.aws1:8088/proxy/application_1431873888015_3748

Here is the partial log.

2015-05-18 08:41:37,237 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://i-922ad944.prod-dis11.aws1:33802/ws/v1/mapreduce/jobs/job_1431873888015_3748 which is the app master GUI of application_1431873888015_3748 owned by inventory

2015-05-18 08:41:37,237 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://i-9f2ad949.prod-dis11.aws1:47581/ws/v1/mapreduce/jobs/job_1431873888015_3758 which is the app master GUI of application_1431873888015_3758 owned by oozie

2015-05-18 08:41:43,844 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1431873888015_3758_000001 with final state: FINISHING, and exit status: -1000

2015-05-18 08:41:43,846 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://i-922ad944.prod-dis11.aws1:33802/ which is the app master GUI of application_1431873888015_3748 owned by inventory

2015-05-18 08:41:43,846 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1431873888015_3758_000001 State change from RUNNING to FINAL_SAVING

2015-05-18 08:41:43,846 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1431873888015_3758 with final state: FINISHING

2015-05-18 08:41:43,848 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: NodeDataChanged with state:SyncConnected for path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1431873888015_3758/appattempt_1431873888015_3758_000001 for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED

 

I found some other people who had a similar message in their logs about (dr.who), and they were able to resolve it by adjusting the hadoop.http.staticuser.user property or by disabling ACLs. I tried disabling ACLs by setting yarn.acl.enable = false, but it did not help.
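
For reference, this is a sketch of the property as I set it in yarn-site.xml (a temporary test only; the hadoop.http.staticuser.user property those posts mention lives in core-site.xml instead):

<property>
  <!-- Temporary test only: disables YARN ACL checks; it did not help -->
  <name>yarn.acl.enable</name>
  <value>false</value>
</property>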

 

Contributor

Hi Wilfred,

Have you had a chance to take a look at my update?

Thanks

 

 

Explorer

Hi ttruong, did you find out how to fix this error?

 

Super Collaborator

Sorry, this slipped through the cracks. If you have already turned off the ACL, then you should be able to get the logs via the command line.

Run yarn logs -applicationId <APPLICATION ID>

That should return the full log. It also follows the normal process through all the proxies and checks to get the files, so we should hopefully be able to tell in more detail what is going on.

 

Wilfred

Contributor

Hi Wilfred,

I really don't want to disable the ACL; I only tried disabling it to see if it would help resolve the issue. I prefer to get the application logs using the UI instead of logging in to the host to retrieve them, because not everyone has access to the host.

By the way, I just tried to retrieve the logs per your suggestion by issuing the command below, but it still did not work.

yarn logs -applicationId application_1432831904896_0149

I got this message:

15/05/28 14:51:17 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm54

Application has not completed. Logs are only available after an application completes

 

 

Explorer

Hello ttruong,

 

I had the same issue as you one month ago, but it was fixed by some operation (I did so many operations that I do not know which one fixed it).

It worked fine for several weeks, but yesterday it stopped working again, even though I didn't change anything...

I guess something is wrong with the permissions of the log directory...

 

Keep in touch.

 

Linou

Super Collaborator

There is a known issue in releases before CDH 5.3.3 that could cause this behavior. That issue was introduced by a fix for a similar issue in an earlier release. Both issues were intermittent and related to HA.

Unless you are on CDH 5.3.3 or later, you could be seeing one of those.

 

Wilfred

Contributor

Hi Linou,

Were you able to resolve your issue again? I still have not been able to solve it.

Thanks

Explorer

Hi ttruong,

 

I just resolved this problem.

When I run the MR job on the datanode (Hadoop server) itself, I can see the page, but when I run it from the client side, I cannot.

So I figured there must be some difference between the server-side and client-side configuration.

I found that my client config file was missing the "yarn.resourcemanager.webapp.address" property.

I added it to yarn-site.xml on the client side ($HOSTNAME stands for the actual RM hostname):

<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>$HOSTNAME:8088</value>
</property>

 

It works fine!!

Hope this helps you.

Good luck!

 

Linou

 

Super Collaborator

Good to hear that this has been fixed!

 

We have seen this issue in early CDH 5 releases, but it was fixed in CM/CDH 5.2 and later. Cloudera Manager should have deployed that configuration setting for you in the client configuration on all nodes. If you did not use CM, then that could explain it; otherwise I would not know how this could have happened.

 

 

Wilfred

Contributor

Thank you very much for your response.

I checked both resource manager nodes, and the value for the yarn.resourcemanager.webapp.address property was set on each.

I still could not get it working.

 

Super Collaborator

Please make sure that you have also added the setting to the configuration on the client node. The setting should be applied to all nodes in the cluster, not just the nodes that run the service.
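
As a sketch, in an RM HA setup the client-side yarn-site.xml needs the per-RM-ID form of the property; the RM IDs (rm1, rm2) and hostnames below are placeholders for whatever your cluster uses:

<!-- Placeholder RM IDs and hosts; substitute the values from your cluster -->
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>rm1-host.example.com:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>rm2-host.example.com:8088</value>
</property>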

 

Wilfred

Super Collaborator

If you are not running the yarn command as the owner of the application, you might need to add:

-appOwner <username>

to the yarn logs command line. If you do not have access, the error you showed could be thrown.

We do not distinguish between not having access and the log aggregation not having finished.

 

Wilfred

Contributor

Hi Wilfred,

We are on CDH 5.3.3 now, and we are still having the issue.

Also, I did try the yarn command with the -appOwner option, but it still returned the same message. I ran the command as the yarn user.

Thanks 

Explorer

Hello,

I am a team member of ttruong's, and I have been taking a look at this issue. I tracked the error to the container logs on the server executing the MapReduce action. It appears that this is an error with the following page: /ws/v1/mapreduce/jobs/job_1436912235624_4832. We have updated the permissions on all of our log directories, and that has resolved all of the log issues except this one.

 

Here is an excerpt from the log file at: /var/log/hadoop-yarn/container/container_e90_1436912235624_4832_01_000001/stderr

 

[IPC Server handler 26 on 47733] INFO org.apache.hadoop.mapred.TaskAttemptListenerImpl - Progress of TaskAttempt attempt_1436912235624_4832_m_000000_0 is : 0.4483067
[IPC Server handler 29 on 47733] INFO org.apache.hadoop.mapred.TaskAttemptListenerImpl - Progress of TaskAttempt attempt_1436912235624_4832_m_000000_0 is : 0.44861746
[IPC Server handler 27 on 47733] INFO org.apache.hadoop.mapred.TaskAttemptListenerImpl - Progress of TaskAttempt attempt_1436912235624_4832_m_000000_0 is : 0.44898608
[IPC Server handler 0 on 47733] INFO org.apache.hadoop.mapred.TaskAttemptListenerImpl - Progress of TaskAttempt attempt_1436912235624_4832_m_000000_0 is : 0.44930127
[IPC Server handler 1 on 47733] INFO org.apache.hadoop.mapred.TaskAttemptListenerImpl - Progress of TaskAttempt attempt_1436912235624_4832_m_000000_0 is : 0.4495801
Jul 15, 2015 2:19:51 PM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
[1347697750@qtp-453785195-2] ERROR org.mortbay.log - /ws/v1/mapreduce/jobs/job_1436912235624_4832
com.google.inject.ConfigurationException: Guice configuration errors:

1) Could not find a suitable constructor in com.sun.jersey.guice.spi.container.servlet.GuiceContainer. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
  at com.sun.jersey.guice.spi.container.servlet.GuiceContainer.class(GuiceContainer.java:108)
  while locating com.sun.jersey.guice.spi.container.servlet.GuiceContainer

1 error
        at com.google.inject.InjectorImpl.getBinding(InjectorImpl.java:113)
        at com.google.inject.InjectorImpl.getBinding(InjectorImpl.java:63)
        at com.google.inject.servlet.FilterDefinition.init(FilterDefinition.java:99)
        at com.google.inject.servlet.ManagedFilterPipeline.initPipeline(ManagedFilterPipeline.java:98)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:114)
        at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:164)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1224)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

 

Any help debugging this would be appreciated!

Thanks.

Explorer

I also checked the yarn-site.xml in /etc/hadoop/conf on both the active resource manager node and the node manager running the container, and both files had the following configuration properties set. The server names are correct for both resource manager nodes running in HA.

 

  <property>
    <name>yarn.resourcemanager.webapp.address.rm21</name>
    <value>i-802bd856.prod-dis11.aws1:8088</value>
  </property>

  <property>
    <name>yarn.resourcemanager.webapp.address.rm54</name>
    <value>i-942ad942.prod-dis11.aws1:8088</value>
  </property>

Super Collaborator

A container log is not part of the YARN service logs and will not be affected by any of the YARN settings. The container log looks like a log from an AM, which means you are most likely looking at a problem of the AM web UI not being able to bind.

The AM web UI binds to an ephemeral port, which cannot be limited to a set of ports. Make sure that your security groups in AWS allow access to any port on the NMs.

 

Wilfred
