Member since 03-08-2016. Posts: 7. Kudos received: 1. Accepted solutions: 1 (posted 03-09-2016 07:08 PM, 5073 views).
11-08-2016
07:31 AM
Great to hear I somehow helped you 🙂 I'm glad it's not just me who has seen this. Thanks for posting the exact steps; they will be useful to refer back to should I hit the error again, which I've run into a few times now.
03-09-2016
07:08 PM
So I was able to get past this error by removing rows 9 and 10 from the table below. It appears that when two hosts I had deleted came back (in effect totally new hosts, but with the same hostnames), a number of duplicate rows were created in the various topology tables. I deleted the duplicates from several of these tables, but deleting the final two rows below is what fixed it for me. I don't have a copy of how the other tables looked, but some of them contained duplicate rows, with the names of the nodes I had deleted and restored listed twice. Perhaps someone can shed some light on what may have caused this?
Just to clarify: I have 7 hosts, so this table should contain 8 rows, 1 for the cluster and 1 for each host. When things were failing, it contained 10 rows.
ambari=> select * from topology_request;
id | action | cluster_id | bp_name | cluster_properties | cluster_attributes | description
----+-----------+------------+------------+--------------------+--------------------+---------------------------------------
1 | PROVISION | 2 | testcluster | {} | {} | Provision Cluster 'testcluster'
2 | SCALE | 2 | testcluster | {} | {} | Scale Cluster 'testcluster' (+1 hosts)
3 | SCALE | 2 | testcluster | {} | {} | Scale Cluster 'testcluster' (+1 hosts)
4 | SCALE | 2 | testcluster | {} | {} | Scale Cluster 'testcluster' (+1 hosts)
5 | SCALE | 2 | testcluster | {} | {} | Scale Cluster 'testcluster' (+1 hosts)
6 | SCALE | 2 | testcluster | {} | {} | Scale Cluster 'testcluster' (+1 hosts)
7 | SCALE | 2 | testcluster | {} | {} | Scale Cluster 'testcluster' (+1 hosts)
8 | SCALE | 2 | testcluster | {} | {} | Scale Cluster 'testcluster' (+1 hosts)
9 | SCALE | 2 | testcluster | {} | {} | Scale Cluster 'testcluster' (+1 hosts)
10 | SCALE | 2 | testcluster | {} | {} | Scale Cluster 'testcluster' (+1 hosts)
(10 rows)
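Comparing this table with the topology_logical_request output posted at 07:42 AM the same day suggests a way to spot the extra rows: topology_request has ids 1 to 10, but topology_logical_request.request_id only covers 1 to 8. A minimal sketch of that anti-join, assuming the tables and columns exactly as they appear in the psql output in this thread; an in-memory SQLite database with cut-down schemas stands in for the Ambari Postgres DB, and find_orphaned_topology_requests is a hypothetical helper name. The LEFT JOIN ... IS NULL query itself is standard SQL and should also run as-is in psql.

```python
import sqlite3

# Anti-join: topology_request rows that no topology_logical_request row points at.
ORPHAN_QUERY = """
SELECT tr.id, tr.action, tr.description
FROM topology_request AS tr
LEFT JOIN topology_logical_request AS tlr ON tlr.request_id = tr.id
WHERE tlr.id IS NULL
ORDER BY tr.id
"""


def find_orphaned_topology_requests(conn):
    """Return (id, action, description) for topology requests with no logical request."""
    cur = conn.cursor()
    cur.execute(ORPHAN_QUERY)
    return cur.fetchall()


def build_sample_db():
    """Load the rows posted in this thread into an in-memory SQLite stand-in."""
    conn = sqlite3.connect(":memory:")
    # Cut-down schemas: only the columns the query needs (assumed, not Ambari's full DDL).
    conn.execute(
        "CREATE TABLE topology_request (id INTEGER PRIMARY KEY, action TEXT, description TEXT)"
    )
    conn.execute(
        "CREATE TABLE topology_logical_request (id INTEGER PRIMARY KEY, request_id INTEGER)"
    )
    conn.execute(
        "INSERT INTO topology_request VALUES (?, ?, ?)",
        (1, "PROVISION", "Provision Cluster 'testcluster'"),
    )
    conn.executemany(
        "INSERT INTO topology_request VALUES (?, ?, ?)",
        [(i, "SCALE", "Scale Cluster 'testcluster' (+1 hosts)") for i in range(2, 11)],
    )
    # (id, request_id) pairs copied from the topology_logical_request output.
    conn.executemany(
        "INSERT INTO topology_logical_request VALUES (?, ?)",
        [(1, 1), (4, 2), (5, 3), (6, 4), (7, 5), (8, 6), (19, 7), (20, 8)],
    )
    return conn


if __name__ == "__main__":
    for row in find_orphaned_topology_requests(build_sample_db()):
        print(row)
```

On the thread's data this reports ids 9 and 10, the same two rows whose removal got past the error. The sketch only identifies candidates; it does not check whether other topology tables reference those ids.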
03-09-2016
12:05 PM
More info after digging a little deeper: it looks like a restart of everything has been scheduled. I might try deleting everything from these two tables (requestoperationlevel and requestresourcefilter) to see if it will then start correctly.
ambari=> select request_context,encode(a.hosts,'escape') from requestresourcefilter a,request b where a.request_id = b.request_id AND a.request_id IN (select request_id from requestoperationlevel);
request_context | encode
-------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Restart all clients on ambdevtestdc2host-group-11.node.example | ambdevtestdc2host-group-11.node.example
Restart all clients on ambdevtestdc2host-group-11.node.example | ambdevtestdc2host-group-11.node.example
Restart all clients on ambdevtestdc2host-group-11.node.example | ambdevtestdc2host-group-11.node.example
Restart all clients on ambdevtestdc2host-group-11.node.example | ambdevtestdc2host-group-11.node.example
Restart all clients on ambdevtestdc2host-group-11.node.example | ambdevtestdc2host-group-11.node.example
Restart all clients on ambdevtestdc2host-group-11.node.example | ambdevtestdc2host-group-11.node.example
Restart all clients on ambdevtestdc2host-group-11.node.example | ambdevtestdc2host-group-11.node.example
Restart all clients on ambdevtestdc2host-group-11.node.example | ambdevtestdc2host-group-11.node.example
Restart all clients on ambdevtestdc2host-group-11.node.example | ambdevtestdc2host-group-11.node.example
Restart all clients on ambdevtestdc2host-group-11.node.example | ambdevtestdc2host-group-11.node.example
Restart all clients on ambdevtestdc2host-group-11.node.example | ambdevtestdc2host-group-11.node.example
Restart all clients on ambdevtestdc2host-group-11.node.example | ambdevtestdc2host-group-11.node.example
Restart all components for HDFS | ambdevtestdc2host-group-11.node.example,ambdevtestdc2host-group-21.node.example,ambdevtestdc2host-group-31.node.example,ambdevtestdc2host-group-41.node.dc2.consul,ambdevtestdc2host-group-51.node.example,ambdevtestdc2host-group-52.node.example,ambdevtestdc2host-group-53.node.example
Restart all components for HDFS | ambdevtestdc2host-group-11.node.example,ambdevtestdc2host-group-21.node.example,ambdevtestdc2host-group-31.node.example
Restart all components for HDFS | ambdevtestdc2host-group-11.node.example,ambdevtestdc2host-group-21.node.example
Restart all components for HDFS | ambdevtestdc2host-group-11.node.example,ambdevtestdc2host-group-21.node.example,ambdevtestdc2host-group-31.node.example,ambdevtestdc2host-group-41.node.dc2.consul,ambdevtestdc2host-group-51.node.example,ambdevtestdc2host-group-52.node.example,ambdevtestdc2host-group-53.node.example
Restart all components for HDFS | ambdevtestdc2host-group-11.node.example,ambdevtestdc2host-group-21.node.example
Restart all components for MAPREDUCE2 | ambdevtestdc2host-group-21.node.example
Restart all components for MAPREDUCE2 | ambdevtestdc2host-group-11.node.example,ambdevtestdc2host-group-21.node.example,ambdevtestdc2host-group-31.node.example,ambdevtestdc2host-group-41.node.dc2.consul,ambdevtestdc2host-group-51.node.example,ambdevtestdc2host-group-52.node.example,ambdevtestdc2host-group-53.node.example
Restart all components for HDFS | ambdevtestdc2host-group-11.node.example,ambdevtestdc2host-group-21.node.example,ambdevtestdc2host-group-31.node.example,ambdevtestdc2host-group-41.node.dc2.consul,ambdevtestdc2host-group-51.node.example,ambdevtestdc2host-group-52.node.example,ambdevtestdc2host-group-53.node.example
Restart all components for HDFS | ambdevtestdc2host-group-11.node.example,ambdevtestdc2host-group-21.node.example
Restart all components for HDFS | ambdevtestdc2host-group-11.node.example,ambdevtestdc2host-group-21.node.example,ambdevtestdc2host-group-31.node.example
Restart all components for HDFS | ambdevtestdc2host-group-11.node.example,ambdevtestdc2host-group-21.node.example
Restart all components for HDFS | ambdevtestdc2host-group-11.node.example,ambdevtestdc2host-group-21.node.example,ambdevtestdc2host-group-31.node.example,ambdevtestdc2host-group-41.node.dc2.consul,ambdevtestdc2host-group-51.node.example,ambdevtestdc2host-group-52.node.example,ambdevtestdc2host-group-53.node.example
(24 rows)
If you copy and paste the above into a text editor it will look a bit prettier 🙂
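To see how much restart work is queued, the (request_context, hosts) rows above can be tallied per request type and per host. A minimal sketch, assuming the rows have already been fetched (for example by running the query above through any DB-API driver) and that hosts is the comma-separated string produced by encode(a.hosts,'escape'); tally_restarts is a hypothetical name, and the demo uses three of the rows shown.

```python
from collections import Counter


def tally_restarts(rows):
    """Summarize (request_context, hosts) rows from the query above.

    hosts is the comma-separated host list returned by encode(a.hosts, 'escape').
    Returns (number of requests per context, number of requests targeting each host).
    """
    per_context, per_host = Counter(), Counter()
    for context, hosts in rows:
        # psql pads columns with spaces, so strip before counting.
        per_context[context.strip()] += 1
        for host in hosts.split(","):
            if host.strip():
                per_host[host.strip()] += 1
    return per_context, per_host


if __name__ == "__main__":
    sample = [
        ("Restart all clients on ambdevtestdc2host-group-11.node.example",
         "ambdevtestdc2host-group-11.node.example"),
        ("Restart all components for MAPREDUCE2",
         "ambdevtestdc2host-group-21.node.example"),
        ("Restart all components for HDFS",
         "ambdevtestdc2host-group-11.node.example,ambdevtestdc2host-group-21.node.example"),
    ]
    contexts, hosts = tally_restarts(sample)
    print(contexts.most_common())
    print(hosts.most_common())
```

Applied to all 24 rows above, it would count 12 client restarts on host-group-11, 10 HDFS restarts and 2 MAPREDUCE2 restarts.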
03-09-2016
10:04 AM
A few differences between the DBs on the working and non-working clusters.
Broken:
ambari=> select * from requestoperationlevel;
operation_level_id | request_id | level_name | cluster_name | service_name | host_component_name | host_id
--------------------+------------+------------+--------------+--------------+---------------------+---------
2 | 38 | Host | testcluster | | |
3 | 49 | Host | testcluster | | |
4 | 51 | Service | testcluster | HDFS | |
5 | 52 | Service | testcluster | MAPREDUCE2 | |
6 | 63 | Service | testcluster | HDFS | |
(5 rows)
Working:
ambari=> select * from requestoperationlevel;
operation_level_id | request_id | level_name | cluster_name | service_name | host_component_name | host_id
--------------------+------------+------------+--------------+--------------+---------------------+---------
(0 rows)
Also, in the broken cluster the requestresourcefilter table has a bunch of rows, but it's empty in the working DB. Thanks!
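The comparison above can be repeated for any set of tables. A minimal sketch, assuming DB-API connections to both databases: here two in-memory SQLite databases with cut-down, assumed schemas stand in for the Ambari Postgres DBs, the requestresourcefilter rows are placeholders because the real count was not posted, and row_counts, diff_counts and demo are hypothetical names. Because row_counts only uses cursor(), execute() and fetchone(), it should also work with a psycopg2 connection, though that is untested here.

```python
import sqlite3

# Tables that differed between the broken and working clusters in this thread.
TABLES = ("requestoperationlevel", "requestresourcefilter")


def row_counts(conn, tables=TABLES):
    """Return {table: row count} using only DB-API cursor calls."""
    cur = conn.cursor()
    counts = {}
    for table in tables:
        # Table names cannot be bound as parameters, so they come from a fixed tuple.
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        counts[table] = cur.fetchone()[0]
    return counts


def diff_counts(broken, working, tables=TABLES):
    """Return {table: (broken_count, working_count)} for tables whose counts differ."""
    b, w = row_counts(broken, tables), row_counts(working, tables)
    return {t: (b[t], w[t]) for t in tables if b[t] != w[t]}


def demo():
    broken, working = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    for conn in (broken, working):
        # Simplified stand-in schemas, not Ambari's real DDL.
        conn.execute(
            "CREATE TABLE requestoperationlevel "
            "(operation_level_id INTEGER, request_id INTEGER, level_name TEXT)"
        )
        conn.execute("CREATE TABLE requestresourcefilter (request_id INTEGER)")
    # requestoperationlevel rows from the broken cluster's output above.
    broken.executemany(
        "INSERT INTO requestoperationlevel VALUES (?, ?, ?)",
        [(2, 38, "Host"), (3, 49, "Host"), (4, 51, "Service"), (5, 52, "Service"), (6, 63, "Service")],
    )
    # Placeholder filter rows for illustration only; the real count was not posted.
    broken.executemany("INSERT INTO requestresourcefilter VALUES (?)", [(38,), (51,), (63,)])
    return diff_counts(broken, working)


if __name__ == "__main__":
    print(demo())
```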
03-09-2016
07:42 AM
Hi, many thanks for the help once more. As mentioned, this doesn't appear to be limited to node 51; there are errors in the logs for all the nodes. Here are the results of the queries:
ambari=> SELECT * FROM topology_request;
id | action | cluster_id | bp_name | cluster_properties | cluster_attributes | description
----+-----------+------------+------------+--------------------+--------------------+---------------------------------------
1 | PROVISION | 2 | testcluster | {} | {} | Provision Cluster 'testcluster'
2 | SCALE | 2 | testcluster | {} | {} | Scale Cluster 'testcluster' (+1 hosts)
3 | SCALE | 2 | testcluster | {} | {} | Scale Cluster 'testcluster' (+1 hosts)
4 | SCALE | 2 | testcluster | {} | {} | Scale Cluster 'testcluster' (+1 hosts)
5 | SCALE | 2 | testcluster | {} | {} | Scale Cluster 'testcluster' (+1 hosts)
6 | SCALE | 2 | testcluster | {} | {} | Scale Cluster 'testcluster' (+1 hosts)
7 | SCALE | 2 | testcluster | {} | {} | Scale Cluster 'testcluster' (+1 hosts)
8 | SCALE | 2 | testcluster | {} | {} | Scale Cluster 'testcluster' (+1 hosts)
9 | SCALE | 2 | testcluster | {} | {} | Scale Cluster 'testcluster' (+1 hosts)
10 | SCALE | 2 | testcluster | {} | {} | Scale Cluster 'testcluster' (+1 hosts)
(10 rows)
ambari=> SELECT * FROM topology_logical_request;
id | request_id | description
----+------------+--------------------------------------------------------
1 | 1 | Logical Request: Provision Cluster 'testcluster'
4 | 2 | Logical Request: Scale Cluster 'testcluster' (+1 hosts)
5 | 3 | Logical Request: Scale Cluster 'testcluster' (+1 hosts)
6 | 4 | Logical Request: Scale Cluster 'testcluster' (+1 hosts)
7 | 5 | Logical Request: Scale Cluster 'testcluster' (+1 hosts)
8 | 6 | Logical Request: Scale Cluster 'testcluster' (+1 hosts)
19 | 7 | Logical Request: Scale Cluster 'testcluster' (+1 hosts)
20 | 8 | Logical Request: Scale Cluster 'testcluster' (+1 hosts)
(8 rows)
ambari=> SELECT * FROM topology_logical_task, host_role_command WHERE topology_logical_task.physical_task_id = host_role_command.task_id AND host_role_command.status != 'COMPLETED';
id | host_task_id | physical_task_id | component | task_id | attempt_count | retry_allowed | event | exitcode | host_id | last_attempt_time | request_id | role | stage_id | start_time | end_time | status | auto_skip_on_failure | std_error | std_out | output_log | error_log | structured_out | role_command | command_detail | custom_command_name
----+--------------+------------------+-----------+---------+---------------+---------------+-------+----------+---------+-------------------+------------+------+----------+------------+----------+--------+----------------------+-----------+---------+------------+-----------+----------------+--------------+----------------+---------------------
(0 rows)
I checked, and these tables look almost exactly the same as in a second cluster we have, which is working perfectly fine. As a test, I stopped the management server in the working cluster and restarted all the agents. Everything still seems fine. Thanks!
03-08-2016
03:28 PM
Hi Jonathan, many thanks for getting back to me. I am using Postgres, just the default install that comes with ambari-server setup. Some more information: I used a blueprint to set up the cluster. For testing purposes, to make sure we could recover from node failure, I have also destroyed a number of servers in the cluster and re-created them, reinstalling the components via the API using DELETE/POST. This all seemed to work fine; everything was green before I ran the upgrade. I'm happy to run commands on the database to pull back any info you need if it helps diagnose the state the DB is in. Thanks!
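For anyone reproducing the destroy/re-create test described above, here is a minimal sketch of what the DELETE/POST calls might look like. The endpoint path, the X-Requested-By header and the {"blueprint": ..., "host_group": ...} body are assumptions based on Ambari's blueprint "add host to an existing cluster" API, not details taken from this thread; the server URL and host group name are hypothetical, and authentication is left out. The sketch only builds urllib.request.Request objects and does not send them. Each blueprint-based POST of this kind is presumably what records a "Scale Cluster 'testcluster' (+1 hosts)" row in topology_request.

```python
import json
import urllib.request

# Hypothetical server URL; "testcluster" is the cluster/blueprint name seen in the tables in this thread.
AMBARI_URL = "http://ambari-server.example:8080"
CLUSTER = "testcluster"


def host_url(host):
    # Assumed REST path for a single host within a cluster.
    return f"{AMBARI_URL}/api/v1/clusters/{CLUSTER}/hosts/{host}"


def delete_host_request(host):
    """Build (not send) a DELETE that removes the host from the cluster (assumed endpoint)."""
    return urllib.request.Request(
        host_url(host), method="DELETE", headers={"X-Requested-By": "ambari"}
    )


def add_host_request(host, blueprint, host_group):
    """Build (not send) a blueprint-based POST that re-adds the host (assumed body shape)."""
    body = json.dumps({"blueprint": blueprint, "host_group": host_group}).encode("utf-8")
    return urllib.request.Request(
        host_url(host), data=body, method="POST", headers={"X-Requested-By": "ambari"}
    )


if __name__ == "__main__":
    host = "ambdevtestdc2host-group-51.node.example"
    # "host_group_5" is a hypothetical host group name.
    for req in (delete_host_request(host), add_host_request(host, CLUSTER, "host_group_5")):
        # Authentication is omitted; sending would need credentials plus urllib.request.urlopen(req).
        print(req.get_method(), req.full_url, req.data)
```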
03-08-2016
02:37 PM
1 Kudo
Hi All,
I've upgraded Ambari from version 2.1.2-377 to version 2.2.1.0-161.
After upgrading the server and agents, upgrading the database and starting everything up, I keep seeing the following error in the server logs:
08 Mar 2016 10:07:05,087 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
08 Mar 2016 10:07:05,088 INFO [qtp-ambari-agent-55] LogicalRequest:420 - LogicalRequest.createHostRequests: created new outstanding host request ID = 3
08 Mar 2016 10:07:05,120 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
08 Mar 2016 10:07:05,120 INFO [qtp-ambari-agent-55] LogicalRequest:420 - LogicalRequest.createHostRequests: created new outstanding host request ID = 5
08 Mar 2016 10:07:05,134 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
08 Mar 2016 10:07:05,134 INFO [qtp-ambari-agent-55] LogicalRequest:420 - LogicalRequest.createHostRequests: created new outstanding host request ID = 8
08 Mar 2016 10:07:05,147 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
08 Mar 2016 10:07:05,148 INFO [qtp-ambari-agent-55] LogicalRequest:420 - LogicalRequest.createHostRequests: created new outstanding host request ID = 7
08 Mar 2016 10:07:05,158 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
08 Mar 2016 10:07:05,158 INFO [qtp-ambari-agent-55] LogicalRequest:420 - LogicalRequest.createHostRequests: created new outstanding host request ID = 6
08 Mar 2016 10:07:05,170 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
08 Mar 2016 10:07:05,170 INFO [qtp-ambari-agent-55] LogicalRequest:420 - LogicalRequest.createHostRequests: created new outstanding host request ID = 2
08 Mar 2016 10:07:05,184 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
08 Mar 2016 10:07:05,185 INFO [qtp-ambari-agent-55] LogicalRequest:420 - LogicalRequest.createHostRequests: created new outstanding host request ID = 1
08 Mar 2016 10:07:05,194 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
08 Mar 2016 10:07:05,194 INFO [qtp-ambari-agent-55] LogicalRequest:420 - LogicalRequest.createHostRequests: created new outstanding host request ID = 4
08 Mar 2016 10:07:05,290 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: ambdevtestdc2host-group-21.node.example
08 Mar 2016 10:07:05,328 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: ambdevtestdc2host-group-51.node.example
08 Mar 2016 10:07:05,384 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: ambdevtestdc2host-group-11.node.example
08 Mar 2016 10:07:05,428 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: ambdevtestdc2host-group-41.node.example
08 Mar 2016 10:07:05,507 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: ambdevtestdc2host-group-31.node.example
08 Mar 2016 10:07:05,575 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: ambdevtestdc2host-group-53.node.example
08 Mar 2016 10:07:05,627 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: ambdevtestdc2host-group-52.node.example
08 Mar 2016 10:07:05,644 WARN [qtp-ambari-agent-55] ServletHandler:563 - /agent/v1/register/ambdevtestdc2host-group-51.node.example
java.lang.NullPointerException
at org.apache.ambari.server.topology.PersistedStateImpl.getAllRequests(PersistedStateImpl.java:157)
at org.apache.ambari.server.topology.TopologyManager.ensureInitialized(TopologyManager.java:131)
at org.apache.ambari.server.topology.TopologyManager.onHostRegistered(TopologyManager.java:315)
at org.apache.ambari.server.state.host.HostImpl$HostRegistrationReceived.transition(HostImpl.java:301)
at org.apache.ambari.server.state.host.HostImpl$HostRegistrationReceived.transition(HostImpl.java:266)
at org.apache.ambari.server.state.fsm.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:354)
at org.apache.ambari.server.state.fsm.StateMachineFactory.doTransition(StateMachineFactory.java:294)
at org.apache.ambari.server.state.fsm.StateMachineFactory.access$300(StateMachineFactory.java:39)
at org.apache.ambari.server.state.fsm.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:440)
at org.apache.ambari.server.state.host.HostImpl.handleEvent(HostImpl.java:570)
at org.apache.ambari.server.agent.HeartBeatHandler.handleRegistration(HeartBeatHandler.java:966)
at org.apache.ambari.server.agent.rest.AgentResource.register(AgentResource.java:95)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:540)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:715)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1496)
at org.apache.ambari.server.security.SecurityFilter.doFilter(SecurityFilter.java:67)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
at org.apache.ambari.server.api.AmbariPersistFilter.doFilter(AmbariPersistFilter.java:47)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:82)
at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:294)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.eclipse.jetty.io.nio.SslConnection.handle(SslConnection.java:196)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
This is not specific to ambdevtestdc2host-group-51.node.example; it is happening for all of the hosts.
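Before the exception, the server logs which persisted host requests it recovered. A small parser can summarize those lines so the count can be compared with the topology tables. This is a sketch: the regular expressions are written against the two message formats quoted in the log above and may not match other Ambari versions, and summarize_recovery is a hypothetical helper name. It accepts any iterable of lines, such as an open log file.

```python
import re

# Written against the two message formats in the server log above; other versions may differ.
RECOVERED = re.compile(r"Successfully recovered host request for host: (.+?)\s*$")
CREATED = re.compile(r"created new outstanding host request ID = (\d+)")
PENDING = "Host Assignment Pending"


def summarize_recovery(lines):
    """Return (sorted outstanding request IDs, named hosts recovered, pending-host count)."""
    outstanding, named_hosts, pending = [], [], 0
    for line in lines:
        created = CREATED.search(line)
        if created:
            outstanding.append(int(created.group(1)))
            continue
        recovered = RECOVERED.search(line)
        if recovered:
            host = recovered.group(1)
            if host == PENDING:
                pending += 1
            else:
                named_hosts.append(host)
    return sorted(outstanding), named_hosts, pending


if __name__ == "__main__":
    excerpt = [
        "08 Mar 2016 10:07:05,087 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: "
        "Successfully recovered host request for host: Host Assignment Pending",
        "08 Mar 2016 10:07:05,088 INFO [qtp-ambari-agent-55] LogicalRequest:420 - "
        "LogicalRequest.createHostRequests: created new outstanding host request ID = 3",
        "08 Mar 2016 10:07:05,290 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: "
        "Successfully recovered host request for host: ambdevtestdc2host-group-21.node.example",
    ]
    print(summarize_recovery(excerpt))
```

Fed the full log excerpt above, it should report outstanding host request IDs 1 to 8, eight "Host Assignment Pending" entries and seven named hosts. Eight recovered logical requests lines up with the eight rows of topology_logical_request posted at 07:42 AM on 03-09-2016, rather than the ten rows of topology_request.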
On the agents I see the following:
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 500 Server Error</title>
</head>
<body>
<h2>HTTP ERROR: 500</h2>
<p>Problem accessing /agent/v1/register/ambdevtestdc2host-group-51.node.example Reason:
<pre> Server Error</pre></p>
<hr /><i><small>Powered by Jetty://</small></i>
Is there a workaround for this? It's just a test cluster, but it would be good to know how to get past this, as I've seen it a number of times now. Is there anything that can be modified in the database to resolve it?
Thanks!
Labels: Apache Ambari