Created 01-18-2017 03:09 PM
I'm running Ambari 2.1.0. I tried to move the App Timeline Server (ATS) to another host and the process failed (for various reasons).
I was able to bring Ambari back up, but now it seems to think it has two ATS masters. To get things operational, I've put one of them (the one I was trying to move the service to) in maintenance mode and started the original one. Now I have a permanent alert on the first host saying that it failed to connect to the ATS on the second host.
From the dashboard, YARN appears to be up/operational but I'm not sure if it is or not.
Any suggestions on how I might be able to untangle this?
Created 01-20-2017 06:15 PM
Thanks for sharing the output. Yes, that's exactly what I meant (REST API call to get the ATS instances registered with Ambari).
To delete the bad ATS instance from Ambari, you can issue the following API call:
curl -u admin:admin -k -H "X-Requested-By: ambari" -X DELETE https://localhost:8443/api/v1/clusters/ROGERGPFS/hosts/<hostname-with-bad-ATS>/host_components/APP_T...
Created 01-20-2017 09:31 PM
Remember, you asked for it. 🙂
20 Jan 2017 15:27:44,017 ERROR [qtp-client-37739] AbstractResourceProvider:338 - Caught AmbariException when modifying a resource
org.apache.ambari.server.AmbariException: Host Component cannot be removed, clusterName=ROGERGPFS, serviceName=YARN, componentName=APP_TIMELINE_SERVER, hostname=cg-hm09.ncsa.illinois.edu, request={ clusterName=ROGERGPFS, serviceName=YARN, componentName=APP_TIMELINE_SERVER, hostname=cg-hm09.ncsa.illinois.edu, desiredState=null, state=null, desiredStackId=null, staleConfig=null, adminState=null}
at org.apache.ambari.server.controller.AmbariManagementControllerImpl.deleteHostComponents(AmbariManagementControllerImpl.java:2731)
at org.apache.ambari.server.controller.internal.HostComponentResourceProvider$3.invoke(HostComponentResourceProvider.java:321)
at org.apache.ambari.server.controller.internal.HostComponentResourceProvider$3.invoke(HostComponentResourceProvider.java:318)
at org.apache.ambari.server.controller.internal.AbstractResourceProvider.modifyResources(AbstractResourceProvider.java:331)
at org.apache.ambari.server.controller.internal.HostComponentResourceProvider.deleteResources(HostComponentResourceProvider.java:318)
at org.apache.ambari.server.controller.internal.ClusterControllerImpl.deleteResources(ClusterControllerImpl.java:330)
at org.apache.ambari.server.api.services.persistence.PersistenceManagerImpl.delete(PersistenceManagerImpl.java:111)
at org.apache.ambari.server.api.handlers.DeleteHandler.persist(DeleteHandler.java:44)
at org.apache.ambari.server.api.handlers.BaseManagementHandler.handleRequest(BaseManagementHandler.java:72)
at org.apache.ambari.server.api.services.BaseRequest.process(BaseRequest.java:135)
at org.apache.ambari.server.api.services.BaseService.handleRequest(BaseService.java:105)
at org.apache.ambari.server.api.services.BaseService.handleRequest(BaseService.java:74)
at org.apache.ambari.server.api.services.HostComponentService.deleteHostComponent(HostComponentService.java:203)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.SubLocatorRule.accept(SubLocatorRule.java:137)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.SubLocatorRule.accept(SubLocatorRule.java:137)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:540)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:715)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1496)
{continued}
Created 01-20-2017 11:37 PM
Ok, the ATS instance that you are trying to delete is in one of the states that make it non-deletable.
Can you get me the output of:
curl -uadmin:PW -k https://localhost:8443/api/v1/clusters/ROGERGPFS/hosts/cg-hm09.ncsa.illinois.edu/host_components/APP...
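For anyone following along, the response to that call is large, and the part that matters here is the component's "state". A minimal sketch for pulling it out with standard tools follows. `extract_state` is a hypothetical helper name, not something Ambari ships, and the `fields=HostRoles/state` query parameter in the commented example is the Ambari REST API's partial-response filter as I understand it; drop it if your version rejects it.

```shell
#!/usr/bin/env bash
# extract_state: hypothetical helper, not part of Ambari.
# Reads a host_component JSON document on stdin and prints the value of the
# "state" key. The leading quote in the pattern keeps it from matching
# "desired_state", "maintenance_state" or "upgrade_state".
extract_state() {
  grep -o '"state" *: *"[A-Z_]*"' | head -n 1 | sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Example against a live server (PW is a placeholder, as in the call above):
# curl -s -u admin:PW -k \
#   "https://localhost:8443/api/v1/clusters/ROGERGPFS/hosts/cg-hm09.ncsa.illinois.edu/host_components/APP_TIMELINE_SERVER?fields=HostRoles/state" \
#   | extract_state
```

In this thread the delete was refused while the state was STARTED and went through once the component had been stopped.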
Created 01-23-2017 06:54 PM
Ok, you just need to issue Stop on that ATS instance on cg-hm09. Please go to Hosts -> cg-hm09 and choose Stop on App Timeline Server from the component list. Then try the delete API call again.
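If the UI is not convenient, the same stop-then-delete sequence can probably be done over REST. The sketch below rests on some assumptions: the function names are made up for illustration, the credentials are placeholders, and the stop request uses the usual Ambari convention of setting the host component's state to INSTALLED with a PUT. I have not checked it against 2.1.0 specifically, so treat it as a starting point rather than a verified procedure.

```shell
#!/usr/bin/env bash
# Values taken from this thread; override them for your own cluster.
AMBARI_URL="${AMBARI_URL:-https://localhost:8443}"
CLUSTER="${CLUSTER:-ROGERGPFS}"
HOST="${HOST:-cg-hm09.ncsa.illinois.edu}"
COMPONENT="${COMPONENT:-APP_TIMELINE_SERVER}"
AMBARI_USER="${AMBARI_USER:-admin}"   # placeholder credentials
AMBARI_PW="${AMBARI_PW:-PW}"

# Print the REST URL of the host component (hypothetical helper).
component_url() {
  printf '%s/api/v1/clusters/%s/hosts/%s/host_components/%s\n' \
    "$AMBARI_URL" "$CLUSTER" "$HOST" "$COMPONENT"
}

# Ask Ambari to stop the component by setting its state to INSTALLED.
# Assumption: the request is asynchronous, so the state changes a bit later.
stop_component() {
  curl -s -u "$AMBARI_USER:$AMBARI_PW" -k -H "X-Requested-By: ambari" -X PUT \
    -d '{"RequestInfo":{"context":"Stop ATS via REST"},"HostRoles":{"state":"INSTALLED"}}' \
    "$(component_url)"
}

# Remove the (now stopped) component from the host, as in the earlier call.
delete_component() {
  curl -s -u "$AMBARI_USER:$AMBARI_PW" -k -H "X-Requested-By: ambari" \
    -X DELETE "$(component_url)"
}
```

Run `stop_component`, confirm in the UI or via a GET on the same URL that the component is no longer STARTED, and only then run `delete_component`.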
Created 01-23-2017 08:00 PM
Did the unwanted alerts disappear too?
Created 01-23-2017 07:05 PM
Yea! That worked. Thank you.
Created 01-23-2017 03:48 PM
{ "href" : "https://localhost:8443/api/v1/clusters/ROGERGPFS/hosts/cg-hm09.ncsa.illinois.edu/host_components/APP_TIMELINE_SERVER", "HostRoles" : { "cluster_name" : "ROGERGPFS", "component_name" : "APP_TIMELINE_SERVER", "desired_stack_id" : "HDP-2.3", "desired_state" : "STARTED", "hdp_version" : "HDP-2.3.2.0-2602", "host_name" : "cg-hm09.ncsa.illinois.edu", "maintenance_state" : "ON", "service_name" : "YARN", "stack_id" : "HDP-2.3", "stale_configs" : false, "state" : "STARTED", "upgrade_state" : "NONE", "actual_configs" : { "accumulo-env" : { "default" : "version1460667914523" }, "accumulo-log4j" : { "default" : "version1460667914523" }, "accumulo-site" : { "default" : "version1460667914523" }, "ams-env" : { "default" : "version1" }, "ams-hbase-env" : { "default" : "version1" }, "ams-hbase-log4j" : { "default" : "version1" }, "ams-hbase-policy" : { "default" : "version1" }, "ams-hbase-security-site" : { "default" : "version1" }, "ams-hbase-site" : { "default" : "version1467740762357" }, "ams-log4j" : { "default" : "version1" }, "ams-site" : { "default" : "version1467740762357" }, "capacity-scheduler" : { "default" : "version1" }, "client" : { "default" : "version1460667914523" }, "cluster-env" : { "default" : "version1" }, "core-site" : { "default" : "version1484279982036" }, "falcon-env" : { "default" : "version1" }, "falcon-runtime.properties" : { "default" : "version1" }, "falcon-startup.properties" : { "default" : "version1" }, "gateway-log4j" : { "default" : "version1" }, "gateway-site" : { "default" : "version1" }, "hadoop-env" : { "default" : "version1" }, "hadoop-policy" : { "default" : "version1" }, "hbase-env" : { "default" : "version1" }, "hbase-log4j" : { "default" : "version1" }, "hbase-policy" : { "default" : "version1" }, "hbase-site" : { "default" : "version1" }, "hcat-env" : { "default" : "version1" }, "hdfs-log4j" : { "default" : "version1" }, "hdfs-site" : { "default" : "version1484262327604" }, "hive-env" : { "default" : "version1484280941838" }, 
"hive-exec-log4j" : { "default" : "version1" }, "hive-log4j" : { "default" : "version1" }, "hive-site" : { "default" : "version1484280941838" }, "hiveserver2-site" : { "default" : "version1" }, "knox-env" : { "default" : "version1" }, "ldap-log4j" : { "default" : "version1" }, "mapred-env" : { "default" : "version1" }, "mapred-site" : { "default" : "version1" }, "oozie-env" : { "default" : "version1442008913821" }, "oozie-log4j" : { "default" : "version1" }, "oozie-site" : { "default" : "version1484271458416" }, "pig-env" : { "default" : "version1" }, "pig-log4j" : { "default" : "version1" }, "pig-properties" : { "default" : "version1" }, "ranger-hbase-audit" : { "default" : "version1" }, "ranger-hbase-plugin-properties" : { "default" : "version1" }, "ranger-hbase-policymgr-ssl" : { "default" : "version1" }, "ranger-hbase-security" : { "default" : "version1" }, "ranger-hdfs-audit" : { "default" : "version1" }, "ranger-hdfs-plugin-properties" : { "default" : "version1" }, "ranger-hdfs-policymgr-ssl" : { "default" : "version1" }, "ranger-hdfs-security" : { "default" : "version1" }, "ranger-hive-audit" : { "default" : "version1" }, "ranger-hive-plugin-properties" : { "default" : "version1" }, "ranger-hive-policymgr-ssl" : { "default" : "version1" }, "ranger-hive-security" : { "default" : "version1" }, "ranger-knox-audit" : { "default" : "version1" }, "ranger-knox-plugin-properties" : { "default" : "version1" }, "ranger-knox-policymgr-ssl" : { "default" : "version1" }, "ranger-knox-security" : { "default" : "version1" }, "ranger-yarn-audit" : { "default" : "version1" }, "ranger-yarn-plugin-properties" : { "default" : "version1" }, "ranger-yarn-policymgr-ssl" : { "default" : "version1" }, "ranger-yarn-security" : { "default" : "version1" }, "spark-defaults" : { "default" : "version1" }, "spark-env" : { "default" : "version1" }, "spark-javaopts-properties" : { "default" : "version1" }, "spark-log4j-properties" : { "default" : "version1" }, "spark-metrics-properties" : { 
"default" : "version1" }, "sqoop-env" : { "default" : "version1" }, "ssl-client" : { "default" : "version1" }, "ssl-server" : { "default" : "version1" }, "tez-env" : { "default" : "version1" }, "tez-site" : { "default" : "version1" }, "topology" : { "default" : "version1" }, "users-ldif" : { "default" : "version1" }, "webhcat-env" : { "default" : "version1" }, "webhcat-log4j" : { "default" : "version1" }, "webhcat-site" : { "default" : "version1484280941838" }, "yarn-env" : { "default" : "version1484269362684" }, "yarn-log4j" : { "default" : "version1" }, "yarn-site" : { "default" : "version1484278518158" }, "zoo.cfg" : { "default" : "version1" }, "zookeeper-env" : { "default" : "version1" }, "zookeeper-log4j" : { "default" : "version1" } } }, "host" : { "href" : "https://localhost:8443/api/v1/clusters/ROGERGPFS/hosts/cg-hm09.ncsa.illinois.edu" }, "component" : [ { "href" : "https://localhost:8443/api/v1/clusters/ROGERGPFS/services/YARN/components/APP_TIMELINE_SERVER", "ServiceComponentInfo" : { "cluster_name" : "ROGERGPFS", "component_name" : "APP_TIMELINE_SERVER", "service_name" : "YARN" } } ], "processes" : [ ] }
Created 01-23-2017 09:20 PM
@yusaku, yes, they did.
At some point, when it happens again, I may start another thread about why my Ambari Metrics server keeps dying. It's fairly annoying.
Created 01-23-2017 09:36 PM
Awesome. If you could "accept" my answer, that would be great. The AMS crashing issue might be due to the version of Ambari you are using (2.1.0 is quite old, and there have been numerous AMS stability improvements since then). If possible, I highly recommend upgrading to Ambari 2.4.2.