Ambari : Unable to start any service after successful working four days.

Contributor

Hi, I created a cluster using Ambari 2.2.2 on CentOS 7.2. It worked for about four days, and I was able to ingest data using Flume. All of a sudden, I am not able to start any service using Ambari. The background operations dialog with the progress bar does not appear, and I am seeing the following exception in the Ambari server log.

03 Aug 2016 17:15:53,863 ERROR [pool-9-thread-256] BaseProvider:240 - Caught exception getting JMX metrics : Connection refused, skipping same exceptions for next 5 minutes
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
    at sun.net.www.http.HttpClient.New(HttpClient.java:308)
    at sun.net.www.http.HttpClient.New(HttpClient.java:326)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1168)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1104)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:998)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:932)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1512)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
    at org.apache.ambari.server.controller.internal.URLStreamProvider.processURL(URLStreamProvider.java:209)
    at org.apache.ambari.server.controller.internal.URLStreamProvider.processURL(URLStreamProvider.java:133)
    at org.apache.ambari.server.controller.internal.URLStreamProvider.readFrom(URLStreamProvider.java:107)
    at org.apache.ambari.server.controller.internal.URLStreamProvider.readFrom(URLStreamProvider.java:112)
    at org.apache.ambari.server.controller.jmx.JMXPropertyProvider.populateResource(JMXPropertyProvider.java:212)
    at org.apache.ambari.server.controller.metrics.ThreadPoolEnabledPropertyProvider$1.call(ThreadPoolEnabledPropertyProvider.java:180)
    at org.apache.ambari.server.controller.metrics.ThreadPoolEnabledPropertyProvider$1.call(ThreadPoolEnabledPropertyProvider.java:178)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
03 Aug 2016 17:16:09,289 INFO [qtp-ambari-client-445] RequestScheduleR

I did the following:

a) Removed the contents of /var/lib/ambari-agent/data and restarted all Ambari agents.

b) Restarted the Ambari server.

I really appreciate your help.

Thanks

Ram

Accepted Solutions

Re: Ambari : Unable to start any service after successful working four days.

Contributor

Hi All,

I would like to post the solution that worked for me.

I deleted the data from the following tables in the Ambari database.

a) request

b) stage

c) host_role_command

d) execution_command

e) requestoperationlevel

f) requestresourcefilter
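
As a rough sketch of the cleanup listed above: the snippet below only builds the DELETE statements and prints them; it does not connect to any database. The child-first ordering is an assumption about how these tables reference each other (execution_command, then host_role_command, then stage, with request last), and other tables not named in the post may reference them too. It would be prudent to verify the foreign keys in your own Ambari database, stop ambari-server, and take a backup before running anything like this. The function name `build_cleanup_sql` is hypothetical.

```python
# Tables named in the post, reordered child-first so that foreign keys are
# less likely to block the deletes. The ordering is an assumption; check it
# against the foreign keys in your own Ambari database.
TABLES_CHILD_FIRST = [
    "execution_command",
    "host_role_command",
    "stage",
    "requestoperationlevel",
    "requestresourcefilter",
    "request",
]


def build_cleanup_sql(tables=TABLES_CHILD_FIRST):
    """Return one DELETE statement per table, in the given order."""
    return [f"DELETE FROM {table};" for table in tables]


if __name__ == "__main__":
    # Print the statements for review; nothing is executed.
    for statement in build_cleanup_sql():
        print(statement)
```

Printing the statements first, rather than executing them, leaves room to review the order and run them by hand in a MySQL session.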

Thank you for your help

Thanks

Ram


Re: Ambari : Unable to start any service after successful working four days.

@Ramakrishna Pratapa

The best place to start is the logs under /var/log: grep them for ERROR, beginning with the ambari-server logs.

The following is probably a workaround, but log in to the Ambari server and check SSH to all nodes. If SSH works, put all your hosts in maintenance mode via Ambari and restart all your nodes (shutdown -r now). Then stop the Ambari server and reboot that machine as well. After all your servers are up, start the Ambari server and the agents on all nodes, and try to restart all services.

You should still check the logs for ERROR (especially the "Caused by" sections), and if you post them we can analyze them together.
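
A minimal sketch of the log check suggested above, as an alternative to grep: it lists lines containing ERROR or "Caused by" together with their line numbers. The function names and the script name in the usage note are hypothetical, and the log paths are passed on the command line rather than assumed.

```python
import re
import sys

# Lines worth a first look: ERROR entries and "Caused by" lines.
PATTERN = re.compile(r"\bERROR\b|Caused by")


def find_error_lines(lines):
    """Return (line_number, text) pairs for lines that match PATTERN."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


def scan_file(path):
    """Scan one log file, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return find_error_lines(handle)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        for number, text in scan_file(path):
            print(f"{path}:{number}: {text}")
```

It could be run as, for example, `python scan_logs.py` followed by one or more log files from /var/log/ambari-server/ or /var/log/ambari-agent/.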

If this helps, please vote for or accept the answer.

Re: Ambari : Unable to start any service after successful working four days.

Contributor

Thank you for your reply. I tried the above:

a) Checked the logs under /var/log and grepped for ERROR

I found the following in the Ambari agent logs:

ERROR 2016-08-03 16:19:01,144 Controller.py:350 - Connection to hdp-cent7-01 was lost (details=Request to https://hdp-cent7-01:8441/agent/v1/heartbeat/hdp-cent7-02 failed due to Error occured during connecting to the server: ('The read operation timed out',))
ERROR 2016-08-03 16:20:27,315 Controller.py:350 - Connection to hdp-cent7-01 was lost (details=Request to https://hdp-cent7-01:8441/agent/v1/heartbeat/hdp-cent7-02 failed due to Error occured during connecting to the server: ('The read operation timed out',))
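
One quick check for a lost heartbeat like the one above is whether the agent host can open a TCP connection to the server port at all. The sketch below uses the host name and port from the heartbeat URL in the log; the helper name `can_connect` is hypothetical. Note that a successful connect only shows the port is reachable: the log reports a read timeout, which can also happen when the server accepts connections but is too busy to answer.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Host and port taken from the heartbeat URL in the agent log above.
    print(can_connect("hdp-cent7-01", 8441))
```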

Based on the above, I followed this article:

https://community.hortonworks.com/articles/49075/heartbeat-lost-due-to-ambari-agent-error-unable-to....

I uninstalled ambari-agent as well as ambari-server, then reinstalled both.

However, it is still not working, and I noticed the following error in the Ambari server log.

04 Aug 2016 17:21:07,213 WARN [C3P0PooledConnectionPoolManager[identityToken->2w0zzb9io96x8a18kxg2w|3fc2959f]-HelperThread-#2] BasicResourcePool:223 - com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@1d91d05d -- Acquisition Attempt Failed!!! Clearing pending acquires. While trying to acquire a needed new resource, we failed to succeed more than the maximum number of allowed acquisition attempts (30). Last acquisition attempt exception:
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Data source rejected establishment of connection, message from server: "Too many connections"
    at sun.reflect.GeneratedConstructorAccessor174.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
    at com.mysql.jdbc.Util.getInstance(Util.java:386)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1015)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:975)
    at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1112)
    at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2488)
    at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2521)
    at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2306)
    at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:839)
    at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:49)
    at sun.reflect.GeneratedConstructorAccessor171.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
    at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:421)
    at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:350)
    at com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:175)
    at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:220)
    at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:206)
    at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:203)
    at com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1138)
    at com.mchange.v2.resourcepool.BasicResourcePool.doAcquireAndDecrementPendingAcquiresWithinLockOnSuccess(BasicResourcePool.java:1125)
    at com.mchange.v2.resourcepool.BasicResourcePool.access$700(BasicResourcePool.java:44)
    at com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask.run(BasicResourcePool.java:1870)
    at com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:696)

b) I tested SSH to all nodes from the Ambari server and did not find any issues.

Here is the final error I am seeing:

WARN [qtp-ambari-client-29] ServletHandler:563 - /api/v1/clusters/txhubdevcluster01/hosts/hdp-cent7-03.rd.allscripts.com/host_components/FLUME_HANDLER javax.persistence.RollbackException: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException Internal Exception: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Cannot add or update a child row: a foreign key constraint fails (Unknown error code)

Thanks

Ram

Re: Ambari : Unable to start any service after successful working four days.

Have you changed anything? Check the firewall and SELinux. Check the hosts file and the NTP service; if NTP is not started, please start it. Also check whether passwordless SSH is working.

Please paste the logs here. They are located in /var/log/ambari-server/ and /var/log/ambari-agent.
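
For the hosts file check mentioned above, a small sketch like the following can be run on each node to confirm that every cluster host name resolves. The helper name `resolve_hosts` is hypothetical, and the host names in the example are the two that appear in the agent log earlier in this thread; substitute your own list.

```python
import socket


def resolve_hosts(hostnames):
    """Map each hostname to its IPv4 address, or None if it does not resolve."""
    results = {}
    for name in hostnames:
        try:
            results[name] = socket.gethostbyname(name)
        except OSError:
            # socket.gaierror (a subclass of OSError) means resolution failed.
            results[name] = None
    return results


if __name__ == "__main__":
    # Host names as they appear in the agent log in this thread.
    for name, address in resolve_hosts(["hdp-cent7-01", "hdp-cent7-02"]).items():
        print(name, address or "DOES NOT RESOLVE")
```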

Re: Ambari : Unable to start any service after successful working four days.

Contributor

Sharma,

Thank you for your help. Here is the error from the Ambari server log:

04 Aug 2016 17:20:36,026 ERROR [pool-9-thread-9] BaseProvider:240 - Caught exception getting JMX metrics : Connection refused, skipping same exceptions for next 5 minutes

One of the agent logs has the following error:

ERROR 2016-08-04 16:21:36,953 HostInfo.py:229 - Checking java processes failed

Please let me know if you need more information.

Thank you

Ram
