Created 08-04-2016 03:29 AM
Hi, I created a cluster using Ambari 2.2.2 on CentOS 7.2. It worked for about four days and I was able to ingest data using Flume. All of a sudden, I am not able to start any service using Ambari. The background operations dialog with the progress bar does not appear, and I am seeing the following exception in the Ambari server log.
03 Aug 2016 17:15:53,863 ERROR [pool-9-thread-256] BaseProvider:240 - Caught exception getting JMX metrics : Connection refused, skipping same exceptions for next 5 minutes
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
    at sun.net.www.http.HttpClient.New(HttpClient.java:308)
    at sun.net.www.http.HttpClient.New(HttpClient.java:326)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1168)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1104)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:998)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:932)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1512)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
    at org.apache.ambari.server.controller.internal.URLStreamProvider.processURL(URLStreamProvider.java:209)
    at org.apache.ambari.server.controller.internal.URLStreamProvider.processURL(URLStreamProvider.java:133)
    at org.apache.ambari.server.controller.internal.URLStreamProvider.readFrom(URLStreamProvider.java:107)
    at org.apache.ambari.server.controller.internal.URLStreamProvider.readFrom(URLStreamProvider.java:112)
    at org.apache.ambari.server.controller.jmx.JMXPropertyProvider.populateResource(JMXPropertyProvider.java:212)
    at org.apache.ambari.server.controller.metrics.ThreadPoolEnabledPropertyProvider$1.call(ThreadPoolEnabledPropertyProvider.java:180)
    at org.apache.ambari.server.controller.metrics.ThreadPoolEnabledPropertyProvider$1.call(ThreadPoolEnabledPropertyProvider.java:178)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
03 Aug 2016 17:16:09,289 INFO [qtp-ambari-client-445] RequestScheduleR
I did the following:
a) Removed the contents of /var/lib/ambari-agent/data and restarted all ambari-agents.
b) Restarted the Ambari server.
I would really appreciate your help.
Thanks
Ram
Created 08-08-2016 01:44 AM
Hi All,
I would like to post the solution that worked for me.
I deleted the data from the following tables in the Ambari database:
a) request
b) stage
c) host_role_command
d) execution_command
e) requestoperationlevel
f) requestresourcefilter
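For anyone repeating this cleanup, here is a minimal sketch that only generates the DELETE statements for the six tables listed above. The child-before-parent ordering is my assumption about how these tables reference each other (it is not stated in this thread and may differ between Ambari versions), and `build_cleanup_sql` is a hypothetical helper name, not part of Ambari.

```python
# Tables named in this post, listed in an ASSUMED child-before-parent order
# so that rows holding foreign keys are removed before the rows they point to.
# Verify the order against your own Ambari schema before running anything.
TABLES_CHILD_FIRST = [
    "execution_command",
    "host_role_command",
    "requestoperationlevel",
    "requestresourcefilter",
    "stage",
    "request",
]


def build_cleanup_sql(tables=TABLES_CHILD_FIRST):
    """Return one DELETE statement per table, in the given order."""
    return [f"DELETE FROM {table};" for table in tables]


if __name__ == "__main__":
    # Print the statements so they can be reviewed before being run by hand.
    for statement in build_cleanup_sql():
        print(statement)
```

It is probably wise to stop ambari-server and take a backup of the Ambari database before running statements like these, since they wipe the request history.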
Thank you for your help
Thanks
Ram
Created 08-04-2016 03:35 AM
The best approach is to start by looking under /var/log and grepping for ERROR. Start with the ambari-server logs.
This is probably a workaround, but log in to the Ambari server and check SSH to all nodes. If SSH works, put all your hosts in maintenance mode via Ambari and restart all your nodes (shutdown -r now). Stop your Ambari server and reboot that server as well. After all your servers are up, start the Ambari server and the agents on all nodes, then try to restart all services.
You should still check the logs for ERROR (especially the "Caused by" sections), and if you post them here we can analyze them together.
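As a rough illustration of the log check described above, the sketch below pulls out lines containing ERROR or "Caused by" together with their line numbers. The function names are hypothetical, and the example path in the usage note assumes the default server log file under /var/log/ambari-server/.

```python
import re

# Match the ERROR log level as a whole word, and any "caused by" marker.
ERROR_RE = re.compile(r"\bERROR\b")
CAUSE_RE = re.compile(r"caused by", re.IGNORECASE)


def find_error_lines(lines):
    """Return (line_number, text) pairs for lines with ERROR or 'Caused by'."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if ERROR_RE.search(line) or CAUSE_RE.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path):
    """Scan one log file, tolerating bytes that are not valid text."""
    with open(path, errors="replace") as handle:
        return find_error_lines(handle)
```

For example, `scan_file("/var/log/ambari-server/ambari-server.log")` would return the matching lines from the server log, assuming that is where your installation writes it.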
If this helps, please vote/accept the answer.
Created 08-04-2016 09:35 PM
Thank you for your reply. I tried the above.
a) Checked /var/log and grepped for ERROR
I found the following in the Ambari agent logs:
ERROR 2016-08-03 16:19:01,144 Controller.py:350 - Connection to hdp-cent7-01 was lost (details=Request to https://hdp-cent7-01:8441/agent/v1/heartbeat/hdp-cent7-02 failed due to Error occured during connecting to the server: ('The read operation timed out',))
ERROR 2016-08-03 16:20:27,315 Controller.py:350 - Connection to hdp-cent7-01 was lost (details=Request to https://hdp-cent7-01:8441/agent/v1/heartbeat/hdp-cent7-02 failed due to Error occured during connecting to the server: ('The read operation timed out',))
Based on the above, I followed the following article
and uninstalled ambari-agent as well as ambari-server, then reinstalled both.
However, it is still not working, and I noticed the following error in the Ambari server log.
04 Aug 2016 17:21:07,213 WARN [C3P0PooledConnectionPoolManager[identityToken->2w0zzb9io96x8a18kxg2w|3fc2959f]-HelperThread-#2] BasicResourcePool:223 - com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@1d91d05d -- Acquisition Attempt Failed!!! Clearing pending acquires. While trying to acquire a needed new resource, we failed to succeed more than the maximum number of allowed acquisition attempts (30). Last acquisition attempt exception:
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Data source rejected establishment of connection, message from server: "Too many connections"
    at sun.reflect.GeneratedConstructorAccessor174.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
    at com.mysql.jdbc.Util.getInstance(Util.java:386)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1015)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:975)
    at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1112)
    at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2488)
    at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2521)
    at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2306)
    at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:839)
    at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:49)
    at sun.reflect.GeneratedConstructorAccessor171.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
    at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:421)
    at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:350)
    at com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:175)
    at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:220)
    at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:206)
    at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:203)
    at com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1138)
    at com.mchange.v2.resourcepool.BasicResourcePool.doAcquireAndDecrementPendingAcquiresWithinLockOnSuccess(BasicResourcePool.java:1125)
    at com.mchange.v2.resourcepool.BasicResourcePool.access$700(BasicResourcePool.java:44)
    at com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask.run(BasicResourcePool.java:1870)
    at com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:696)
b) I tested SSH to all nodes from the Ambari server and did not find any issues.
Here is the final error I am seeing:
WARN [qtp-ambari-client-29] ServletHandler:563 - /api/v1/clusters/txhubdevcluster01/hosts/hdp-cent7-03.rd.allscripts.com/host_components/FLUME_HANDLER javax.persistence.RollbackException: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException Internal Exception: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Cannot add or update a child row: a foreign key constraint fails (Unknown error code)
Thanks
Ram
Created 08-04-2016 11:30 AM
Have you changed anything? Check the firewall and SELinux. Check the hosts file and the NTP service; if NTP is not started, please start it. Also check whether passwordless SSH is working.
Please paste the logs here.
The log locations are /var/log/ambari-server/ and /var/log/ambari-agent.
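To go with the firewall check, here is a small sketch that tests whether a TCP port accepts connections from the machine it runs on. `port_open` is a hypothetical helper; the host name and port 8441 in the example are taken from the agent heartbeat URL quoted earlier in this thread, so adjust them for your cluster.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Example values from the agent log in this thread; change as needed.
    host, port = "hdp-cent7-01", 8441
    state = "reachable" if port_open(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

A False result from an agent host would point at the firewall, name resolution, or the server not listening, which narrows down where to look next.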
Created 08-04-2016 09:38 PM
Sharma,
Thank you for your help. Here is the error from the Ambari server log:
04 Aug 2016 17:20:36,026 ERROR [pool-9-thread-9] BaseProvider:240 - Caught exception getting JMX metrics : Connection refused, skipping same exceptions for next 5 minutes
One of the agent logs has the following error:
ERROR 2016-08-04 16:21:36,953 HostInfo.py:229 - Checking java processes failed
Please let me know if you need more information.
Thank you
Ram