Created 08-28-2018 01:38 PM
Hi,
Could you please help me? I am new to Ambari Metrics. My problem is that after the Ambari upgrade and the Ambari Metrics upgrade, the ambari-metrics-collector goes down shortly after I start it. Below is the error message from the log:
INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server name_of_host/x.x.x.x:61181. Will not attempt to authenticate using SASL (unknown error)
WARN org.apache.zookeeper.ClientCnxn: Session 0x65464619r4fd7e for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=name_of_host:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-secure/meta-region-server
Created 08-28-2018 02:17 PM
Hi @lam rab,
Have you completed the mandatory post-upgrade tasks for the Ambari upgrade?
If you run the command below, every Ambari package in the output should show the same version:
rpm -qa | grep -i ambari
Refer to this doc for more details (this one is for Ambari 2.4.3; choose the one for your version of Ambari): https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.3.0/bk_ambari-upgrade/content/upgrade_ambari_me...
Please mark the answer as accepted if it is helpful.
Created 08-28-2018 03:03 PM
Hi @Akhil S Naik,
Yes, I did all the post-upgrade tasks of the Ambari upgrade, especially for ambari-metrics.
Below is the output:
[root@myhost ~]# rpm -qa |grep -i ambari
ambari-agent-2.6.2.2-1.x86_64
ambari-metrics-collector-2.6.2.2-1.x86_64
ambari-metrics-hadoop-sink-2.6.2.2-1.x86_64
ambari-metrics-monitor-2.6.2.2-1.x86_64
Created 08-29-2018 06:38 PM
Hi @lam rab,
Is the issue resolved? If yes, please let me know how it was done.
If not, is your cluster Kerberized? Can you also attach the HBase logs from the Metrics Collector?
A few things I have tried:
In Ambari, go to the host where the Metrics Collector is installed, refresh the configs, and try restarting the Metrics Collector again.
In the cases I have faced so far, the issue was caused either by the values stored in zkClient or by something wrong with the Metrics Collector files stored on the hosts.
If you don't need the previously stored metrics, you can follow the steps below "at your own risk" (see the shell sketch after the list):
1. Stop all the Ambari Metrics services: Metrics Collector, Metrics Monitor and Grafana.
2. Delete the Ambari Metrics service.
3. Rename or delete the ambari-metrics-collector folder under /var/log/ and /var/lib/.
4. Add the Ambari Metrics service from Ambari again.
The above worked for me.
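For reference, a minimal shell sketch of step 3, assuming the default ambari-metrics-collector locations under /var/log/ and /var/lib/ (check your own AMS settings, the paths may differ) and that the service has already been stopped and deleted:
# Assumed default locations; verify them in your AMS configuration before running.
TS=$(date +%Y%m%d-%H%M%S)
for dir in /var/log/ambari-metrics-collector /var/lib/ambari-metrics-collector; do
    # Rename rather than delete so the old files can be restored if needed.
    [ -d "$dir" ] && mv "$dir" "${dir}.bak-$TS"
done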
Created 08-31-2018 10:37 AM
Hi @lam rab,
Were you able to resolve the issue? The exception you posted happens mostly because of the upgrade itself.
If not, please attach the ambari-metrics-collector logs.
Created 08-31-2018 03:56 PM
Hi,
I have not resolved the problem yet. Here are the logs of ambari-metrics-collector and hbase-ams.
Thanks
ambari-metrics-collector log:
--------------------------------
2018-08-31 17:17:29,367 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server host102/x.x.x.x:61181. Will not attempt to authenticate using SASL (unknown error)
2018-08-31 17:17:29,367 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2018-08-31 17:17:29,834 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server host102/x.x.x.x:61181. Will not attempt to authenticate using SASL (unknown error)
2018-08-31 17:17:29,834 WARN org.apache.zookeeper.ClientCnxn: Session 0x1659083ffc20001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2018-08-31 17:17:29,991 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server host102/x.x.x.x:61181. Will not attempt to authenticate using SASL (unknown error)
2018-08-31 17:17:29,991 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2018-08-31 17:17:30,091 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=host102:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-secure/meta-region-server
###################################################################
hbase-ams log
--------------------------------------------------
2018-08-31 17:08:13,773 INFO [main-SendThread(host102:61181)] zookeeper.ClientCnxn: Opening socket connection to server host102/x.x.x.x:61181. Will not attempt to authenticate using SASL (unknown error)
2018-08-31 17:08:13,773 WARN [main-SendThread(host102:61181)] zookeeper.ClientCnxn: Session 0x1659083ffc20002 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2018-08-31 17:08:14,944 INFO [RS:0;host102:50466-SendThread(host102:61181)] zookeeper.ClientCnxn: Opening socket connection to server host102/x.x.x.x:61181. Will not attempt to authenticate using SASL (unknown error)
2018-08-31 17:08:14,944 WARN [RS:0;host102:50466-SendThread(host102:61181)] zookeeper.ClientCnxn: Session 0x1659083ffc20004 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2018-08-31 17:08:15,536 INFO [main-SendThread(host102:61181)] zookeeper.ClientCnxn: Opening socket connection to server host102/x.x.x.x:61181. Will not attempt to authenticate using SASL (unknown error)
2018-08-31 17:08:15,536 WARN [main-SendThread(host102:61181)] zookeeper.ClientCnxn: Session 0x1659083ffc20002 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2018-08-31 17:08:16,228 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Master not initialized after 200000ms seconds
at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:230)
at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:229)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:139)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2838)
Created 08-31-2018 04:58 PM
Hi @lam rab,
From the error, it looks like ZooKeeper is having an issue and the collector is not able to connect to it.
If the AMS metrics history data is not important to you and you just need to bring the service up,
can you try performing the steps here: https://cwiki.apache.org/confluence/display/AMBARI/Cleaning+up+Ambari+Metrics+System+Data
Remove the AMS ZooKeeper data by backing up and then removing the contents of 'hbase.tmp.dir'/zookeeper
and see if this helps?
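If it helps, a minimal shell sketch of that ZooKeeper cleanup, assuming an example hbase.tmp.dir of /var/lib/ambari-metrics-collector/hbase-tmp (take the real value from your AMS configs) and that AMS is already stopped:
# Example path only; replace it with the actual hbase.tmp.dir from your AMS configuration.
AMS_TMPDIR=/var/lib/ambari-metrics-collector/hbase-tmp
# Back up the AMS ZooKeeper data by renaming it; AMS recreates it on restart.
mv "$AMS_TMPDIR/zookeeper" "$AMS_TMPDIR/zookeeper.bak-$(date +%Y%m%d-%H%M%S)"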
Also, please make sure the AMS heap configurations are adequate: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.0/bk_ambari-operations/content/ams_general_gu...
Please accept my answer if you found this helpful.
Created 09-01-2018 07:19 AM
Cleaning up the AMS data will remove all of the historical AMS data available.
Step-by-step guide (a hedged shell sketch for the embedded mode follows the list):
1. Using Ambari:
   a. Set AMS to maintenance mode.
   b. Stop AMS from Ambari.
   c. Identify the following from the AMS Configs screen:
      i. 'Metrics Service operation mode' (embedded or distributed)
      ii. hbase.rootdir
      iii. hbase.zookeeper.property.dataDir
2. AMS data is stored in the 'hbase.rootdir' identified above. Back up and remove the AMS data.
   a. If the Metrics Service operation mode:
      i. is 'embedded', the data is stored in local OS files.
         Use regular OS commands to back up and remove the files in hbase.rootdir.
      ii. is 'distributed', the data is stored in HDFS.
         Use 'hdfs dfs' commands to back up and remove the files in hbase.rootdir.
3. Remove the AMS ZooKeeper data by backing up and removing the contents of 'hbase.tmp.dir'/zookeeper.
4. Remove any Phoenix spool files from the 'hbase.tmp.dir'/phoenix-spool folder.
5. Restart AMS using Ambari.
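As an illustration only, here is a rough shell sketch of steps 2-4 for the 'embedded' operation mode, assuming example values of hbase.rootdir=/var/lib/ambari-metrics-collector/hbase and hbase.tmp.dir=/var/lib/ambari-metrics-collector/hbase-tmp (take the real values from the AMS Configs screen; yours may differ):
# Example values only; read hbase.rootdir and hbase.tmp.dir from your AMS configs.
AMS_ROOTDIR=/var/lib/ambari-metrics-collector/hbase
AMS_TMPDIR=/var/lib/ambari-metrics-collector/hbase-tmp
TS=$(date +%Y%m%d-%H%M%S)

# Step 2 (embedded mode): back up, then remove, the AMS HBase data.
tar -czf /tmp/ams-hbase-backup-$TS.tar.gz -C "$AMS_ROOTDIR" . && rm -rf "$AMS_ROOTDIR"/*
# In 'distributed' mode, do the equivalent with 'hdfs dfs -cp' and 'hdfs dfs -rm -r' on hbase.rootdir instead.

# Step 3: back up, then remove, the AMS ZooKeeper data.
tar -czf /tmp/ams-zookeeper-backup-$TS.tar.gz -C "$AMS_TMPDIR/zookeeper" . && rm -rf "$AMS_TMPDIR/zookeeper"/*

# Step 4: remove any Phoenix spool files.
rm -rf "$AMS_TMPDIR/phoenix-spool"/*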
HTH
Created 09-03-2018 10:12 AM
Hi all,
Thanks for the help. The problem is solved by cleaning up the AMS data, but I still don't understand why this happened after the upgrade.
Regards
Created 09-03-2018 11:40 AM
Did you do the post-upgrade steps mentioned in Migrate Ambari Metrics Data? That could probably be the cause. Never skip researching before any upgrade, so you don't miss any post-upgrade tasks.
Upgrades are NEVER all smooth, otherwise no fun 🙂
Please accept the answer to close the thread.