Member since
01-08-2016
22
Posts
6
Kudos Received
0
Solutions
02-17-2016
09:08 PM
Thanks Jonas, I am in the process of trying embedded mode again per the 200-400 node guideline, along with Swagle's recommendation of increasing the regionserver heap size. I will bookmark your response, though, in case we increase the size of the cluster enough to require distributed mode. When I tried distributed mode earlier I did not change hbase.zookeeper.quorum or zookeeper.znode.parent, so that could have been it. Thanks again for taking the time 😃
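For anyone checking the same properties before retrying distributed mode, here is a minimal Python sketch that reads a Hadoop-style XML configuration and lists the ZooKeeper-related values. The property names are the ones mentioned in this thread; the sample XML and its values (zk1.example.com, /hbase) are hypothetical placeholders, not settings from this cluster, and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the standard Hadoop <configuration> layout.
# The values are placeholders, not taken from the cluster in this thread.
SAMPLE_SITE_XML = """
<configuration>
  <property><name>hbase.zookeeper.quorum</name><value>zk1.example.com</value></property>
  <property><name>hbase.zookeeper.property.clientPort</name><value>2181</value></property>
  <property><name>zookeeper.znode.parent</name><value>/hbase</value></property>
</configuration>
"""

# Property names mentioned in this thread.
ZK_PROPERTIES = (
    "hbase.zookeeper.quorum",
    "hbase.zookeeper.property.clientPort",
    "zookeeper.znode.parent",
)


def read_zk_settings(xml_text):
    """Return {property name: value or None} for the ZooKeeper-related keys."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name in ZK_PROPERTIES:
            found[name] = prop.findtext("value")
    # Keys missing from the file are reported as None rather than omitted.
    return {key: found.get(key) for key in ZK_PROPERTIES}


if __name__ == "__main__":
    for key, value in read_zk_settings(SAMPLE_SITE_XML).items():
        print(f"{key} = {value if value is not None else '(not set)'}")
```

Reporting unset keys explicitly makes it easier to spot a property that was never changed from its default.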
02-17-2016
06:27 PM
1 Kudo
Thanks Swagle for the helpful info!
02-17-2016
04:31 PM
1 Kudo
Thank you for the link. Specifically, I saw the outline under the General Guidelines section, which I hadn't seen before. Our cluster is 300-400 nodes, so I will leave the collector in embedded mode and reconfigure it per the settings below, and hopefully the collector will stop failing =)
Production 200-400 200GB embedded n.a.
metrics_collector_heap_size=2048
hbase_regionserver_heapsize=2048
hbase_master_heapsize=2048
hbase_master_xmn_size=512
02-17-2016
01:17 AM
1 Kudo
I have tried following the instructions in the link below, which saves just fine. But when I go to start the Metrics Collector, it will look like it is started but then will show as being in a stopped state.
https://cwiki.apache.org/confluence/display/AMBARI/AMS+-+distributed+mode
I changed the hbase.zookeeper.property.clientPort property to 2181 as in the doc as well, but I noticed in the log it is showing the old port in the socket connection, with the following line saying "session... for server null".
In the web interface, however, I get the following errors:
Metrics Collector - ZooKeeper Server Process
Connection failed: [Errno 111] Connection refused to r9-01.maas:2181
Metrics Collector - HBase Master Process
Connection failed: [Errno 111] Connection refused to r9-01.maas:61310
Here is a little snippet from the /var/log/ambari-metrics-collector.log file. The rest of the log seemed to repeat the same messages.
2016-02-16 16:04:31,425 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:61181. Will not attempt to authenticate using SASL (unknown error)
2016-02-16 16:04:31,426 WARN org.apache.zookeeper.ClientCnxn: Session 0x152e783fe0d0001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2016-02-16 16:04:31,526 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/meta-region-server
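To confirm which ZooKeeper address the collector is actually trying to reach, the log can be scanned for the connection targets. The following is a rough Python sketch, assuming the log lines keep the format shown in the excerpt above; the regular expressions and the function name are illustrative and may need adjusting for other log formats.

```python
import re
from collections import Counter

# Two lines copied from the log excerpt above.
SAMPLE_LOG = """\
2016-02-16 16:04:31,425 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:61181. Will not attempt to authenticate using SASL (unknown error)
2016-02-16 16:04:31,526 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/meta-region-server
"""

# Patterns written against the two message shapes seen in the excerpt.
QUORUM_RE = re.compile(r"quorum=([^,\s]+)")
SOCKET_RE = re.compile(r"Opening socket connection to server (\S+?)/([\d.]+):(\d+)")


def zookeeper_targets(log_text):
    """Count each (kind, "host:port") target mentioned in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        quorum = QUORUM_RE.search(line)
        if quorum:
            counts[("quorum", quorum.group(1))] += 1
        sock = SOCKET_RE.search(line)
        if sock:
            counts[("socket", f"{sock.group(1)}:{sock.group(3)}")] += 1
    return counts


if __name__ == "__main__":
    for (kind, target), n in zookeeper_targets(SAMPLE_LOG).items():
        print(f"{kind}: {target} seen {n} time(s)")
```

If every target still shows port 61181 after the clientPort property was set to 2181, the collector is probably not picking up the new configuration.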
Labels: Apache Ambari
01-31-2016
03:01 AM
The next time I reboot it I will log better information and see if it reoccurs for better RCA. Thanks for all the help!
01-31-2016
02:56 AM
It is non-production in a research lab. Haven't signed up for support yet either.
01-31-2016
02:47 AM
1 Kudo
Unfortunately it is not a UI issue, as the web server doesn't appear to be running: port 8080 never opens to a listening state. I didn't see anything in the troubleshooting guides aside from trying to restart the ambari-server, which hasn't helped. Also, this was deployed using the latest Ambari 2.2 and the HDP 2.3 stack. I originally installed the Ambari server on a VM so that I could checkpoint it, and luckily I had recently taken a checkpoint and was able to roll back successfully without too many errors.
01-31-2016
01:51 AM
Didn't help =(
01-31-2016
01:45 AM
Still nothing in ambari-server log, but the agent logs are showing connection refused to ambariserver:8440
01-31-2016
01:33 AM
I ran:
sudo ambari-server stop
sudo ambari-server start -v -g
But no errors were displayed and the ambari-server.log was the same. On one of the nodes, I restarted the agent and checked the log and it is indicating:
Failed to connect to AmbariController:8440/ca due to [Errorno 111] Connection refused
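A quick way to tell "nothing is listening" apart from other failures is to try a plain TCP connection to each port from an agent node. Here is a minimal Python sketch; ports 8080 and 8440 are the ones mentioned in this thread, "ambariserver" is a placeholder host name to replace with the real Ambari server host, and the helper name is made up for illustration.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # "ambariserver" is a placeholder; replace it with the real server host.
    for host, port in [("ambariserver", 8080), ("ambariserver", 8440)]:
        state = "open" if port_is_open(host, port) else "closed or unreachable"
        print(f"{host}:{port} is {state}")
```

If both ports report closed from the server host itself, the ambari-server process most likely never finished starting, which would match the agents' connection refused errors.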