Member since 01-08-2016
22 Posts
6 Kudos Received
0 Solutions
02-17-2016
09:08 PM
Thanks Jonas, I am in the process of trying embedded mode again per the 200-400 node guideline, along with Swagle's recommendation to increase the regionserver heap size. I will bookmark your response, though, in case we grow the cluster enough to require distributed mode. When I tried distributed mode earlier, I did not change hbase.zookeeper.quorum or zookeeper.znode.parent, so that could have been it. Thanks again for taking the time 😃
02-17-2016
06:27 PM
1 Kudo
Thanks Swagle for the helpful info!
02-17-2016
04:31 PM
1 Kudo
Thank you for the link. Specifically, I saw the outline under the General Guidelines section, which I hadn't seen before. Our cluster is 300-400 nodes, so I will leave the collector in embedded mode and reconfigure it per the settings below. Hopefully the collector will stop failing =)
Production, 200-400, 200GB, embedded, n.a.:
metrics_collector_heap_size=2048
hbase_regionserver_heapsize=2048
hbase_master_heapsize=2048
hbase_master_xmn_size=512
02-17-2016
01:17 AM
1 Kudo
I have tried following the instructions in the link below, and the configuration saves just fine. But when I go to start the Metrics Collector, it looks like it has started but then shows as being in a stopped state.
https://cwiki.apache.org/confluence/display/AMBARI/AMS+-+distributed+mode
I changed the hbase.zookeeper.property.clientPort property to 2181 as in the doc as well, but I noticed that the log is still showing the old port in the socket connection, with the following line saying "session... for server null".
In the web interface, however, I get the following errors:
Metrics Collector - ZooKeeper Server Process Connection failed: [Errno 111] Connection refused to r9-01.maas:2181
Metrics Collector - HBase Master Process Connection failed: [Errno 111] Connection refused to r9-01.maas:61310
Here is a little snippet from the /var/log/ambari-metrics-collector.log file. The rest of the log seemed to repeat the same messages.
2016-02-16 16:04:31,425 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:61181. Will not attempt to authenticate using SASL (unknown error)
2016-02-16 16:04:31,426 WARN org.apache.zookeeper.ClientCnxn: Session 0x152e783fe0d0001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2016-02-16 16:04:31,526 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/meta-region-server
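The mismatch described above (clientPort set to 2181 while the log still reports 61181) can be spotted automatically. The following is only a minimal sketch, assuming the collector log contains "quorum=host:port" fragments like the RecoverableZooKeeper warning quoted above. The function names are hypothetical and not part of Ambari or HBase.

```python
import re

# Captures everything after "quorum=" up to the next whitespace, e.g.
# "localhost:61181," in the RecoverableZooKeeper warning shown above.
QUORUM_RE = re.compile(r"quorum=(\S+)")


def quorums_in_log(lines):
    """Return the distinct (host, port) pairs that appear as ZooKeeper quorums."""
    seen = []
    for line in lines:
        match = QUORUM_RE.search(line)
        if not match:
            continue
        # A quorum may list several comma-separated host:port entries.
        for entry in match.group(1).rstrip(",").split(","):
            host, _, port = entry.rpartition(":")
            if host and port.isdigit():
                pair = (host, int(port))
                if pair not in seen:
                    seen.append(pair)
    return seen


def unexpected_quorums(lines, expected_port):
    """Return quorum entries whose port differs from the configured one."""
    return [pair for pair in quorums_in_log(lines) if pair[1] != expected_port]
```

Feeding it the lines of the collector log with expected_port=2181 would list any quorum entries that still use another port, which suggests the collector is not picking up the changed setting.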
Labels: Apache Ambari
01-31-2016
03:01 AM
The next time I reboot it I will log better information and see if it reoccurs for better RCA. Thanks for all the help!
01-31-2016
02:56 AM
It is non-production in a research lab. Haven't signed up for support yet either.
01-31-2016
02:47 AM
1 Kudo
Unfortunately it is not a UI issue: the web server doesn't appear to be running, as port 8080 never enters a listening state. I didn't see anything in the troubleshooting guides aside from restarting the ambari-server, which hasn't helped. This was deployed using the latest Ambari 2.2 and the HDP 2.3 stack. I originally installed the Ambari server on a VM so that I could checkpoint it. Luckily I had recently taken a checkpoint and was able to roll back successfully without too many errors.
01-31-2016
01:51 AM
Didn't help =(
01-31-2016
01:45 AM
Still nothing in ambari-server log, but the agent logs are showing connection refused to ambariserver:8440
01-31-2016
01:33 AM
I ran:
sudo ambari-server stop
sudo ambari-server start -v -g
But no errors were displayed and the ambari-server.log was the same. On one of the nodes, I restarted the agent and checked the log, which indicates:
Failed to connect to AmbariController:8440/ca due to [Errno 111] Connection refused
01-30-2016
09:54 PM
That's the problem: netstat shows nothing for port 8080.
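A connection attempt from another host can complement the netstat check. This is a rough sketch using only the Python standard library. The function name is hypothetical, and the host name to pass in is whatever the Ambari server is called in your environment.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Calling it for ports 8080 (web interface) and 8440 (the port the agents report as refused) shows whether either one accepts connections, without depending on the web UI.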
01-30-2016
09:52 PM
When I deployed Ambari/Hadoop initially, I left almost all of the settings at their defaults, with the intent to adjust them later as our team sees fit if we continue to use the product. I don't remember explicitly enabling it, so I imagine SSL is not enabled.
01-30-2016
09:50 PM
The log file looks the same. I tried defining a different port, which didn't work. Then I left it defined in ambari.properties as 8080, as the link you provided said, but it still isn't working. Is there a problem with using port 8080?
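To double-check which port the file actually defines, the properties text can be parsed directly. This is only a sketch with two assumptions: that the web port is set through a key named client.api.port (my understanding of the Ambari documentation, not confirmed in this thread), and that the default is 8080 when the key is absent. The function names are hypothetical.

```python
def read_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def ambari_web_port(text, default=8080):
    """Return the configured web port, or the default if the key is missing."""
    # Assumption: "client.api.port" is the key that overrides the default port.
    value = read_properties(text).get("client.api.port")
    return int(value) if value and value.isdigit() else default
```

Passing in the contents of ambari.properties returns the port the server should be trying to bind, which can then be compared with what netstat reports.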
01-30-2016
09:01 PM
1 Kudo
ambariproperties.txt
01-30-2016
08:47 PM
There weren't any errors that I saw, just warning messages about one of the nodes losing its heartbeat.
tail -fn 100 /var/log/ambari-server/ambari-server.log >> ambari-server-log-restart.txt
01-30-2016
08:07 PM
I had tried that initially, but no luck
01-30-2016
08:06 PM
ambari-server status shows that it is running and the port 8080 does not appear to be in use.
01-30-2016
07:34 PM
Thanks for the quick reply. It looks like the Ambari process isn't listening on 8080.
01-30-2016
07:48 AM
I have a rather large cluster that has been running pretty smoothly for over a month now. I had just fixed an issue with a few DataNodes and was left with the web interface saying that 5 hosts were in maintenance mode when they were not. I tried turning maintenance mode on and off, but it didn't help, so I finally tried:
sudo ambari-server restart
Afterwards, I tried to log in again and the web interface wouldn't load. I waited a few minutes, still nada. Today's entries in /var/log/ambari-server/ambari-server.log had errors regarding the metrics server but nothing that seemed relevant. The network settings are all the same and haven't changed, so I am not sure what went wrong. Any ideas of what logs I should read or what I should try?
EDIT: Thanks for the fast replies!! I have attached the info requested as files so this post doesn't extend down a mile. It looks like the Ambari service is not listening on port 8080.
EDIT2: I ended up rolling back the Ambari server VM to an earlier checkpoint, and luckily there weren't too many errors to deal with. I will update again if the issue comes back.
Labels: Apache Ambari
01-09-2016
12:23 AM
I tried manually installing an agent on one node:
Registering with the server...
Registration with the server failed.
01-08-2016
11:37 PM
1 Kudo
Hadoop can run on Ubuntu 15, but since we have a large cluster (>300 nodes), deploying Hadoop by hand would be a bit more daunting than deploying it with Ambari. I could always downgrade to Ubuntu 14, but before I do, I wanted to see if it was possible to use Ubuntu 15.
01-08-2016
11:02 PM
I am trying to test Hadoop using Ubuntu 15 and also wanted to use Ambari for the deployment. When I try to auto-install the agent in the initial server install wizard, it gets all the way to the end and then fails, saying "Unsupported OS." The agent appears to have installed correctly, though. It fails simply because the OS is identified as Ubuntu 15. One workaround I read about was to modify the OS identifier on each host to read Ubuntu 14 instead of 15. Since the Ambari server is also running on Ubuntu 15, though, this generates a new error saying that the host is not in the same OS family as the server. Is there a file where I can add Ubuntu 15 to the list of supported OSes? Or can I disable this check? Or is the check there because there is no chance Ambari will run on Ubuntu 15?
Labels: Apache Ambari