Created 12-14-2016 01:56 PM
Hi
I installed HDP with 1 Ambari server and 2 nodes. After restarting all the VMs, I restarted ambari-server and ambari-agent (on each node).
I can log in to the Ambari UI, but the status shows "datanodes 0/2 started".
Thanks for your help
Created 12-15-2016 05:29 PM
Hi
Issue resolved. Here is the solution.
The problem was caused by a misconfigured /etc/hosts on a datanode after the reboot: node2 could not connect to node1 via its FQDN.
This is an important point (not clear in the docs): all nodes, the Ambari server and the data nodes alike, must be able to connect to each other.
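For anyone hitting the same thing, here is a minimal Python sketch of the check that would have caught it, assuming the cluster relies on /etc/hosts for name resolution. The function names are mine, and hdp-node2.mycluster.com is a hypothetical name for the second node (only ambari.novalocal and hdp-node1.mycluster.com appear in this thread).

```python
import socket


def parse_hosts(text):
    """Map every hostname/alias in /etc/hosts-style text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # Keep the first entry for a name, as the resolver does.
            mapping.setdefault(name.lower(), ip)
    return mapping


def missing_hosts(text, expected):
    """Return the expected FQDNs that have no entry in the hosts text."""
    known = parse_hosts(text)
    return [name for name in expected if name.lower() not in known]


def resolves(hostname):
    """True if the system resolver (hosts file or DNS) can resolve hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def check_node(expected, hosts_path="/etc/hosts"):
    """Print, for each expected FQDN, whether this node can resolve it."""
    with open(hosts_path) as f:
        absent = missing_hosts(f.read(), expected)
    for name in expected:
        print(name, "in hosts file:", name not in absent,
              "| resolves:", resolves(name))
```

Run on every node, for example check_node(["ambari.novalocal", "hdp-node1.mycluster.com", "hdp-node2.mycluster.com"]). Any name that does not resolve on a given node points at that node's /etc/hosts.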
BR
Maxime
Created 12-14-2016 02:35 PM
Did you restart all services? If so, you'll need to look at the logs to see what's going on. Start by looking at the logs in Ambari from the start commands for each service. If that doesn't give answers, you may need to look at the logs on one of the data nodes.
Created 12-14-2016 02:58 PM
@Maxime Savary - Once ambari-server is up and ambari-agent is up on each node, you can stop and start the cluster in Ambari using the Action button below the list of services. https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.0.0/bk_Ambari_Users_Guide/content/_starting_and...
If this does not work, you need to look at the logs. You can post logs here and we will try to help.
Created 12-14-2016 04:29 PM
@Maxime Savary - The logs for starting each service can be found in Ambari via the "Ops" button near the top, next to the "Alerts" button. Drill down into the failed (red) action and eventually you will get to the logs. There are two logs for each operation in this window: stderr and stdout. You don't have to go to the file system. These are the ambari-agent logs for specific operations.
The other half of the equation is to look on the server where the service is starting. For example, look at /var/log/hadoop/hdfs/hadoop-hdfs-datanode.log or hadoop-hdfs-namenode.log. These files are going to be on whatever server the service is supposed to run on.
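If the log is large, a small Python sketch like the following can pull out the most recent error lines. It assumes the usual log4j-style layout where the level (ERROR or FATAL) appears as a separate word on the line; the function names are just for illustration.

```python
import re

# Matches the log level as a whole word anywhere on the line.
LEVEL_RE = re.compile(r"\b(ERROR|FATAL)\b")


def error_lines(lines, limit=20):
    """Return the last `limit` lines that contain an ERROR or FATAL level."""
    hits = [line.rstrip("\n") for line in lines if LEVEL_RE.search(line)]
    return hits[-limit:]


def scan_log(path, limit=20):
    """Read a log file and return its most recent ERROR/FATAL lines."""
    with open(path, errors="replace") as f:
        return error_lines(f, limit)
```

For example, scan_log("/var/log/hadoop/hdfs/hadoop-hdfs-datanode.log") on the node where the DataNode is supposed to run. Stack traces that follow an ERROR line are not captured, so open the file around the reported timestamps for the full picture.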
Please click Reply rather than entering a new answer when you are replying.
Created 12-14-2016 04:54 PM
Hi James
When I try to start services (all or one) from the Ambari UI, stderr and stdout are empty. It seems that the Ambari server is not connected to the agents on the nodes.
Maxime
Created 12-14-2016 05:00 PM
Hi Maxime, did you start ambari-agent on each node? Ssh to each node as root (probably) or ambari (depending on how you set it up) and run ambari-agent status and/or ambari-agent start. I'm pretty sure you will see an error if there is no communication with the agents. If an operation failed, you will see a red line. As you drill in, you will probably see a number of green lines, but you need to find the failed ones, which are red, and drill into them.
Created 12-14-2016 02:52 PM
Hi
I can't restart any services from the Ambari UI, although the Ambari agent is up and running on each node.
After the VMs restart, what is the procedure to restart the cluster? Do we have to start the Ambari server and the Ambari agents and then start the services via the Ambari UI?
Thanks
Maxime
Created 12-14-2016 03:03 PM
Regarding your Query: "On Vms restart what is the procedure to restart the cluster ? Do we have to start ambari server and ambari agent and then start services via Ambari UI"
By default, the HDP components are not started automatically when a system reboots, so after a reboot you will need to restart them manually. The Ambari Server and Ambari Agents, however, are restarted automatically because of their service configuration.
If you want to automatically restart the services (DN, NN ...etc) as well upon the System reboot/restart then you should refer to :
And also the following will be useful:
Regarding issue "I can't restart any services from Ambari UI ... The Ambari Agent is up and running on each node"
Can you please let us know what happens when you try to start services using the Ambari UI? Do you see any error or exception in /var/log/ambari-server/ambari-server.log? Or is the UI not responding, or do you see an error in the UI itself?
Created 12-14-2016 04:17 PM
Many thanks for your feedback. Maybe I am misunderstanding something.
I'm not able to start services from the Ambari UI, and I don't see any error trace in /var/log/ambari-server/ambari-server.log.
So do I have to restart the services manually on each datanode? As I understand it, after restarting the Ambari server and the Ambari agents (2 nodes), I should be able to start services from the Ambari UI.
BR
Maxime
Created 12-14-2016 08:52 PM
Hi
I made some checks:
1) The Ambari server is up and running
2) The Ambari agents are up and running on the 2 datanodes (I checked the PIDs)
3) The Ambari agents can connect to the Ambari server (/etc/ambari-agent/conf/ambari-agent.ini is OK)
4) The Ambari server can connect to the 2 datanodes with the SSH key
ambari-agent log on node 1: on both nodes the connection to the Ambari server seems OK. The error is on the service connection: the node tries to connect to itself on port 8042.
INFO 2016-12-14 18:47:19,360 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2016-12-14 18:47:19,360 main.py:349 - Connecting to Ambari server at https://ambari.novalocal:8440 (192.168.0.7)
INFO 2016-12-14 18:47:19,360 NetUtil.py:62 - Connecting to https://ambari.novalocal:8440/ca
INFO 2016-12-14 18:47:19,469 main.py:359 - Connected to Ambari server ambari.novalocal
ERROR 2016-12-14 19:00:19,561 script_alert.py:119 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://hdp-node1.mycluster.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen\n return _opener.open(url, data, timeout)\n File "/usr/lib/python2.7/urllib2.py", line 404, in open\n response = self._open(req, data)\n File "/usr/lib/python2.7/urllib2.py", line 422, in _open\n \'_open\', req)\n File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain\n result = func(*args)\n File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open\n raise URLError(err)\nURLError: <urlopen error [Errno 111] Connection refused>\n)']
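To tell a name-resolution problem apart from a service that is simply not listening, a small TCP probe can help. This is only a sketch: the probe function and its return labels are my own, and the ports to test (8440 for the Ambari server, 8042 for the NodeManager) are taken from the log lines above.

```python
import socket


def probe(host, port, timeout=3.0):
    """Return 'open', 'refused', 'unresolved', or 'unreachable' for host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        # The hostname could not be resolved (hosts file or DNS problem).
        return "unresolved"
    except ConnectionRefusedError:
        # The host answered but nothing is listening on that port (Errno 111).
        return "refused"
    except OSError:
        # Covers timeouts, "no route to host", and similar network errors.
        return "unreachable"
```

For example, probe("hdp-node1.mycluster.com", 8042) run from each node. "refused" matches the Errno 111 in the traceback and means the service is not running or not listening on that address, while "unresolved" points at /etc/hosts or DNS on the node where the probe was run.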
Maxime