Created 03-07-2018 08:49 AM
I am adding services in Ambari. Everything works fine up to a point, after which the wizard stays at "Preparing to Deploy: 5 of 11 tasks completed." forever, and ambari-server.log shows no obvious errors.
I'm sure the hosts file, iptables, and SELinux are set up correctly and the time has been synced, because other services were installed successfully.
By the way, there were errors in the hosts file and in some nodes' hostnames, but they are all fixed now.
So could this be a database problem?
Created 03-14-2018 05:37 AM
Can you help me? This question has been bothering me for a long time.
Created 03-14-2018 05:49 AM
Can you please let us know the following:
1. What is the exact ambari version?
2. Do you see any error/warning in your "/var/log/ambari-server/ambari-server.log" ?
3. Do you see any error/warning in the "/var/log/ambari-agent/ambari-agent.log" of any of the cluster hosts?
4. Do you see any error in the Ambari UI when you open the browser Developer Tools?
For example, in Google Chrome --> click on "More Tools" --> Developer Tools --> Console tab
Then in that browser open the Ambari url which is hanging.
Do you notice any error in the Browser debugger console?
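For checks 2 and 3, a small script can pull out just the error and warning lines so they are easier to share here. This is only a minimal sketch: the function name `find_problem_lines` is made up for illustration, and it assumes the severity appears as an uppercase word (ERROR, WARN, or WARNING) somewhere in each log line.

```python
import re
import sys
from typing import Iterable, List

# Matches the severity token as a whole uppercase word, e.g. "ERROR" or "WARNING".
SEVERITY = re.compile(r"\b(ERROR|WARN(?:ING)?)\b")


def find_problem_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that carry an ERROR or WARN/WARNING severity."""
    return [line.rstrip("\n") for line in lines if SEVERITY.search(line)]


if __name__ == "__main__":
    # Usage (paths are passed as arguments), for example:
    #   python scan_log.py /var/log/ambari-server/ambari-server.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for hit in find_problem_lines(handle):
                print(f"{path}: {hit}")
```

Running it against both the server log and the agent log on each host gives a short list to compare across machines.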
Created 03-14-2018 05:53 AM
Created 03-14-2018 07:15 AM
I ran yum clean all and yum repolist; yum is working fine.
Created 03-14-2018 07:11 AM
1. The ambari-server version is 2.4.1.0-22; the HDP version is HDP-2.5.
2. I ran ambari-server stop and ambari-server start, and found some errors in /var/log/ambari-server/ambari-server.log:
14 Mar 2018 14:06:11,988 ERROR [qtp-ambari-agent-64] HeartBeatHandler:200 - CurrentResponseId unknown for worker7.hadoop.ds.com - send register command
Other hosts in the cluster have this error too, including worker6.hadoop.com, worker5.hadoop.com, and worker4.hadoop.com.
3. After I restarted the ambari-server, the log on another host (/var/log/ambari-agent/ambari-agent.log) shows:
ERROR 2018-03-14 09:16:33,391 Controller.py:415 - Connection to master1.hadoop.ds.com was lost (details=Request to https://master1.hadoop.ds.com:8441/agent/v1/heartbeat/edge1.hadoop.ds.com failed due to Error occured during connecting to the server: )
INFO 2018-03-14 09:17:10,309 NetUtil.py:62 - Connecting to https://master1.hadoop.ds.com:8440/connection_info
WARNING 2018-03-14 09:17:10,310 NetUtil.py:93 - Failed to connect to https://master1.hadoop.ds.com:8440/connection_info due to [Errno 111] Connection refused
INFO 2018-03-14 09:17:10,310 security.py:100 - SSL Connect being called.. connecting to the server
ERROR 2018-03-14 09:17:10,310 Controller.py:415 - Unable to reconnect to https://master1.hadoop.ds.com:8441/agent/v1/heartbeat/edge1.hadoop.ds.com (attempts=1, details=Request to https://master1.hadoop.ds.com:8441/agent/v1/heartbeat/edge1.hadoop.ds.com failed due to [Errno 111] Connection refused)
INFO 2018-03-14 09:17:24,215 NetUtil.py:62 - Connecting to https://master1.hadoop.ds.com:8440/connection_info
WARNING 2018-03-14 09:17:24,215 NetUtil.py:93 - Failed to connect to https://master1.hadoop.ds.com:8440/connection_info due to [Errno 111] Connection refused
INFO 2018-03-14 09:17:24,216 security.py:100 - SSL Connect being called.. connecting to the server
ERROR 2018-03-14 09:17:24,216 Controller.py:415 - Unable to reconnect to https://master1.hadoop.ds.com:8441/agent/v1/heartbeat/edge1.hadoop.ds.com (attempts=2, details=Request to https://master1.hadoop.ds.com:8441/agent/v1/heartbeat/edge1.hadoop.ds.com failed due to [Errno 111] Connection refused)
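To see at a glance which server ports are refusing connections in an agent log like the one above, the refused lines can be tallied per host and port. This is a rough sketch only: the helper name `count_refused` is hypothetical, and it assumes each failing line contains the text "Connection refused" plus an https URL with an explicit port, as in the excerpt.

```python
import re
from collections import Counter
from typing import Iterable

# Captures the host and port from the first https URL on a line.
URL = re.compile(r"https://([\w.-]+):(\d+)/")


def count_refused(lines: Iterable[str]) -> Counter:
    """Count 'Connection refused' log lines per (host, port) pair."""
    counts: Counter = Counter()
    for line in lines:
        if "Connection refused" not in line:
            continue
        match = URL.search(line)
        if match:
            # Each line is counted once, even if the URL is repeated in it.
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

For the excerpt above this would report refusals on both 8440 and 8441 of master1.hadoop.ds.com, the two ports the agent is trying to reach.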
Created on 03-14-2018 07:20 AM - edited 08-18-2019 03:06 AM
Firefox:
When it is stuck at "Preparing to Deploy: 5 of 11 tasks completed", pressing F12 and opening the Network tab shows this:
Created 03-14-2018 07:21 AM
As the error says, an established connection was lost (which indicates a temporary network issue):
Connection to master1.hadoop.ds.com was lost (details=Request to https://master1.hadoop.ds.com:8441/agent/v1/heartbeat/edge1.hadoop.ds.com failed due to Error occured during connecting to the server: )
This means there might be a temporary network or firewall issue (packet drops), or the network is unstable, so the connection was lost while Ambari was performing the "Add Service" operation.
It may be a temporary network issue that has been fixed by now, so can you please try cancelling the "Add Service" wizard and re-running it to see if it works?
To verify whether the Ambari agents can actually communicate with the Ambari server on the SSL port, you can run the following command from the agent machines a couple of times to see if the communication is OK.
# openssl s_client -connect master1.hadoop.ds.com:8441 | grep CONNECTED
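If repeating the check by hand is tedious, something like the following could be run on each agent machine. It is a minimal sketch under stated assumptions: the function name `probe` and its parameters are made up for illustration, and it only tests whether a TCP connection can be opened, so unlike openssl s_client it does not perform the SSL handshake.

```python
import socket
import time


def probe(host: str, port: int, attempts: int = 5,
          delay: float = 1.0, timeout: float = 3.0) -> int:
    """Try to open a TCP connection `attempts` times; return the success count."""
    successes = 0
    for i in range(attempts):
        try:
            # create_connection raises OSError on refusal, timeout, or DNS failure.
            with socket.create_connection((host, port), timeout=timeout):
                successes += 1
        except OSError:
            pass
        # Pause between attempts, but not after the last one.
        if i < attempts - 1:
            time.sleep(delay)
    return successes
```

For example, probe("master1.hadoop.ds.com", 8441) and probe("master1.hadoop.ds.com", 8440) should each return 5 if the ports are consistently reachable; a lower number would suggest intermittent drops.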
Created 03-14-2018 09:33 AM
It should not be a temporary problem, because it has been going on for a long time.
I ran "openssl s_client -connect master1.hadoop.ds.com:8441 | grep CONNECTED" and the output is normal:
CONNECTED(00000003)
As you said in other questions, I should check that the state in the "host_version" "host_version" "host" tables is "CURRENT". I really don't understand.
Created 03-21-2018 07:33 AM
Can you help me? I have checked that all the configurations are right.