Ambari Server can't stop or start services after it has been running for some days.
I can't use the Ambari web UI to stop or start services on the other machines. After a machine is rebooted, Ambari Server can manage the services on that machine again. How can I solve this? Thanks.
Can you please clarify what you mean by "server" when you say you can't use the Ambari web UI to stop and start the server on another machine?
Do you mean servers such as HiveServer, the History or Timeline Server, ZooKeeper Server, etc.?
Also, can you please describe what you mean when you say Ambari Server can manage the server again once the machine has been rebooted?
Do you see any errors or warnings in ambari-server.log?
Do you see any issues in the ambari-agent logs?
When Ambari Server stops responding, did you check its memory usage at that time? You can use the following command:
# $JAVA_HOME/bin/jmap -heap $AMBARI_PID
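$AMBARI_PID in the command above has to be filled in first. Here is a minimal sketch of one way to get it, assuming the server was started with the usual main class org.apache.ambari.server.controller.AmbariServer as the last argument on its command line. The function name ambari_pid_from_ps is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print the PID
# of the AmbariServer JVM.
ambari_pid_from_ps() {
  # Keep only rows whose last field is the AmbariServer main class.
  # The `grep -i AmbariServer` process itself ends in "AmbariServer",
  # so it does not match. Column 2 of `ps -ef` is the PID.
  awk '$NF == "org.apache.ambari.server.controller.AmbariServer" { print $2 }'
}

# Example usage (commented out so that sourcing this file has no side effects):
#   AMBARI_PID=$(ps -ef | ambari_pid_from_ps)
#   "$JAVA_HOME/bin/jmap" -heap "$AMBARI_PID"
```

Run jmap as the same OS user that owns the Ambari Server process, otherwise it may fail to attach.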
You can also collect an Ambari Server thread dump to find out why the server gets stuck or stops responding after a few days. Please refer to the following link to learn how to collect the thread dump along with the CPU statistics. Please share the thread dump and the CPU data file.
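In case it helps, a common way to do this is to take several jstack dumps a few seconds apart, each paired with a per-thread CPU snapshot from top. The sketch below assumes that jstack (from the JDK) and a Linux procps top are on the PATH and that you run it as the user that owns the Ambari Server process. The function name collect_dumps and the output file names are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper.
# Usage: collect_dumps PID [COUNT] [INTERVAL_SECONDS] [OUTDIR]
collect_dumps() {
  pid=$1
  count=${2:-5}       # number of samples to take
  interval=${3:-10}   # seconds to wait between samples
  outdir=${4:-.}      # directory that receives the output files
  mkdir -p "$outdir" || return 1
  i=1
  while [ "$i" -le "$count" ]; do
    # Full thread dump, including lock information.
    jstack -l "$pid" > "$outdir/threaddump_$i.txt"
    # One batch-mode snapshot of per-thread CPU usage for the same process.
    top -H -b -n 1 -p "$pid" > "$outdir/cpu_$i.txt"
    # Pause between samples, but not after the last one.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}

# Example usage (commented out so that sourcing this file has no side effects):
#   collect_dumps "$AMBARI_PID" 5 10 /tmp/ambari-dumps
```

Thread IDs in the top output are decimal, while jstack prints them as hexadecimal nid values, so convert one to the other when matching a busy thread to its stack.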
What is the Ambari cluster size (that is, the number of hosts in the cluster), and what is the current heap setting for Ambari Server? Please share the output of the following command so that we can see the memory settings:
# ps -ef | grep -i AmbariServer
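If the full command line is too long to read comfortably, the heap flags can be pulled out on their own. This is only a sketch and assumes a grep that supports -o (GNU and BSD grep both do). The function name heap_flags is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print every
# -Xms / -Xmx JVM flag found, one per line.
heap_flags() {
  # `--` ends option parsing so the pattern may start with a dash.
  # The size suffix (k, m or g) is optional.
  grep -o -- '-Xm[sx][0-9][0-9]*[kKmMgG]\{0,1\}'
}

# Example usage (commented out so that sourcing this file has no side effects):
#   ps -ef | grep -i AmbariServer | heap_flags
```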
Sharing the ambari-server log would also be useful.
@jay sensharma Yes, by "server" I mean ZooKeeper, HDFS, YARN, etc.
There are no errors, and the system log does not output anything.
About Ambari Server managing the services after a reboot:
Ambari (web UI) can start or stop ZooKeeper, HDFS, etc. again once the machine where they are installed has been rebooted.
In Ambari 2.5 the "Service Auto Start" feature is available in the Ambari UI. Have you enabled auto start for your services? https://docs.hortonworks.com/HDPDocuments/Ambari-126.96.36.199/bk_ambari-operations/content/ch07s04.html
It looks as if ambari-server does not send the stop or start message to the agent, as though the heartbeat were lost...
I can't use Ambari Server (web UI) to manage (stop or start) ZooKeeper, HDFS, YARN, etc.
After the machine is rebooted, I can use Ambari Server (web UI) to start ZooKeeper, HDFS, YARN, etc. again.
You mentioned that it looks as if ambari-server does not send the stop or start message to the agent and that the heartbeat is lost.
>>> If an Ambari agent's heartbeat is lost, that agent cannot receive any instructions from Ambari Server, because the server cannot communicate with an agent whose heartbeat is lost. So please check the "Hosts" page of your Ambari Server to see which hosts show the heartbeat-lost message. Try restarting the agent on those hosts and see whether Ambari starts receiving their heartbeat messages again.
If the heartbeat issue persists, we will need to debug it to find out why the agents lost the heartbeat. Usually this happens because of a network issue, but looking at ambari-server.log and ambari-agent.log can help identify the reason.
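To make the log check concrete, here is a minimal sketch that pulls the most recent heartbeat-related lines out of a log file. It is a plain case-insensitive text search and does not rely on any particular log message format. The function name heartbeat_lines is made up for this example, and the log paths in the usage comment are the usual default locations, which may differ on your install.

```shell
#!/bin/sh
# Hypothetical helper.
# Usage: heartbeat_lines LOGFILE [N]
# Prints the last N (default 20) lines of LOGFILE that mention "heartbeat".
heartbeat_lines() {
  grep -i 'heartbeat' "$1" | tail -n "${2:-20}"
}

# Example usage on an affected host (commented out so that sourcing this
# file has no side effects):
#   ambari-agent restart
#   heartbeat_lines /var/log/ambari-agent/ambari-agent.log
# And on the Ambari Server host:
#   heartbeat_lines /var/log/ambari-server/ambari-server.log 50
```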
jmap -heap 5414
Attaching to process ID 5414, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.121-b13

using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
   MinHeapFreeRatio         = 40
   MaxHeapFreeRatio         = 70
   MaxHeapSize              = 4294967296 (4096.0MB)
   NewSize                  = 1073741824 (1024.0MB)
   MaxNewSize               = 1073741824 (1024.0MB)
   OldSize                  = 3221225472 (3072.0MB)
   NewRatio                 = 3
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 0 (0.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 966393856 (921.625MB)
   used     = 234290320 (223.43666076660156MB)
   free     = 732103536 (698.1883392333984MB)
   24.24377168225705% used
Eden Space:
   capacity = 859045888 (819.25MB)
   used     = 225215952 (214.7826690673828MB)
   free     = 633829936 (604.4673309326172MB)
   26.216987374718684% used
From Space:
   capacity = 107347968 (102.375MB)
   used     = 9074368 (8.65399169921875MB)
   free     = 98273600 (93.72100830078125MB)
   8.45322754502442% used
To Space:
   capacity = 107347968 (102.375MB)
   used     = 0 (0.0MB)
   free     = 107347968 (102.375MB)
   0.0% used
concurrent mark-sweep generation:
   capacity = 3221225472 (3072.0MB)
   used     = 365827768 (348.8805465698242MB)
   free     = 2855397704 (2723.119453430176MB)
   11.3567886253198% used

67747 interned Strings occupying 6126184 bytes.
ps -ef | grep -i AmbariServer
root 3788 3576 0 11:54 pts/1 00:00:00 grep --color=auto -i AmbariServer
root 5414 1 8 Jul11 ? 12:10:12 /usr/java/jdk1.8.0_121/bin/java -server -XX:NewRatio=3 -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit -XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSClassUnloadingEnabled -Dsun.zip.disableMemoryMapping=true -Xms4096m -Xmx4096m -XX:MaxPermSize=512m -Djava.security.auth.login.config=/etc/ambari-server/conf/krb5JAASLogin.conf -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false -cp /etc/ambari-server/conf:/usr/lib/ambari-server/*:/usr/share/java/postgresql-jdbc.jar org.apache.ambari.server.controller.AmbariServer
The web UI shows that the heartbeat is not lost.