Created 02-06-2017 06:33 PM
Problem: Runtime issue of a "fresh" Ambari server for SLES 11.4 installation (see http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-installation/content/ch_Installing...).

There is no communication between Ambari Server and Ambari Agent via handshake or registration/heartbeat ports. A connection to the port can be established on client-side but there is no server-side response from the application listening on the port.

curl shows "TLSv1.0, TLS handshake, Client hello (1)" only (answer "TLSv1.0, TLS handshake, Server hello (1)" is missing):

ls5567:~ # curl -v https://localhost:8440;
* About to connect() to localhost port 8440 (#0)
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 8440 (#0)
* successfully set certificate verify locations:
* CAfile: none
  CApath: /etc/ssl/certs/
* TLSv1.0, TLS handshake, Client hello (1):
* SSL connection timeout
* Closing connection #0
curl: (28) SSL connection timeout

openssl hangs forever:

ls5567:~ # openssl s_client -connect localhost:8440
CONNECTED(00000003)
^C

netstat reports this:

ls5567:~ # netstat -nopa|grep :844
tcp 0 0 127.0.0.1:49634 127.0.0.1:8440 ESTABLISHED 62178/openssl off (0.00/0/0)
tcp 0 0 127.0.0.1:44860 127.0.0.1:8440 ESTABLISHED 36869/python off (0.00/0/0)
tcp 0 0 :::8440 :::* LISTEN 61747/java off (0.00/0/0)
tcp 0 0 :::8441 :::* LISTEN 61747/java off (0.00/0/0)
tcp 0 0 :::8443 :::* LISTEN 61747/java off (0.00/0/0)
tcp 137 0 127.0.0.1:8440 127.0.0.1:48270 CLOSE_WAIT 61747/java off (0.00/0/0)
tcp 137 0 127.0.0.1:8440 127.0.0.1:55292 CLOSE_WAIT 61747/java off (0.00/0/0)
tcp 128 0 127.0.0.1:8440 127.0.0.1:49634 ESTABLISHED 61747/java off (0.00/0/0)
tcp 128 0 127.0.0.1:8440 127.0.0.1:44860 ESTABLISHED 61747/java off (0.00/0/0)

Please share ideas to find the root cause of this issue.
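The symptom described here (the TCP connection is established, but the TLS handshake never gets an answer) can also be checked from a script. The following is a minimal sketch, not an Ambari tool: the function name probe_tls and its return strings are made up for illustration, and certificate verification is deliberately switched off because only the completion of the handshake is of interest.

```python
import socket
import ssl


def probe_tls(host, port, timeout=5.0):
    """Classify how far a TLS connection attempt gets.

    Returns one of: 'refused', 'connect_timeout', 'handshake_timeout',
    'tls_error', 'ok'. (Hypothetical helper, names are illustrative.)
    """
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "connect_timeout"

    # Only handshake completion matters here, so certificate and hostname
    # checks are disabled. Do not reuse this context for real traffic.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with ctx.wrap_socket(raw, server_hostname=host):
            return "ok"
    except socket.timeout:
        # TCP connected, but no Server hello arrived within the timeout.
        return "handshake_timeout"
    except (ssl.SSLError, OSError):
        return "tls_error"
    finally:
        raw.close()
```

Calling probe_tls("localhost", 8440) on the affected host would be expected to return "handshake_timeout" for the situation shown in the curl output, while a port with no listener returns "refused".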
Created 02-06-2017 07:42 PM
What is the ambari-server system configuration?
Can you increase "agent.threadpool.size.max" to, say, 120 (the default is 25) and restart the ambari-server?
We do see some issues on higher-end machines with a large number of CPUs.
Let us know the result.
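For anyone who wants to script this change, here is a small sketch of setting a key in Java-properties-style text. The helper name set_property is hypothetical, and the file location mentioned in the comment (/etc/ambari-server/conf/ambari.properties) is assumed to be the default; please check your own installation before writing to it.

```python
def set_property(text, key, value):
    """Return properties text with `key` set to `value`.

    An existing uncommented entry is replaced in place (later duplicates of
    the same key are dropped); if the key is absent it is appended.
    """
    out = []
    found = False
    for line in text.splitlines():
        stripped = line.strip()
        is_entry = not stripped.startswith("#") and "=" in stripped
        if is_entry and stripped.split("=", 1)[0].strip() == key:
            if not found:
                out.append(f"{key}={value}")
                found = True
            continue
        out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Example use (path is an assumption, adjust as needed):
#   path = "/etc/ambari-server/conf/ambari.properties"
#   with open(path) as f:
#       updated = set_property(f.read(), "agent.threadpool.size.max", 120)
#   with open(path, "w") as f:
#       f.write(updated)
```

The new value only takes effect after the ambari-server has been restarted.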
Created 02-07-2017 06:33 AM
Is it a single-node cluster? I see localhost, so I am wondering whether you registered the ambari agent correctly.
You can try the commands below and see if they help.
ambari-agent stop
ambari-agent reset <ambari-server-hostname>
ambari-agent start
If it's a production cluster, please be careful. The reset command will reset the ambari agent configuration, including the SSL certs (mostly self-signed unless you have configured non-default options) used for communication between ambari agent and server.
No HDP services will be affected by running this command.
Created 02-07-2017 07:20 AM
Many thanks: increasing the value of the parameter agent.threadpool.size.max did it. Do you have any "rule-of-thumb" or "best-practice" formula at hand for the minimum size this value should be set to?
Created 02-07-2017 03:22 PM
There is no rule of thumb/formula here, but Jetty seems to have some problems on machines with a high CPU count. Ideally 32 should be good enough for a cluster of 5 to 6 nodes, but if you have more nodes then you may have to increase this number.
Please select the correct answer.
Created 02-07-2017 01:10 PM
Additional question (because of these findings in the ambari-server.log file):
ls5567:~ # grep ambari-client-thread /var/log/ambari-server/ambari-server.log
07 Feb 2017 12:10:47,371 WARN [main] AmbariServer:693 - The configured Jetty ambari-client-thread thread pool value of 25 is not sufficient on a host with 80 processors. Increasing the value to 60.
07 Feb 2017 12:10:47,372 INFO [main] AmbariServer:701 - Jetty is configuring ambari-client-thread with 40 reserved acceptors/selectors and a total pool size of 60 for 80 processors.
What is the relationship between client.threadpool.size.max and agent.threadpool.size.max values ?
- client.threadpool.size.max <= or < agent.threadpool.size.max ?
- client.threadpool.size.max >= or > agent.threadpool.size.max ?
- client.threadpool.size.max = agent.threadpool.size.max ?
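The numbers in the two log lines above fit a simple rule: the pool must cover the threads reserved for acceptors/selectors plus some headroom for request handling. The sketch below is inferred from the log output only (40 reserved, total raised from 25 to 60); the constant 20 and the function name are assumptions, not a quote of the Ambari source.

```python
# Assumption: headroom inferred from the log (60 total - 40 reserved = 20).
MIN_AVAILABLE_THREADS = 20


def effective_pool_size(configured, reserved):
    """Return the pool size after raising it to cover the reserved threads.

    `configured` is the value from ambari.properties, `reserved` is the
    number of acceptor/selector threads reported in the log.
    """
    required = reserved + MIN_AVAILABLE_THREADS
    return max(configured, required)


# Values from the log above: configured 25, 40 reserved acceptors/selectors.
print(effective_pool_size(25, 40))
```

With these inputs the result matches the "total pool size of 60" reported in the log; a configured value that is already large enough (for example 120) would be left unchanged.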
Created 02-07-2017 03:26 PM
Ambari listens on two different ports:
one port serves REST calls - for example, the Ambari UI makes REST calls to the server
the second port serves agent requests - all the agents communicate with the server on this port.
client.threadpool.size.max - this property is used by the port listening for REST calls
agent.threadpool.size.max - this property is used by the port listening for agents
Created 02-07-2017 03:28 PM
Ambari internally uses the "Jetty" server, which has "acceptor" and "selector" threads just like any other server. Since Jetty 9, the formula for calculating the number of "acceptor" and "selector" threads has changed considerably.
Some information regarding "acceptors" and "selectors":
Acceptor threads accept new connections from clients. They run a loop on a blocking accept() call to accept connections. On a box with 80 cores, which is not typical, you are recommended to manually configure the number of acceptors and selectors, since the guesses we make are probably off. The right number of acceptor threads is determined by the server load and the traffic shape, in particular by the connection open/close rate. The higher this rate, the more acceptors you want.
Selector threads manage events on connected sockets. Every time a socket connects, an acceptor thread accepts the connection and assigns the socket to a selector, chosen in a round-robin fashion. The selector is responsible for detecting activity on that socket (I/O events such as read availability and write availability) and signalling that event to other code in Jetty that will handle it. One thread runs one selector. This is basic async I/O using selectors. If you have a very busy server (say 100k connected clients at any time or more), you may want to "spread" those clients among many selectors, so that each selector handles a portion of the connected clients and responds faster to client activity.
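The acceptor/selector hand-off described above can be modelled in a few lines. This is only an illustration of the round-robin idea using Python's standard selectors module; it is not Jetty's implementation, and the function name accept_round_robin is made up. In a real server each selector would additionally be polled by its own thread.

```python
import itertools
import selectors
import socket


def accept_round_robin(listener, pool, count):
    """Accept `count` connections and spread them over the selectors in `pool`.

    Returns the list of selector indexes chosen, in order of acceptance.
    """
    turn = itertools.cycle(range(len(pool)))
    placed = []
    for _ in range(count):
        # Blocking accept(), which is what an acceptor thread loops on.
        conn, _addr = listener.accept()
        conn.setblocking(False)
        index = next(turn)
        # The chosen selector now watches this socket for readable data.
        pool[index].register(conn, selectors.EVENT_READ)
        placed.append(index)
    return placed
```

With two selectors and four incoming connections the sockets end up split evenly, two per selector, which is the "spreading" effect mentioned for busy servers.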
From Ambari 2.5.0 onwards the logic has been enhanced as part of https://issues.apache.org/jira/browse/AMBARI-18827 so that ambari agent restarts are not blocked on hosts with a high number of cores.
From Ambari 2.5 on, the thread pool is configured as follows:
configureJettyThreadPool(serverForAgent, acceptors * 2, AGENT_THREAD_POOL_NAME, configs.getAgentThreadPoolSize());
The AGENT_THREADPOOL_SIZE is controlled by the "agent.threadpool.size.max" property (default value is 25) [1]. "agent.threadpool.size.max" sets the maximum number of threads used to process heartbeats from ambari agents, while "view.extraction.threadpool.size.max" applies to the Views UI.
The CLIENT_THREADPOOL_SIZE is controlled by the "client.threadpool.size.max" property [2], which is used for REST API calls.
We might use:
totalCPUs=Number of CPU cores on host machine
client.threadpool.size.max=totalCPUs+1
agent.threadpool.size.max=totalCPUs+1
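The suggestion above can be turned into concrete values for a given host. This is a small sketch of that arithmetic only; the function name is hypothetical, and os.cpu_count() is used as a stand-in for "number of CPU cores on the host machine" (it reports logical CPUs as seen by the operating system).

```python
import os


def suggested_threadpool_settings(total_cpus=None):
    """Return both thread pool properties set to totalCPUs + 1."""
    if total_cpus is None:
        # Fall back to 1 if the CPU count cannot be determined.
        total_cpus = os.cpu_count() or 1
    size = total_cpus + 1
    return {
        "client.threadpool.size.max": size,
        "agent.threadpool.size.max": size,
    }


# Print the values in ambari.properties syntax for the current host.
for key, value in suggested_threadpool_settings().items():
    print(f"{key}={value}")
```

For the 80-processor host from this thread, suggested_threadpool_settings(80) gives 81 for both properties.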
Created 02-07-2017 06:48 PM
@apappu
many, many thanks again for your effort (your first answer helped me a lot to solve the issue on my side).
@Jay SenSharma
also many thanks for your detailed explanation and providing URLs to the latest Java sources.
PS (to Jay SenSharma): I saw that the number of your "Reputation" credits are higher than those of apappu. Therefore I marked his first answer with "Accept". I hope you can live with it.