Communication between Ambari Server and Ambari Agent seems to be blocked

Problem: A runtime issue on a fresh Ambari server installation on SLES 11.4 (see http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-installation/content/ch_Installing...). There is no communication between Ambari Server and Ambari Agent via the handshake or registration/heartbeat ports. A client can establish a connection to the port, but the application listening on it never responds.

curl shows only "TLSv1.0, TLS handshake, Client hello (1)"; the answering "TLSv1.0, TLS handshake, Server hello (2)" is missing:

ls5567:~ # curl -v https://localhost:8440;
* About to connect() to localhost port 8440 (#0)
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 8440 (#0)
* successfully set certificate verify locations:
* CAfile: none CApath: /etc/ssl/certs/
* TLSv1.0, TLS handshake, Client hello (1):
* SSL connection timeout
* Closing connection #0
curl: (28) SSL connection timeout

openssl hangs forever:

ls5567:~ # openssl s_client -connect localhost:8440
CONNECTED(00000003)
^C

netstat reports this:

ls5567:~ # netstat -nopa|grep :844
tcp 0 0 127.0.0.1:49634 127.0.0.1:8440 ESTABLISHED 62178/openssl off (0.00/0/0)
tcp 0 0 127.0.0.1:44860 127.0.0.1:8440 ESTABLISHED 36869/python off (0.00/0/0)
tcp 0 0 :::8440 :::* LISTEN 61747/java off (0.00/0/0)
tcp 0 0 :::8441 :::* LISTEN 61747/java off (0.00/0/0)
tcp 0 0 :::8443 :::* LISTEN 61747/java off (0.00/0/0)
tcp 137 0 127.0.0.1:8440 127.0.0.1:48270 CLOSE_WAIT 61747/java off (0.00/0/0)
tcp 137 0 127.0.0.1:8440 127.0.0.1:55292 CLOSE_WAIT 61747/java off (0.00/0/0)
tcp 128 0 127.0.0.1:8440 127.0.0.1:49634 ESTABLISHED 61747/java off (0.00/0/0)
tcp 128 0 127.0.0.1:8440 127.0.0.1:44860 ESTABLISHED 61747/java off (0.00/0/0)

Please share ideas for finding the root cause of this issue.
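For anyone who wants to script the same check, here is a minimal Python sketch that separates "TCP connect failed" from "TCP connected, but the TLS handshake never completed", which is the symptom the curl and openssl output shows. The function name probe_tls and its return strings are made up for this illustration. Certificate verification is switched off on purpose because this is only a diagnostic. Note that a recent Python/OpenSSL build may refuse an old TLSv1.0-only server outright, in which case you would see "handshake_failed" rather than a timeout.

```python
import socket
import ssl


def probe_tls(host, port, timeout=5.0):
    """Return 'tcp_failed', 'handshake_timeout', 'handshake_failed' or 'ok'.

    Hypothetical helper for illustration; not part of Ambari.
    """
    try:
        # Step 1: plain TCP connect (what "CONNECTED" means in openssl s_client).
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp_failed"

    # Diagnostic only: do not verify the server certificate.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    try:
        # Step 2: TLS handshake; the socket timeout set above still applies.
        with ctx.wrap_socket(raw, server_hostname=host):
            return "ok"
    except socket.timeout:
        # Client hello was sent, but no server hello arrived in time.
        return "handshake_timeout"
    except (ssl.SSLError, OSError):
        return "handshake_failed"
    finally:
        raw.close()


# Example call against the agent port from this thread:
# print(probe_tls("localhost", 8440))
```

A result of "handshake_timeout" would match the situation described above: the kernel accepts the connection, but the server application never answers.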

1 ACCEPTED SOLUTION

@Kargol Meister

What is the system configuration of the Ambari server host?

Can you increase "agent.threadpool.size.max" to, say, 120 (the default is 25) and restart ambari-server?

We do see issues like this on higher-end machines with a large number of CPUs.

Let us know the result.
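In case it helps, here is a small Python sketch of the property change. It only rewrites properties text; the helper name set_property is made up for this example. The file to edit is usually /etc/ambari-server/conf/ambari.properties, but that path is an assumption here because the thread does not name it, so check your own installation and back up the file first.

```python
def set_property(text, key, value):
    """Return properties text with `key` set to `value`.

    An existing "key=..." line is replaced; otherwise the line is appended.
    Hypothetical helper for illustration only.
    """
    new_line = f"{key}={value}"
    lines = text.splitlines()
    for i, line in enumerate(lines):
        # Compare only the part before the first "=" so comments are skipped.
        if line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            break
    else:
        # Key not present yet: add it at the end.
        lines.append(new_line)
    return "\n".join(lines) + "\n"


sample = "agent.threadpool.size.max=25\n"
print(set_property(sample, "agent.threadpool.size.max", 120), end="")
```

After writing the changed text back to the properties file, restart the server with ambari-server restart so the new value is picked up.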

@Kargol Meister

Is this a single-node cluster? I see localhost in your output, so I wonder whether the Ambari agent was registered correctly.

You can try the commands below and see whether they help.

ambari-agent stop
ambari-agent reset <ambari-server-hostname>
ambari-agent start

If this is a production cluster, please be careful: these commands reset the Ambari agent configuration, including the SSL certificates used for communication between the Ambari agent and server (mostly self-signed, unless you have configured non-default options).

No HDP services will be affected by running these commands.

@apappu

Many thanks: increasing the value of the agent.threadpool.size.max parameter did it. Do you have a rule of thumb or best-practice formula at hand for the minimum value this parameter should be set to?

@Kargol Meister

There is no rule of thumb or formula here, but Jetty seems to have problems on machines with many CPUs. Ideally, 32 should be enough for a cluster of 5 to 6 nodes; if you have more nodes, you may have to increase this number.

Please select the correct answer.

@apappu

Additional question (because of these findings in the ambari-server.log file):

ls5567:~ # grep ambari-client-thread /var/log/ambari-server/ambari-server.log

07 Feb 2017 12:10:47,371 WARN [main] AmbariServer:693 - The configured Jetty ambari-client-thread thread pool value of 25 is not sufficient on a host with 80 processors. Increasing the value to 60.

07 Feb 2017 12:10:47,372 INFO [main] AmbariServer:701 - Jetty is configuring ambari-client-thread with 40 reserved acceptors/selectors and a total pool size of 60 for 80 processors.

What is the relationship between the client.threadpool.size.max and agent.threadpool.size.max values?

- client.threadpool.size.max <= or < agent.threadpool.size.max?

- client.threadpool.size.max >= or > agent.threadpool.size.max?

- client.threadpool.size.max = agent.threadpool.size.max?

@Kargol Meister

Ambari listens on two different ports:

One port serves REST calls. For example, the Ambari UI makes REST calls to the server.

The second port serves agent requests. All agents communicate with the server on this port.

client.threadpool.size.max - this property is used by the port listening for REST calls

agent.threadpool.size.max - this property is used by the port listening for agents

@Kargol Meister

Ambari internally uses the Jetty server, which, like other servers, has "acceptor" and "selector" threads. Since Jetty 9, the formula for calculating the number of acceptor and selector threads has changed considerably.

Some information regarding acceptors and selectors:

Acceptor threads accept new connections from clients. They run a loop on a blocking accept() call to accept connections. On a box with 80 cores, which is not typical, it is recommended to configure the number of acceptors and selectors manually, since the default guesses are probably off. The right number of acceptor threads is determined by the server load and the traffic shape, in particular by the connection open/close rate: the higher this rate, the more acceptors you want.

Selector threads manage events on connected sockets. Every time a socket connects, an acceptor thread accepts the connection and assigns the socket to a selector, chosen in round-robin fashion. The selector is responsible for detecting activity on that socket (I/O events such as read availability and write availability) and signalling that event to other code in Jetty that will handle it. One thread runs one selector. This is basic async I/O using selectors. If you have a very busy server (say 100k or more connected clients at any time), you want to spread those clients among many selectors, so that each selector handles a portion of the connected clients and responds faster to client activity.

From Ambari 2.5.0 onwards, this logic has been improved as part of https://issues.apache.org/jira/browse/AMBARI-18827 to avoid blocking Ambari agent restarts on hosts with a high number of cores.

From Ambari 2.5 on, the thread pool is configured as follows:

configureJettyThreadPool(serverForAgent, acceptors * 2, AGENT_THREAD_POOL_NAME, configs.getAgentThreadPoolSize());
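To make the sizing concrete, here is a rough Python sketch of the adjustment that the log lines quoted earlier in this thread suggest: with 40 reserved acceptor/selector threads, a configured pool of 25 is raised to 60. The margin of 20 free threads is inferred from those two numbers (60 - 40) and is an assumption, not a value read from the Ambari source. The function and parameter names are made up for this illustration.

```python
def effective_pool_size(configured, reserved_threads, min_available=20):
    """Pool size after the adjustment suggested by the quoted log lines.

    min_available=20 is an assumption inferred from the log (60 - 40),
    not a value taken from the Ambari source.
    """
    # Threads Jetty keeps for acceptors/selectors plus some free workers.
    required = reserved_threads + min_available
    # A configured value that is already large enough is left alone.
    return max(configured, required)


# Figures from the ambari-server.log excerpt in this thread:
# 40 reserved acceptors/selectors, configured pool of 25.
print(effective_pool_size(25, 40))   # 60
print(effective_pool_size(120, 40))  # 120
```

If the agent pool is smaller than the number of reserved threads and is not adjusted like this, no thread may be left to serve requests, which would be consistent with the hanging handshake reported in the question.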

The AGENT_THREADPOOL_SIZE is controlled by the "agent.threadpool.size.max" property [1] (default value 25). It sets the maximum number of threads used to process heartbeats from Ambari agents. The "view.extraction.threadpool.size.max" property plays the same role for the Views UI.

The CLIENT_THREADPOOL_SIZE is controlled by the "client.threadpool.size.max" property [2], which is used for REST API calls.

As a starting point, we might use:

totalCPUs=Number of CPU cores on host machine
client.threadpool.size.max=totalCPUs+1
agent.threadpool.size.max=totalCPUs+1 
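The same suggestion can be written as a short Python sketch that prints the two property lines for a given core count. The function name suggested_pool_sizes is hypothetical, and totalCPUs+1 is only the starting point proposed in this post, not an official recommendation.

```python
import os


def suggested_pool_sizes(total_cpus=None):
    """Apply the totalCPUs+1 starting point from this post to both pools."""
    if total_cpus is None:
        # os.cpu_count() can return None, so fall back to 1.
        total_cpus = os.cpu_count() or 1
    size = total_cpus + 1
    return {
        "client.threadpool.size.max": size,
        "agent.threadpool.size.max": size,
    }


# Example for the 80-processor host mentioned in the log excerpt.
for key, value in suggested_pool_sizes(80).items():
    print(f"{key}={value}")
```

For the 80-processor host this gives 81 for both properties. Calling suggested_pool_sizes() without an argument uses the core count of the machine it runs on.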

[1] https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/java/org/apache/ambari/server/con...

[2] https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/java/org/apache/ambari/server/con...

@apappu

Many, many thanks again for your effort (your first answer helped me a lot in solving the issue on my side).

@Jay SenSharma

Many thanks also for your detailed explanation and for providing URLs to the latest Java sources.

PS (to Jay SenSharma): I saw that your "Reputation" score is higher than apappu's. Therefore I marked his first answer with "Accept". I hope you can live with it.