We have opened a ticket with Hortonworks support but don't have any clues about this message yet. We see a lot of messages like this:
2018-04-12 14:18:10,277 WARN [Replicate Request Thread-42] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Response time from node1-nifi:9091 was slow for each of the last 3 requests made. To see more information about timing, enable DEBUG logging for org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator
We have set the following parameters:
nifi.cluster.node.connection.timeout=60 sec
nifi.cluster.node.read.timeout=60 sec
nifi.cluster.node.protocol.max.threads=100
nifi.cluster.node.protocol.port=9088
nifi.cluster.node.protocol.threads=80
nifi.web.jetty.threads=600
After restarting NiFi, everything is fine for a few minutes before the message comes back.
I don't use custom processors, and CPU usage is around 5%. The host has 32 GB of RAM, and the JVM heap is set to a minimum of 8 GB and a maximum of 16 GB. The NiFi version is 1.2.
Any help is welcome.
It is very possible the slow responses are the result of a very high number of HTTP requests coming in to your NiFi nodes.
The main contributor to a high number of requests is Remote Process Groups (RPGs). It is very common for users to design dataflows that use many RPGs throughout their canvas to redistribute FlowFiles across their cluster. Each RPG pings the target NiFi instance over HTTP for the current Site-To-Site (S2S) details. As an example, assume you have a 5-node cluster with 20 RPGs all pointing back to the same cluster. Every RPG on every node requests S2S details every 30 seconds, so that alone is 100 HTTP requests every 30 seconds. An improvement implemented through https://issues.apache.org/jira/browse/NIFI-4598 (fixed in NiFi 1.5) changes how RPGs work in this scenario.
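To get a rough feel for this load on your own cluster, the arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: the function name and parameters are made up for illustration, and it assumes every RPG on every node refreshes its S2S details once per 30-second interval, as described above.

```python
def s2s_detail_requests(nodes, rpgs, refresh_seconds=30.0):
    """Estimate S2S detail requests generated by RPGs (illustrative helper).

    Assumes each RPG on each node issues one request per refresh interval.
    Returns (requests per interval, average requests per second).
    """
    per_interval = nodes * rpgs
    per_second = per_interval / refresh_seconds
    return per_interval, per_second


# The example from this answer: 5-node cluster, 20 RPGs on the canvas.
per_interval, per_second = s2s_detail_requests(nodes=5, rpgs=20)
print(f"{per_interval} requests every 30 s (about {per_second:.1f} per second)")
```

Plugging in your real node and RPG counts shows how quickly this background traffic grows, and that is before any FlowFile transfers or UI requests are counted.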
Additionally, NiFi 1.2 is hardcoded to allow only 100 concurrent HTTP requests, which can lead to temporary unavailability of the HTTP endpoint. This was resolved by an improvement covered in https://issues.apache.org/jira/browse/NIFI-4143 (fixed in NiFi 1.4), which allows users to increase the number of allowed concurrent HTTP requests.
Another suggestion to improve HTTP endpoint performance is to make sure your RPGs are configured to use the "RAW" transport instead of "HTTP". While S2S details are still retrieved over HTTP, FlowFiles would then be transferred over a dedicated socket port instead of the HTTP port.
Upgrading to Apache NiFi 1.5 will give you both of the fixes above.
HDF 3.1 is based on Apache NiFi 1.5.
If you found this answer addressed your question, please take a moment to log in and click "accept".