Created 12-15-2017 02:24 PM
Hi all.
I'm trying to import a (Kylo) template into NiFi, but I'm running into a problem. The template import itself succeeds, but when I try to deploy the template to a process group, the NiFi UI fails with the following message:
com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
In nifi-app.log I can see these messages:
2017-12-15 15:18:11,005 WARN [Replicate Request Thread-5] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Failed to replicate request POST /nifi-api/process-groups/2f9f6866-13e0-1bc2-ffff-ffffccd0318d/template-instance to tst-hdfsandbox.pochdp.csi.it:9090 due to {}
com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
at com.sun.jersey.api.client.Client.handle(Client.java:652)
at com.sun.jersey.api.client.filter.GZIPContentEncodingFilter.handle(GZIPContentEncodingFilter.java:123)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:560)
at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.replicateRequest(ThreadPoolRequestReplicator.java:630)
at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:832)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
... 12 common frames omitted
2017-12-15 15:18:11,005 WARN [Replicate Request Thread-5] o.a.n.c.c.h.r.ThreadPoolRequestReplicator
com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
at com.sun.jersey.api.client.Client.handle(Client.java:652)
at com.sun.jersey.api.client.filter.GZIPContentEncodingFilter.handle(GZIPContentEncodingFilter.java:123)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:560)
at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.replicateRequest(ThreadPoolRequestReplicator.java:630)
at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:832)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
... 12 common frames omitted
2017-12-15 15:18:11,006 WARN [Replicate Request Thread-5] o.a.n.c.c.node.NodeClusterCoordinator All nodes failed to process URI POST /nifi-api/process-groups/2f9f6866-13e0-1bc2-ffff-ffffccd0318d/template-instance. As a result, no node will be disconnected from cluster
Has anyone already faced this problem?
Thank you,
Created 05-07-2018 05:46 PM
@srinivas p
-
Let me explain what is going on here so you can understand why the configuration change made helps in some cases and not others.
-
When you add a component to the canvas of a NiFi cluster, the following steps are performed:
1. That request is forwarded to the currently elected cluster coordinator on behalf of the user who added the component.
2. The cluster coordinator then replicates that request to all the nodes connected to the cluster. (The nifi.cluster.node.protocol.threads=10 setting dictates how many concurrent requests can be made, so larger clusters will need this value increased.)
3. Each node must then make the change and respond back to the cluster coordinator.
-
While this process is the same for every replicated request, not all changes are equal.
Selecting a bunch of components on the canvas and copying and pasting them, or instantiating a large template on the canvas, also constitutes a single replication request rather than many requests. These component bundles are referred to as snippets in NiFi. This action is not asynchronous, which means that each node must add every component (processors, connections, controller services, etc.) from the snippet before it responds to the cluster coordinator. Depending on the size of the snippet and the load on the server, this request may well exceed the configured timeout. This results in the nodes that timed out being disconnected from the cluster.
-
Increasing the timeouts allows more time for these snippets to be instantiated and for the response to be received. Because there is no way to know in advance how large these snippets are, a timeout setting that works for one user may not work for others.
-
Two things to keep in mind here:
1. How many node requests can be made concurrently (as shown above, the default is 10). Using a 16-node NiFi cluster as an example, 10 nodes must respond before the other 6 even get the request, so increase this value.
2. The nifi.cluster.node.connection.timeout and nifi.cluster.node.read.timeout can be set to much higher values. Even setting these timeouts to 2 - 5 minutes does not mean every request will take that long. It simply means that you will allow that much time before the cluster coordinator makes the decision to disconnect the node due to a timeout.
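To make the 16-node example concrete, here is a rough back-of-the-envelope sketch in Python. It is not NiFi code: the function names are hypothetical, and it assumes a simplified model in which requests go out in fixed batches of nifi.cluster.node.protocol.threads, whereas a real thread pool hands the next request to whichever thread frees up first. Treat the result as a pessimistic estimate of how long a single replicated request could take.

```python
import math


def replication_waves(node_count: int, protocol_threads: int) -> int:
    """Batches needed to reach every node, under the simplified
    assumption that requests are sent in fixed groups of
    `protocol_threads` (hypothetical model, not NiFi's actual scheduler)."""
    if node_count < 0 or protocol_threads < 1:
        raise ValueError("node_count must be >= 0 and protocol_threads >= 1")
    return math.ceil(node_count / protocol_threads)


def worst_case_seconds(node_count: int, protocol_threads: int,
                       read_timeout_seconds: float) -> float:
    """Pessimistic bound: every batch takes the full read timeout."""
    return replication_waves(node_count, protocol_threads) * read_timeout_seconds


if __name__ == "__main__":
    # 16 nodes with the default of 10 threads: the last 6 nodes wait for a second batch.
    print(replication_waves(16, 10))
    # Raising the thread count to the node count removes the wait.
    print(replication_waves(16, 16))
    # Two batches with a 60 second read timeout.
    print(worst_case_seconds(16, 10, 60))
```

Under this model, raising the thread count to at least the number of nodes brings the number of batches down to one, which is why point 1 matters before raising the timeouts in point 2.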
-
There is work on the roadmap to eventually redesign these types of replication requests as asynchronous requests. Once that happens, users will not need such high timeouts configured.
-
Thank you,
Matt
Created 12-18-2017 11:56 AM
I solved the issue myself by increasing nifi.cluster.node.connection.timeout and nifi.cluster.node.read.timeout to 15 seconds.
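For anyone looking for the exact lines, the change would look roughly like the fragment below. This is a sketch: it assumes the settings live in conf/nifi.properties on each node and that NiFi is restarted afterwards. The 15 sec values are simply the ones that worked for this template; a larger template may need more.

```properties
# conf/nifi.properties (apply on every node in the cluster, then restart NiFi)
nifi.cluster.node.connection.timeout=15 sec
nifi.cluster.node.read.timeout=15 sec
```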
Created 05-07-2018 02:58 PM
I made the above changes but still have the same issue, even after increasing both to 60 sec.
Created 05-08-2018 12:38 PM
Hello Matt,
thank you for the explanation!