nifi socket timeout importing template

Expert Contributor

Hi all.

I'm trying to import a (Kylo) template into NiFi, but I'm running into a problem. The template import itself works, but when I try to deploy the template to a process group, the NiFi UI fails with the following message:

com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out

In nifi-app.log I can see these messages:

2017-12-15 15:18:11,005 WARN [Replicate Request Thread-5] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Failed to replicate request POST /nifi-api/process-groups/2f9f6866-13e0-1bc2-ffff-ffffccd0318d/template-instance to tst-hdfsandbox.pochdp.csi.it:9090 due to {}
com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
at com.sun.jersey.api.client.Client.handle(Client.java:652)
at com.sun.jersey.api.client.filter.GZIPContentEncodingFilter.handle(GZIPContentEncodingFilter.java:123)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:560)
at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.replicateRequest(ThreadPoolRequestReplicator.java:630)
at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:832)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
... 12 common frames omitted
2017-12-15 15:18:11,005 WARN [Replicate Request Thread-5] o.a.n.c.c.h.r.ThreadPoolRequestReplicator
com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
at com.sun.jersey.api.client.Client.handle(Client.java:652)
at com.sun.jersey.api.client.filter.GZIPContentEncodingFilter.handle(GZIPContentEncodingFilter.java:123)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:560)
at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.replicateRequest(ThreadPoolRequestReplicator.java:630)
at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:832)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
... 12 common frames omitted
2017-12-15 15:18:11,006 WARN [Replicate Request Thread-5] o.a.n.c.c.node.NodeClusterCoordinator All nodes failed to process URI POST /nifi-api/process-groups/2f9f6866-13e0-1bc2-ffff-ffffccd0318d/template-instance. As a result, no node will be disconnected from cluster

Has anyone already faced this problem?

Thank you,

1 ACCEPTED SOLUTION

Super Mentor
@Davide Vergari

@srinivas p

-

Let me explain what is going on here so you can understand why the configuration change helps in some cases and not in others.

-

When you add a component to the canvas of a NiFi cluster, the following steps are performed:

1. That request is forwarded to the current elected cluster coordinator on behalf of the user who added the component.

2. The cluster coordinator then replicates that request to all the nodes connected to the cluster. (The nifi.cluster.node.protocol.threads=10 setting dictates how many concurrent requests can be made, so larger clusters will need this value increased.)

3. Each node must then make the change and respond back to the cluster coordinator.

-

While this process is consistent for every such replicated request, not all changes are equal.

Selecting a group of components on the canvas and copying and pasting them, or instantiating a large template on the canvas, also constitutes a single replicated request rather than many requests. These component bundles are referred to as snippets in NiFi. This action is not asynchronous, which means that each node must add every component (processors, connections, controller services, etc.) from the snippet before it responds to the cluster coordinator. Depending on the size of the snippet and the load on the server, the request may well exceed the configured timeout. This results in the nodes that timed out being disconnected from the cluster.

-

Increasing the timeouts allows more time for these snippets to be instantiated and for the response to be received. Because there is no way to know how large these snippets are, the timeout setting that works for one user may not work for others.

-

Two things to keep in mind here:

1. How many node requests can be made concurrently (as shown above, the default is 10). Using a 16-node NiFi cluster as an example, 10 nodes must respond before the other 6 even get the request, so increase this value.

2. The nifi.cluster.node.connection.timeout and nifi.cluster.node.read.timeout can be set to much higher values. Even setting these timeouts to 2 - 5 minutes does not mean every request will take that long. It simply means that you allow that much time before the cluster coordinator decides to disconnect the node due to a timeout.
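
To make the arithmetic in the first point concrete, here is a minimal Python sketch. It assumes a simplified model in which requests go out in strict batches of at most `threads` nodes; a real thread pool may hand the next request to a thread as soon as one frees up, so treat the result as a rough upper bound. The function names are hypothetical and are not part of NiFi.

```python
import math


def replication_waves(nodes: int, threads: int) -> int:
    """Batches needed if at most `threads` nodes are contacted at once.

    Simplified model (assumption): strict batches, not a work queue.
    """
    if nodes < 0 or threads < 1:
        raise ValueError("nodes must be >= 0 and threads must be >= 1")
    return math.ceil(nodes / threads)


def worst_case_wait_seconds(nodes: int, threads: int, read_timeout_s: float) -> float:
    """Rough upper bound: every batch waits for the full read timeout."""
    return replication_waves(nodes, threads) * read_timeout_s


if __name__ == "__main__":
    # 16 nodes with 10 protocol threads: the last 6 nodes wait for a second batch.
    print(replication_waves(16, 10))  # 2
    # Raising the thread count to the cluster size removes the second batch.
    print(replication_waves(16, 16))  # 1
```

Under this model, raising nifi.cluster.node.protocol.threads to at least the number of nodes keeps every node in the first batch, which is the reason for the advice to increase it on larger clusters.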

-

There is work on the roadmap to eventually redesign these types of replication requests as asynchronous requests. Once that happens, users will not need such high timeouts configured.

-

Thank you,

Matt

4 REPLIES

Expert Contributor

I solved the issue myself by increasing nifi.cluster.node.connection.timeout and nifi.cluster.node.read.timeout to 15 seconds.
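
For anyone who wants to script that change, here is a minimal Python sketch that rewrites the two properties in the text of a nifi.properties file. The helper name `set_properties` and the sample input are hypothetical, and the `15 sec` value format is an assumption based on how time periods are usually written in nifi.properties; check your own file before applying anything like this.

```python
from typing import Dict


def set_properties(text: str, updates: Dict[str, str]) -> str:
    """Return `text` with each key in `updates` set to its new value.

    Existing `key=value` lines are replaced in place; comment lines are
    left alone; keys that are not present are appended at the end.
    """
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_comment = line.lstrip().startswith("#")
        if not is_comment and "=" in line and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append any requested keys that were not found in the file.
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Illustrative input only, not copied from a real installation.
    sample = (
        "nifi.cluster.node.connection.timeout=5 sec\n"
        "nifi.cluster.node.read.timeout=5 sec\n"
    )
    print(set_properties(sample, {
        "nifi.cluster.node.connection.timeout": "15 sec",
        "nifi.cluster.node.read.timeout": "15 sec",
    }))
```

The change would need to be made on every node, and NiFi generally has to be restarted before edits to nifi.properties take effect.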

Contributor
@Davide Vergari, I made the changes above but still have the same issue, even after increasing both to 60 sec.

Expert Contributor

Hello Matt,

thank you for the explanation!