Read timeout issue with NiFi cluster (NiFi version nifi-1.2.0.3.0.0.0-453)

Expert Contributor

Hello,

I am in the process of replacing the nodes of our current NiFi cluster with nodes dedicated entirely to NiFi. Until now I ran a 4-node NiFi cluster, and today I added 4 new nodes to it. The goal is to remove the 4 old nodes and keep the 4 new dedicated nodes in the cluster. All nodes in the cluster are currently running NiFi version nifi-1.2.0.3.0.0.0-453. I added the 4 new nodes successfully and gave them the proxy permissions. I can log in to the new nodes and access the UI from them, and they connected to the cluster without any issues. However, whenever I try to access a specific process group from one of the 4 new nodes, I get the following error:

[Screenshot: 16693-sockettimeout.jpg]

This only happens when I try to access that specific process group from any of the new nodes. If I access it from any of the old nodes, no error occurs and I can open the process group just fine. I looked at the logs on the new nodes and this is what I see:

2017-07-05 17:10:56,745 WARN [Replicate Request Thread-10] o.a.n.c.c.h.r.ThreadPoolRequestReplicator 
com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
 at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
 at com.sun.jersey.api.client.Client.handle(Client.java:652)
 at com.sun.jersey.api.client.filter.GZIPContentEncodingFilter.handle(GZIPContentEncodingFilter.java:123)
 at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
 at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
 at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:509)
 at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.replicateRequest(ThreadPoolRequestReplicator.java:611)
 at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:822)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Read timed out
 at java.net.SocketInputStream.socketRead0(Native Method)
 at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
 at java.net.SocketInputStream.read(SocketInputStream.java:171)
 at java.net.SocketInputStream.read(SocketInputStream.java:141)
 at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
 at sun.security.ssl.InputRecord.read(InputRecord.java:503)
 at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
 at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
 at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
 at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
 at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
 at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
 at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
 at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
 at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
 at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
 at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
 at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
 ... 12 common frames omitted

Does anyone have any idea what could be causing this? Is there a configuration that I have to change to avoid this error, or a permission that I have to give these nodes? Any insights would be extremely helpful.

1 ACCEPTED SOLUTION

Master Mentor

@Adda Fuentes

Try adjusting the connection and read timeout settings in your nifi.properties file:

nifi.cluster.node.connection.timeout = 30 sec

nifi.cluster.node.read.timeout = 30 sec

This will give nodes a little longer to respond to requests before being disconnected by the cluster coordinator.
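To confirm which timeout values a node is actually configured with, the properties can be read back programmatically. The following is a minimal Python sketch, not part of NiFi: the two property names come from the reply above, while the helper names and the list of accepted unit spellings are assumptions made for illustration.

```python
import re

# Property names are taken from the reply above; all helper names are hypothetical.
TIMEOUT_KEYS = (
    "nifi.cluster.node.connection.timeout",
    "nifi.cluster.node.read.timeout",
)

# Assumed unit spellings; NiFi may accept others that this sketch does not handle.
UNIT_SECONDS = {"ms": 0.001, "millis": 0.001, "sec": 1, "secs": 1, "min": 60, "mins": 60}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def duration_to_seconds(value):
    """Convert a string such as '30 sec' to seconds; raise ValueError if unrecognized."""
    match = re.fullmatch(r"(\d+)\s*([A-Za-z]+)", value.strip())
    if not match or match.group(2).lower() not in UNIT_SECONDS:
        raise ValueError(f"unrecognized duration: {value!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2).lower()]


def cluster_timeouts(text):
    """Return the configured cluster timeouts, in seconds, keyed by property name."""
    props = parse_properties(text)
    return {key: duration_to_seconds(props[key]) for key in TIMEOUT_KEYS if key in props}


# Sample excerpt using the values suggested above.
sample = """
# excerpt of a nifi.properties file (sample values)
nifi.cluster.node.connection.timeout = 30 sec
nifi.cluster.node.read.timeout = 30 sec
"""
print(cluster_timeouts(sample))
```

Running the same check against the nifi.properties file of each node would show whether every node picked up the new values.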

Thanks,

Matt


10 REPLIES

Expert Contributor

Full error trace:

2017-07-05 17:04:12,336 WARN [Replicate Request Thread-8] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Failed to replicate request GET /nifi-api/flow/process-groups/04a08abd-d360-3539-ad01-835bc54126b0 due to {}
com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
	at com.sun.jersey.api.client.Client.handle(Client.java:652)
	at com.sun.jersey.api.client.filter.GZIPContentEncodingFilter.handle(GZIPContentEncodingFilter.java:123)
	at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
	at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
	at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:509)
	at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.replicateRequest(ThreadPoolRequestReplicator.java:611)
	at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:822)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
	at sun.security.ssl.InputRecord.read(InputRecord.java:503)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
	at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
	at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
	... 12 common frames omitted

Master Guru

In addition to what Matt suggested, make sure the 4 new nodes can reach the 4 original nodes using the hostnames you use to access the UI of the original nodes. If you have nifi-original-1, nifi-original-2, nifi-original-3, and nifi-original-4, you would want to SSH to nifi-new-1 and make sure you can ping the 4 original hostnames.
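A check along these lines can also be scripted and run from each new node. The sketch below is a rough Python illustration that swaps ping for a hostname lookup followed by a TCP connection attempt, which additionally shows whether a given port is reachable. The function name and the port you pass in are assumptions; use whatever port your nodes actually listen on.

```python
import socket


def check_host(hostname, port, timeout=5.0):
    """Return (resolved_ip_or_None, tcp_connect_ok) for a single host."""
    try:
        ip = socket.gethostbyname(hostname)
    except socket.gaierror:
        # The name does not resolve from this machine.
        return None, False
    try:
        # The connection is closed again as soon as it has been established.
        with socket.create_connection((hostname, port), timeout=timeout):
            return ip, True
    except OSError:
        # The name resolves, but the port is unreachable or the attempt timed out.
        return ip, False


def report(hostnames, port):
    """Print one line per host describing what was found."""
    for name in hostnames:
        ip, ok = check_host(name, port)
        if ip is None:
            print(f"{name}: does not resolve")
        else:
            print(f"{name}: resolves to {ip}, port {port} {'open' if ok else 'unreachable'}")
```

For example, calling `report(["nifi-original-1", "nifi-original-2", "nifi-original-3", "nifi-original-4"], port)` on nifi-new-1 would flag any original node whose name does not resolve or whose port cannot be reached from there.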

Expert Contributor

@Matt Clarke that seemed to fix the issue, thanks a lot!

Contributor

I made the above changes but still have the same issue, even after increasing both to 60 sec.

Master Mentor

@srinivas p

Are we talking about the same 8-node cluster here? (Am I wrong to assume that you work with Adda?)

-

Try searching your nifi-app.log files for "HTTP requests" or "Request Counts Per URI".
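A search like this can be done with grep or with a few lines of Python. The sketch below is only an illustration: the two phrases are the ones named above, the function name is hypothetical, and the sample lines are made-up placeholders rather than real NiFi log output.

```python
# The two phrases come from the reply above.
PHRASES = ("HTTP requests", "Request Counts Per URI")


def find_matches(lines, phrases=PHRASES):
    """Yield (line_number, line) for every line that contains any of the phrases."""
    for number, line in enumerate(lines, start=1):
        if any(phrase in line for phrase in phrases):
            yield number, line.rstrip("\n")


# Placeholder lines for demonstration only; these are not real NiFi log entries.
sample_lines = [
    "placeholder line with nothing of interest\n",
    "placeholder line that mentions HTTP requests\n",
    "another unrelated placeholder line\n",
    "placeholder line that mentions Request Counts Per URI\n",
]

for number, line in find_matches(sample_lines):
    print(f"{number}: {line}")
```

To scan a real log, pass an open file object instead of the sample list, for example `with open(path_to_log) as f: ...`, where `path_to_log` points at the app log on the node in question.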

-

Depending on the number of Remote Process Groups used to redistribute FlowFiles and the now increased size of your cluster (4 to 8 nodes), you may be encountering too many outstanding HTTP requests, which then cause these timeouts.

https://issues.apache.org/jira/browse/NIFI-4153 <-- HDF 3.0.1+

https://issues.apache.org/jira/browse/NIFI-4598 <-- HDF 3.1.0+

-

You can fix this issue by upgrading to HDF 3.0.1 and adding a new property to your nifi.properties file (the new property is part of the fix for NIFI-4153):

nifi.cluster.node.max.concurrent.requests=400 (default is 100)

-

Other things you can try now without upgrading:

1. Make sure all Remote Process Groups (RPGs) are using the "RAW" transport protocol instead of the default "HTTP" transport protocol. This will reduce the number of HTTP connections being made to transfer FlowFiles by RPGs, since FlowFiles would be transferred over their own dedicated TCP socket instead.

2. Increase "nifi.cluster.node.protocol.threads" from the default of 10 to 50. This will help with the larger number of nodes in your cluster now.

3. Increase "nifi.web.jetty.threads" from the default of 200 to 400.

4. Any processors that are invalid or stopped on your canvas should be "disabled". This will improve the responsiveness of your UI, since NiFi will not validate disabled processors. NiFi is always validating any stopped processors to determine whether they are stopped/valid or stopped/invalid. This occurs any time a user logs in and every time they navigate around the canvas/flows.
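The two thread-count changes in items 2 and 3 have to be made in nifi.properties on every node. As a rough sketch, assuming a plain key=value file layout, the Python below rewrites those two properties in a properties text and appends any that are missing. The function name is hypothetical, and the "current" values are the defaults quoted above.

```python
# Values suggested in items 2 and 3 above.
SUGGESTED = {
    "nifi.cluster.node.protocol.threads": "50",
    "nifi.web.jetty.threads": "400",
}


def apply_overrides(text, overrides):
    """Return the text with each overridden key rewritten; append keys that are absent."""
    seen = set()
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in overrides:
                line = f"{key}={overrides[key]}"
                seen.add(key)
        out.append(line)
    # Keys that were not present in the original text are added at the end.
    out.extend(f"{key}={value}" for key, value in overrides.items() if key not in seen)
    return "\n".join(out) + "\n"


# Defaults as quoted in the list above.
current = "nifi.cluster.node.protocol.threads=10\nnifi.web.jetty.threads=200\n"
print(apply_overrides(current, SUGGESTED), end="")
```

Comments and unrelated properties pass through unchanged, so the same function could be applied to a full nifi.properties file before the node is restarted.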

-

Thanks,

Matt

Contributor

@Matt Clarke, I made the above changes but still have the same issue, even after increasing both to 60 sec.

Contributor
@Matt Clarke

I did not find "HTTP requests" or "Request Counts Per URI" in nifi-app.logs.

[Screenshot: 72671-screen-shot-2018-05-07-at-94541-pm.png]

I have increased "nifi.cluster.node.protocol.threads" to 50 and "nifi.web.jetty.threads" to 400.

My cluster has 16 nodes, of which 3 are NiFi nodes.

Thanks,

Srinivas

Contributor

But I still have the same issue.