Created on 07-05-2017 09:42 PM - edited 08-17-2019 05:38 PM
Hello,
I am in the process of replacing the nodes of our current NiFi cluster with nodes dedicated entirely to NiFi. Until now I ran a 4-node NiFi cluster, and today I added 4 new nodes to it. The goal is to remove the 4 old nodes and keep the 4 new NiFi-dedicated nodes in the cluster. All nodes in the cluster are currently running NiFi version nifi-1.2.0.3.0.0.0-453. I added the 4 new nodes successfully and granted them the proxy permissions. I can log in to the new nodes and access the UI from them, and they connected to the cluster without any issues. However, whenever I try to access a specific process group from one of the 4 new nodes, I keep getting the following error:
This is only happening whenever I try to access that specific process group from any of the new nodes. If I access it from any of the old nodes no error happens and I can access the process group just fine. I looked at the logs from the new nodes and this is what I see:
2017-07-05 17:10:56,745 WARN [Replicate Request Thread-10] o.a.n.c.c.h.r.ThreadPoolRequestReplicator
com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
    at com.sun.jersey.api.client.Client.handle(Client.java:652)
    at com.sun.jersey.api.client.filter.GZIPContentEncodingFilter.handle(GZIPContentEncodingFilter.java:123)
    at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
    at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:509)
    at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.replicateRequest(ThreadPoolRequestReplicator.java:611)
    at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:822)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
    at sun.security.ssl.InputRecord.read(InputRecord.java:503)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
    ... 12 common frames omitted
Does anyone have any idea what could be causing this? Is there a configuration that I have to change to avoid this error, or a permission that I have to give these nodes? Any insights would be extremely helpful.
Created 07-05-2017 10:31 PM
Try increasing the connection and read timeout settings in your nifi.properties file:
nifi.cluster.node.connection.timeout = 30 sec
nifi.cluster.node.read.timeout = 30 sec
This will give nodes a little longer to respond to requests before being disconnected by the cluster coordinator.
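As a rough way to confirm what each node is actually configured with, the two properties above can be read back from the file and compared against a threshold. This is only a sketch: the function names are made up for illustration, the 30-second threshold simply mirrors the values suggested above, and the duration parser handles only a few common units rather than everything NiFi accepts.

```python
import re

# Simplified unit table (an assumption for this sketch); NiFi accepts more
# time units than the handful listed here.
_UNIT_SECONDS = {"ms": 0.001, "millis": 0.001, "s": 1, "sec": 1, "secs": 1,
                 "min": 60, "mins": 60}

TIMEOUT_KEYS = (
    "nifi.cluster.node.connection.timeout",
    "nifi.cluster.node.read.timeout",
)


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def to_seconds(value):
    """Convert a duration such as '30 sec' to seconds."""
    match = re.fullmatch(r"(\d+)\s*([A-Za-z]+)", value.strip())
    if not match or match.group(2).lower() not in _UNIT_SECONDS:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2).lower()]


def short_timeouts(text, minimum_seconds=30):
    """Return the timeout properties that are missing or below the minimum."""
    props = parse_properties(text)
    flagged = {}
    for key in TIMEOUT_KEYS:
        value = props.get(key)
        if value is None or to_seconds(value) < minimum_seconds:
            flagged[key] = value
    return flagged
```

Running `short_timeouts(open("conf/nifi.properties").read())` on each node (the path is an assumption) would list any of the two settings that are still below 30 seconds. The file has to be changed on every node, and NiFi restarted, for new values to take effect.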
Thanks,
Matt
Created 07-05-2017 10:22 PM
Full error trace:
2017-07-05 17:04:12,336 WARN [Replicate Request Thread-8] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Failed to replicate request GET /nifi-api/flow/process-groups/04a08abd-d360-3539-ad01-835bc54126b0 due to {}
com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
    at com.sun.jersey.api.client.Client.handle(Client.java:652)
    at com.sun.jersey.api.client.filter.GZIPContentEncodingFilter.handle(GZIPContentEncodingFilter.java:123)
    at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
    at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:509)
    at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.replicateRequest(ThreadPoolRequestReplicator.java:611)
    at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:822)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
    at sun.security.ssl.InputRecord.read(InputRecord.java:503)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
    ... 12 common frames omitted
Created 07-06-2017 12:44 AM
In addition to what Matt suggested, make sure the 4 new nodes can reach the 4 original nodes using the hostnames you use to access the UI of the original node. If you have nifi-original-1, nifi-original-2, nifi-original-3, nifi-original-4, you would want to SSH to nifi-new-1 and make sure you can ping the 4 original hostnames.
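Since ping only shows that a host is reachable, not that the NiFi port is open, a TCP-level check from each new node may be more telling. The following is a minimal sketch, not a NiFi tool: the function names are invented for illustration, and the port is whatever your nodes set in nifi.web.https.port.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return (True, "") if a TCP connection succeeds, else (False, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:
        # OSError covers DNS resolution failures, refused connections,
        # and connect timeouts.
        return False, f"{type(exc).__name__}: {exc}"


def check_nodes(hosts, port, timeout=5.0):
    """Check every host and return {host: (ok, reason)}."""
    return {host: can_connect(host, port, timeout) for host in hosts}


# Example usage, run from nifi-new-1 (hostnames are the placeholders from
# the post above; 9091 is a hypothetical port, use your own HTTPS port):
#
#   hosts = ["nifi-original-1", "nifi-original-2",
#            "nifi-original-3", "nifi-original-4"]
#   for host, (ok, reason) in check_nodes(hosts, 9091).items():
#       print(host, "OK" if ok else f"FAILED ({reason})")
```

A name-resolution failure here would point at DNS or /etc/hosts on the new node, while a refused or timed-out connection would point at a firewall or a wrong port.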
Created 07-06-2017 08:12 PM
@Matt Clarke that seemed to fix the issue, thanks a lot!
Created 05-07-2018 02:55 PM
I made the above changes but still have the same issue, even after increasing both to 60 sec.
Created 05-07-2018 03:49 PM
Are we talking about the same 8 node cluster here? (Am I wrong in assuming that you work with Adda?)
-
Try searching your nifi-app.log files for "HTTP requests" or "Request Counts Per URI"
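If grep is not handy, the same search can be sketched in a few lines of Python. The function name and the log path in the usage note are assumptions; only the two search phrases come from the suggestion above.

```python
SEARCH_PHRASES = ("HTTP requests", "Request Counts Per URI")


def find_matches(lines, phrases=SEARCH_PHRASES):
    """Return (line_number, line) pairs for lines containing any phrase."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(phrase in line for phrase in phrases):
            hits.append((number, line.rstrip("\n")))
    return hits
```

For example, `with open("logs/nifi-app.log") as fh: print(find_matches(fh))` would scan the current log; rolled-over log files would need to be scanned separately.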
-
Depending on the number of Remote Process Groups used to redistribute FlowFiles and the now increased size of your cluster (4 to 8 nodes), you may be encountering too many outstanding HTTP requests, which then causes these timeouts.
https://issues.apache.org/jira/browse/NIFI-4153 <-- HDF 3.0.1+
https://issues.apache.org/jira/browse/NIFI-4598 <-- HDF 3.1.0+
-
You can fix this issue by upgrading to HDF 3.0.1 and adding a new property to your nifi.properties file (the new property is part of the fix for NIFI-4153):
nifi.cluster.node.max.concurrent.requests=400 (default is 100)
-
Other things you can try now without upgrading:
1. Make sure all Remote Process Groups (RPGs) are using the "RAW" transport protocol instead of the default "HTTP" transport protocol. This will reduce the number of HTTP connections being made to transfer FlowFiles by RPGs, since FlowFiles would be transferred over their own dedicated TCP sockets instead.
2. Increase "nifi.cluster.node.protocol.threads=50" from the default of 10. This will help with the larger number of nodes in your cluster now.
3. Increase "nifi.web.jetty.threads=400" from the default of 200.
4. Any processors that are invalid or stopped on your canvas should be "disabled". This will improve the responsiveness of your UI, since NiFi will not validate disabled processors. NiFi is always validating any stopped processors to determine if they are stopped/valid or stopped/invalid. This occurs any time a user logs in and every time they navigate around the canvas/flows.
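With several nodes to update, editing nifi.properties by hand is easy to get wrong, so the property changes from items 2 and 3 could be applied with a small helper. This is a sketch under assumptions: `apply_overrides` and `SUGGESTED` are illustrative names, not part of NiFi, and the helper only rewrites plain key=value lines.

```python
def apply_overrides(text, overrides):
    """Return properties text with the given keys set to the new values.

    Existing key=value lines are rewritten in place; keys that are not
    present are appended at the end. Comments and blank lines are kept.
    """
    remaining = dict(overrides)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.partition("=")[0].strip()
            if key in remaining:
                line = f"{key}={remaining.pop(key)}"
        out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Values taken from items 2 and 3 above.
SUGGESTED = {
    "nifi.cluster.node.protocol.threads": "50",
    "nifi.web.jetty.threads": "400",
}
# Per the post above, this one only applies on HDF 3.0.1+ (NIFI-4153):
# SUGGESTED["nifi.cluster.node.max.concurrent.requests"] = "400"
```

Writing the result to a new file and diffing it against the original before replacing anything is the safer route; the same change is needed on every node, followed by a restart.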
-
Thanks,
Matt
Created 05-07-2018 02:56 PM
@Matt Clarke, I made the above changes but still have the same issue, even after increasing both to 60 sec.
Created on 05-07-2018 04:25 PM - edited 08-17-2019 05:38 PM
I did not find "HTTP requests" or "Request Counts Per URI" in nifi-app.log.
I have increased "nifi.cluster.node.protocol.threads=50" and "nifi.web.jetty.threads=400".
My cluster has 16 nodes, of which 3 are NiFi nodes.
Thanks,
Srinivas
Created 05-07-2018 04:26 PM
But I still have the same issue.