Created 04-09-2018 03:41 PM
Hi,
I need to perform site-to-site transmission of large files (over 2 GB) within a NiFi dataflow.
The transmission does not work; I get log messages like "Awaiting transferDataLatch has been timeout".
I have tried splitting the file with a SegmentContent processor on the source side (the fragments are sent to the target cluster correctly) and then rebuilding the original file with a MergeContent processor on the target side, but the merge never starts because the fragments are not all processed by the same node.
Does someone have a solution for sending large files via site-to-site, or an idea to make the segment/merge workaround work?
Created 04-10-2018 08:23 AM
Thank you very much for reporting the issue. It was a bug in the HTTP S2S transport protocol: it cannot send more than 2 GB of data at once. I have filed an Apache NiFi JIRA and a patch for it. https://issues.apache.org/jira/browse/NIFI-5065
As a workaround, please use the RAW S2S transport protocol instead; it can send large files without issue.
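For anyone driving site-to-site programmatically, here is a minimal sketch of sending a large file over RAW S2S with the nifi-site-to-site-client library. The target URL, port name, and file path below are placeholders, not values from this thread.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.apache.nifi.remote.Transaction;
import org.apache.nifi.remote.TransferDirection;
import org.apache.nifi.remote.client.SiteToSiteClient;
import org.apache.nifi.remote.protocol.SiteToSiteTransportProtocol;
import org.apache.nifi.remote.util.StandardDataPacket;

public class RawSiteToSiteSend {
    public static void main(String[] args) throws Exception {
        final Path largeFile = Paths.get("/tmp/large-file.bin");    // placeholder path
        final Map<String, String> attributes =
                Collections.singletonMap("filename", largeFile.getFileName().toString());

        // Build a client for the target cluster using the RAW (socket) transport,
        // which is not affected by the HTTP 2GB limit (NIFI-5065).
        try (SiteToSiteClient client = new SiteToSiteClient.Builder()
                .url("http://target-nifi:8080/nifi")                // placeholder target URL
                .portName("input")                                  // remote input port name
                .transportProtocol(SiteToSiteTransportProtocol.RAW)
                .timeout(60, TimeUnit.SECONDS)
                .build()) {

            final Transaction transaction = client.createTransaction(TransferDirection.SEND);
            if (transaction == null) {
                throw new IOException("No available peer to receive data");
            }
            try (InputStream in = Files.newInputStream(largeFile)) {
                // Stream the content rather than loading it into memory.
                transaction.send(new StandardDataPacket(attributes, in, Files.size(largeFile)));
            }
            transaction.confirm();   // verify the transfer (CRC exchange) with the remote side
            transaction.complete();  // commit the transaction
        }
    }
}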
Created 04-10-2018 03:10 AM
Thanks for reporting this.
The NiFi Site-to-Site client implements several kinds of timeout and expiration settings, such as cache expiration, idle connection expiration, penalization period, batch duration, and timeout. The error you shared can occur if an S2S client waits longer than the 'idle connection expiration'.
The problem is that 'idle connection expiration' is not configurable by the NiFi user at the moment. So if transferring the data takes more than the default 30 seconds, it fails with the reported message, even if a longer 'Communication Timeout' is set in the Remote Process Group configuration.
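For reference, when using the Java site-to-site client directly, the timeout you can set on the builder corresponds to that 'Communication Timeout' idea. A minimal sketch follows (URL and port name are placeholders), with the caveat that the internal idle connection expiration described above is a separate limit.

import java.util.concurrent.TimeUnit;

import org.apache.nifi.remote.client.SiteToSiteClient;
import org.apache.nifi.remote.client.SiteToSiteClientConfig;
import org.apache.nifi.remote.protocol.SiteToSiteTransportProtocol;

public class S2SConfigSketch {
    // Builds a client configuration with a generous overall timeout. The 30-second
    // idle connection expiration mentioned above is internal to the S2S client and
    // is not changed by the timeout configured here.
    public static SiteToSiteClientConfig buildConfig() {
        return new SiteToSiteClient.Builder()
                .url("http://target-nifi:8080/nifi")                 // placeholder target URL
                .portName("input")                                   // placeholder remote input port
                .transportProtocol(SiteToSiteTransportProtocol.HTTP)
                .timeout(120, TimeUnit.SECONDS)                      // analogous to 'Communication Timeout'
                .buildConfig();
    }
}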
From the error message you shared, I assume you are using the HTTP transport protocol for S2S. I wondered whether using RAW could be a workaround, but looking at the NiFi code it may not be, because RAW also uses the 'idle connection expiration' to shut down existing sockets.
The split/merge pattern will not work because, as you found, S2S clients distribute FlowFiles among the nodes of the target cluster.
I think a possible workaround is to use ListenXXXX processors (e.g. ListenHTTP or ListenTCP) at the target NiFi cluster, and then send data with the corresponding processors such as PostHTTP or PutTCP. This way you can control how the segmented FlowFiles are distributed to the target nodes. You need to pick a target hostname manually for load balancing; this can be done with NiFi Expression Language and a certain set of processors, and the sketch after the template link illustrates the same idea. Please refer to this template:
https://gist.github.com/ijokarumawak/077d7fdca57b9c8ff386f28c5198efd1
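To make the routing idea concrete outside of NiFi Expression Language, here is a rough Java sketch of the same approach: every segment that shares a fragment identifier is posted to the same target node, so a MergeContent (defragment) on that node can rebuild the file. The host names, port, base path, and the header-to-attribute pass-through are assumptions for illustration; the template above is the NiFi-native way to do this.

import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class SegmentRouter {

    // Target NiFi nodes, each running a ListenHTTP processor (hosts, port, and base path are assumptions).
    private static final String[] NODES = {"nifi-node1", "nifi-node2", "nifi-node3"};
    private static final int LISTEN_PORT = 9999;
    private static final String BASE_PATH = "contentListener";

    // Pick one node per fragment identifier so that every segment of a given file
    // lands on the same node and MergeContent (defragment) can reassemble it there.
    static String nodeFor(String fragmentIdentifier) {
        return NODES[Math.floorMod(fragmentIdentifier.hashCode(), NODES.length)];
    }

    // POST one segment to the node chosen for its parent file. This assumes the
    // ListenHTTP processor is configured to keep these headers as FlowFile attributes.
    static void sendSegment(String fragmentIdentifier, int fragmentIndex, byte[] segment) throws IOException {
        URL url = new URL("http://" + nodeFor(fragmentIdentifier) + ":" + LISTEN_PORT + "/" + BASE_PATH);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setFixedLengthStreamingMode(segment.length);
        conn.setRequestProperty("fragment.identifier", fragmentIdentifier);
        conn.setRequestProperty("fragment.index", String.valueOf(fragmentIndex));
        try (OutputStream out = conn.getOutputStream()) {
            out.write(segment);
        }
        if (conn.getResponseCode() != 200) {
            throw new IOException("Unexpected response code: " + conn.getResponseCode());
        }
        conn.disconnect();
    }
}

The key point is routing on fragment.identifier rather than round-robin; that is what keeps all pieces of one file on a single node.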
I will raise an Apache NiFi JIRA so that the 'idle connection expiration' can be set based on the 'Communication Timeout' value. In the meantime, I hope the above workaround works for you.
Created 04-10-2018 07:25 AM
UPDATES
Excuse me, the previous diagnosis was wrong.
I was trying to reproduce the issue by tweaking timeout settings; however, it turned out the issue is not caused by the timeout settings. Instead, there is some issue with how the HTTP S2S transport transfers data. I got the following exception when I tried to send an 8 GB file with HTTP S2S:
2018-04-10 16:05:45,006 ERROR [I/O dispatcher 25] o.a.n.r.util.SiteToSiteRestApiClient Failed to send data to http://HW13076.local:8080/nifi-api/data-transfer/input-ports/ad9a3887-0162-1000-e312-dee642179c9c/transactions/608f1ce4-56da-4899-9348-d2864e364d40/flow-files due to java.lang.RuntimeException: Sending data to http://HW13076.local:8080/nifi-api/data-transfer/input-ports/ad9a3887-0162-1000-e312-dee642179c9c/transactions/608f1ce4-56da-4899-9348-d2864e364d40/flow-files has reached to its end, but produced : read : wrote byte sizes (659704502 : 659704502 : 9249639094) were not equal. Something went wrong.
java.lang.RuntimeException: Sending data to http://HW13076.local:8080/nifi-api/data-transfer/input-ports/ad9a3887-0162-1000-e312-dee642179c9c/tr... has reached to its end, but produced : read : wrote byte sizes (659704502 : 659704502 : 9249639094) were not equal. Something went wrong.
    at org.apache.nifi.remote.util.SiteToSiteRestApiClient$4.produceContent(SiteToSiteRestApiClient.java:848)
    at org.apache.http.impl.nio.client.MainClientExec.produceContent(MainClientExec.java:262)
    at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.produceContent(DefaultClientExchangeHandlerImpl.java:140)
    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.outputReady(HttpAsyncRequestExecutor.java:241)
    at org.apache.http.impl.nio.DefaultNHttpClientConnection.produceOutput(DefaultNHttpClientConnection.java:290)
    at org.apache.http.impl.nio.client.InternalIODispatch.onOutputReady(InternalIODispatch.java:86)
    at org.apache.http.impl.nio.client.InternalIODispatch.onOutputReady(InternalIODispatch.java:39)
    at org.apache.http.impl.nio.reactor.AbstractIODispatch.outputReady(AbstractIODispatch.java:145)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.writable(BaseIOReactor.java:188)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:341)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
    at java.lang.Thread.run(Thread.java:745)
2018-04-10 16:06:25,009 ERROR [Timer-Driven Process Thread-3] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=input,targets=http://localhost:8080/nifi] failed to communicate with remote NiFi instance due to java.io.IOException: Failed to confirm transaction with Peer[url=http://HW13076.local:8080/nifi-api] due to java.io.IOException: Awaiting transferDataLatch has been timeout.
2018-04-10 16:06:25,009 ERROR [Timer-Driven Process Thread-3] o.a.nifi.remote.StandardRemoteGroupPort
java.io.IOException: Failed to confirm transaction with Peer[url=http://HW13076.local:8080/nifi-api] due to java.io.IOException: Awaiting transferDataLatch has been timeout.
    at org.apache.nifi.remote.AbstractTransaction.confirm(AbstractTransaction.java:264)
    at org.apache.nifi.remote.StandardRemoteGroupPort.transferFlowFiles(StandardRemoteGroupPort.java:369)
    at org.apache.nifi.remote.StandardRemoteGroupPort.onTrigger(StandardRemoteGroupPort.java:285)
    at org.apache.nifi.controller.AbstractPort.onTrigger(AbstractPort.java:250)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:175)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Awaiting transferDataLatch has been timeout.
    at org.apache.nifi.remote.util.SiteToSiteRestApiClient.finishTransferFlowFiles(SiteToSiteRestApiClient.java:938)
    at org.apache.nifi.remote.protocol.http.HttpClientTransaction.readTransactionResponse(HttpClientTransaction.java:93)
    at org.apache.nifi.remote.AbstractTransaction.confirm(AbstractTransaction.java:239)
    ... 12 common frames omitted
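A side note on the numbers in the first stack trace: the 'wrote' size (9249639094 bytes, roughly the 8 GB file) and the 'produced'/'read' size (659704502) differ by exactly 2 x 2^32, which looks like the signature of a byte counter wrapping at 32 bits. Whether that is the actual root cause is for the code and the JIRA to tell, but the failure mode is easy to show in miniature (illustrative only, not the NiFi implementation):

public class CounterWrapDemo {
    public static void main(String[] args) {
        final long fileSize = 9_249_639_094L;  // the 'wrote' figure from the stack trace above

        int wrapped = 0;                       // hypothetical 32-bit byte counter
        long remaining = fileSize;
        while (remaining > 0) {
            int chunk = (int) Math.min(8192, remaining);
            wrapped += chunk;                  // silently overflows past Integer.MAX_VALUE
            remaining -= chunk;
        }

        // Reading the wrapped counter back as an unsigned 32-bit value yields exactly
        // the 'produced'/'read' figure from the log above.
        System.out.println(Integer.toUnsignedLong(wrapped));   // prints 659704502
    }
}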
I will continue investigating the cause. I tested sending the same file with RAW S2S and it worked just fine. Please use the RAW transport protocol if possible.