Machine 1 will generate about 1 GB of data every 14 seconds or so (many files, ranging in size from a couple of KB up to 30 MB). Machine 1 needs to send all these files to Machine 2 as fast as possible, preferably in less than 14 seconds.
For the tests we had to use a 100 Mbit/s link, but in the future we will have a 1 Gbit/s link.
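As a quick sanity check on these numbers, the sketch below works out the sustained rate the requirement implies and the best-case transfer time on each link. It assumes 1 GB = 10**9 bytes and ignores protocol overhead, so real figures would be somewhat worse; the function names are only illustrative.

```python
# Assumption: 1 GB = 10**9 bytes and link speeds are raw line rates
# (protocol overhead is ignored, so real numbers will be somewhat worse).
GB = 10**9


def required_mbit_per_s(payload_bytes, window_s):
    """Sustained rate (Mbit/s) needed to move payload_bytes within window_s seconds."""
    return payload_bytes * 8 / window_s / 1e6


def transfer_time_s(payload_bytes, link_mbit_per_s):
    """Best-case time (s) to move payload_bytes over a link of the given rate."""
    return payload_bytes * 8 / (link_mbit_per_s * 1e6)


if __name__ == "__main__":
    print(f"needed: {required_mbit_per_s(GB, 14):.0f} Mbit/s")
    for link in (100, 1000):
        print(f"{link} Mbit/s link: {transfer_time_s(GB, link):.0f} s per 1 GB")
```

Under these assumptions the target works out to roughly 571 Mbit/s. The 100 Mbit/s test link needs about 80 s per GB at best, so it cannot meet the 14-second window whatever tool is used, while the 1 Gbit/s link leaves some headroom (about 8 s per GB).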
Use NiFi to send data from Machine 1 to Machine 2.
NiFi version 0.7.2 - we have to use this version because of the current Java version running on the machines.
Both machines are quite powerful, running with SSDs.
For the tests both machines were connected with a direct 100 Mbit/s link.
NiFi standalone running on Machine 1 and 2
Very simple flow as follows (sender on the left, receiver on the right):
RPG using the raw socket protocol (the only one available in this NiFi version)
The test consisted of sending 1 GB of data, varying the number and size of the files sent.
In the end we also tried sending two 1 GB files to see the result.
100 x 10 MB
10 x 100 MB
1 x 1 GB
2 x 1 GB
8 – 12 MB/s
48 – 70 MB/s
99 – 100 MB/s
Results were derived from the data provenance.
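For anyone wanting to reproduce the test, file sets like the ones listed above could be generated with a small script along these lines. This is only a sketch: the file naming scheme, the use of random bytes as payload, and the decimal megabyte are assumptions, not what was used in our runs.

```python
import os
import tempfile
from pathlib import Path

CHUNK = 1024 * 1024  # write in 1 MiB pieces so memory use stays flat


def make_file_set(directory, count, size_bytes):
    """Create `count` files of `size_bytes` random bytes each and return their paths."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        path = out / f"test_{i:05d}.bin"  # hypothetical naming scheme
        remaining = size_bytes
        with open(path, "wb") as fh:
            while remaining > 0:
                n = min(CHUNK, remaining)
                fh.write(os.urandom(n))
                remaining -= n
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Tiny demo so the script is cheap to run; scale count and size up for a real test.
    with tempfile.TemporaryDirectory() as tmp:
        files = make_file_set(tmp, 3, 2048)
        print(len(files), "files written,", files[0].stat().st_size, "bytes each")
```

With `MB = 10**6`, a call such as `make_file_set("some_dir", 100, 10 * MB)` would produce the 100 x 10 MB set (`some_dir` being a placeholder for the directory the sender flow reads from).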
We also tried to change the JVM heap memory, which didn't make a lot of difference.
We can see that for large files NiFi can use all the bandwidth available, but for small files the performance is not that good.
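One way to reason about this pattern is a toy model in which every file pays a fixed cost (connection handling, provenance and repository bookkeeping, and so on) on top of its time on the wire. The sketch below is not fitted to our measurements; the per-file overhead value is purely hypothetical and only illustrates how such a cost would dominate for small files.

```python
def effective_mb_per_s(file_mb, link_mb_per_s, per_file_overhead_s):
    """Throughput when each file pays a fixed cost in addition to its wire time."""
    wire_time_s = file_mb / link_mb_per_s
    return file_mb / (wire_time_s + per_file_overhead_s)


if __name__ == "__main__":
    LINK_MB_PER_S = 12.5  # 100 Mbit/s expressed in MB/s
    OVERHEAD_S = 0.05     # hypothetical fixed cost per file, not a measured value
    for size_mb in (0.01, 1, 10, 100, 1000):
        rate = effective_mb_per_s(size_mb, LINK_MB_PER_S, OVERHEAD_S)
        print(f"{size_mb:>8} MB files -> {rate:.2f} MB/s")
```

In this model throughput approaches the link rate as files grow and collapses as they shrink, which is the general shape we observed, although the real cause in NiFi may well be different.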
NiFi Cluster with two nodes on Machine 1 and 2
The flow used for the sender was a bit more complex this time. We had to use ListFile and an RPG on the Primary node only in order to balance the load. We also used the GenerateFlowFile processor to generate the flow files.
The Sender flow:
The receiver flow was the same as in Test 1.
We can see some improvement when clustering, although performance for small files is still very low.
CPU usage also increases drastically, which could impact Machine 1's production workload.
Looking forward to your feedback. We really liked NiFi when we started using it, but it is now showing performance issues that could make us stop using it altogether.