10-11-2016
01:32 PM
2 Kudos
@Riccardo Iacomini Looks like you're doing some really good work thinking through this. One thing I would add is that it can often be quite OK to generate a lot of objects. A common source of GC pressure is how long objects are retained and how many/how large the retained objects are. Short-lived objects that are created and quickly become eligible for collection have, in my experience, caused little challenge for the collector. The logic is a bit different with G1, but in any event I think we can go back to basics a bit here before spending time tuning the GC.

I recommend running your flow with as small a heap as possible, say 512 MB or 1 GB. Run with a single thread (or very few) in the flow controller and run every processor with a single thread. Then let your flow run at full rate, measure the latencies, and profile the code. You will find some very interesting and useful things this way.

If you're designing a flow for maximum performance (lowest latency and highest throughput with minimal CPU utilization) then you really want to think about the design of the flow. I do not recommend using flowfile attributes as a go-between mechanism to deserialize and serialize content from one form to another. Take the original format and convert it to the desired output directly in a processor (a rough sketch of what I mean is below). If you need to make routing and filtering decisions, do that on either the raw format or the converted format; which is the better choice depends on your case. Extracting things to attributes so you can reuse existing processors is attractive of course, but if your primary aim above all else is raw speed then you want to design for that before other tradeoffs like reusability.
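To make that last point concrete, here is a minimal, hypothetical sketch of a processor that streams the content from one form to another in a single pass rather than round-tripping record data through attributes. The class name ConvertFormat, the relationship wiring, and the pass-through conversion body are placeholders for illustration, not code from this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Collections;
import java.util.Set;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;
import org.apache.nifi.processor.io.StreamCallback;

// Hypothetical processor: converts the flowfile content directly from the
// source format to the target format without copying fields into attributes.
public class ConvertFormat extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .description("Converted flowfiles")
            .build();

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.singleton(REL_SUCCESS);
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        // Rewrite the content in one streaming pass; the framework hands us the
        // original content as 'in' and the destination for new content as 'out'.
        flowFile = session.write(flowFile, new StreamCallback() {
            @Override
            public void process(final InputStream in, final OutputStream out) throws IOException {
                // Placeholder: parse the source format from 'in' and serialize the
                // desired output format to 'out'. Here we simply copy the bytes.
                final byte[] buffer = new byte[8192];
                int len;
                while ((len = in.read(buffer)) != -1) {
                    out.write(buffer, 0, len);
                }
            }
        });
        session.transfer(flowFile, REL_SUCCESS);
    }
}
```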
09-29-2016
03:26 PM
1 Kudo
So, without doing anything fancy configuration-wise and with a very basic template like pure-split-merge.xml (it assumes compressed input), I get around 13,333 events/sec on a very stable basis. The disk is moving but fine, CPU is pretty busy but fine, and GC is busy but fine with no full GCs.

At this point it looks like there are some opportunities to improve how we schedule processors to be both more aggressive and less noisy (when there is no work to do), so a few of us are looking into that. This goes to your question about wasting speed: we see some cases where our scheduler itself could be wasting speed opportunities.

In the meantime, a definitely fast option is to avoid the need to split the data in the first place. Simply have the processors that were extracting attributes and later altering content be composed together and operate on the dataset events directly (see the sketch below). That is less elegant and reusable, admittedly, so I'm not proposing it as the end solution, just stating that the approach works well.

Anyway, we'll keep this conversation going. This is a perfect case for evaluating performance as it exercises a few important elements and can be a common pattern. More to follow as we learn more. This will almost certainly end up in a blog/article 🙂
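Along those lines, here is a rough, hypothetical sketch of what "operate on the dataset events" can look like without splitting: the same processor skeleton as the ConvertFormat sketch above, but with the write callback handling the file line by line. The transformLine helper is a placeholder for whatever the extract/alter processors were doing per event; it is not code from this thread.

```java
// Drop-in replacement for the session.write(...) call in the earlier sketch; needs
// java.io.BufferedReader/BufferedWriter/InputStreamReader/OutputStreamWriter and
// java.nio.charset.StandardCharsets in addition to the imports shown there.
flowFile = session.write(flowFile, new StreamCallback() {
    @Override
    public void process(final InputStream in, final OutputStream out) throws IOException {
        final BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        final BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            // Placeholder per-event work: extract what you need from the line and
            // write the altered line back out, all within the one flowfile.
            writer.write(transformLine(line));
            writer.newLine();
        }
        writer.flush();
    }
});
session.transfer(flowFile, REL_SUCCESS);
```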
09-28-2016
03:15 PM
1 Kudo
No problem at all on time, happy to help. This should be faster, so let's figure out what is happening. I appreciate the details you're providing.

I've recreated a very similar flow and am seeing roughly 10,000-20,000 events per second depending on tuning. In a very basic, default-everything, single-threaded flow I am getting an end-to-end 20,000 events/sec, equating to about 20 MB/sec. This is on my MacBook. Given all the processors I have in the flow, the disk usage needed to make that happen equates to about 60 MB/s read with 50 MB/s write. That is all steady state and healthy. But it does seem like it should be faster: disk isn't tapped out, nor is CPU, and GC looks great. So, adding threads... performance actually seemed to drop a bit in this case, and when I pushed it with a variety of scenarios it did show these OOMEs, so I will be looking into that more. I've still got a 512 MB heap, so first I'll bump that a bit, which is reasonable given what I'm trying to do now.

Regarding your CopyProcessor, keep in mind the UpdateAttribute processor already does what you describe and supports batching nicely. Regarding the logic of when to combine them into one or not, yeah, I totally agree with your thinking. Just wanted to put that out there for consideration. If you've already thought through that then I'm all with you.

Will provide more thoughts as I/we get a better handle on the bottlenecks and options to move the needle.
09-28-2016
01:13 PM
2 Kudos
Cool, thanks for all the details. Yes, let's definitely avoid going the cluster route right now; once we see reasonable single-node performance we can deal with scale-out.

Some quick observations:

- Your garbage collection results look great.
- The custom procs are indeed rather slow, though I'm not overly impressed with the numbers I see on the standard procs either. The second SplitText took 18 seconds to split 300 MB worth of lines. Not ideal.
- You definitely should take advantage of NiFi's automatic session batching capability. Check out the SupportsBatching annotation; you can find several examples of its use (see the sketch below). When a processor supports it and its 'run duration' in the UI is set higher than 0, NiFi can automatically combine several commits into one, which can yield far higher throughput at the expense of latency (on the order of milliseconds).

Questions:

- What is the underlying storage device that NiFi is using? Type of disk (local disk, HDD or SSD)? Type of partitioning (are all the repos on a single disk)?
- Have you considered restructuring the composition of those custom processors? Could/should a couple reasonably be combined into a single step?
- ProcessNullFields performance appears really poor. Have you done any testing/evaluation to see what that processor is spending the bulk of its time on? Attaching a debugger/profiler at runtime could be really enlightening.
- CopyProcessor also appears heavy in terms of time. What does that one do?

I'll set up a vanilla flow that brings in a file like you mention, splits it, and merges it, all on a basic laptop setup, and let you know the results I see.
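As a quick illustration of the SupportsBatching annotation mentioned above, here is a minimal, hypothetical sketch (the class name MyLineProcessor is a placeholder, not one of the custom processors from this thread):

```java
import org.apache.nifi.annotation.behavior.SupportsBatching;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.exception.ProcessException;

// With @SupportsBatching present and 'Run Duration' set above 0 ms in the UI, the
// framework may combine several session commits into one, trading a few
// milliseconds of latency for considerably higher throughput.
@SupportsBatching
public class MyLineProcessor extends AbstractProcessor {

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        // Normal per-flowfile work goes here; no batching-specific code is required.
    }
}
```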
09-27-2016
03:05 PM
1 Kudo
Could you please list the processors you have in the flow? The processors Matt notes can use a decent chunk of memory, but it is not really based on the original size of the input; it is more about the metadata for the individual flowfiles themselves. So a large input file does not necessarily mean high heap usage. The metadata for the flowfiles is held in memory, but typically only a very small amount of content is ever in memory. Some processors, though, do use a lot of memory for one reason or another. We should probably put warnings about them in their docs and in the UI. Let's look through the list and identify candidates.
09-27-2016
03:01 PM
5 Kudos
Hello, how many lines/rows are in each incoming CSV file? A common pattern here is to do a two-phase split, where the first phase splits into, say, 5,000-line bundles and the second phase splits those into single lines. Then, using back pressure, you can avoid ever creating too many flowfiles at once that aren't being operated on and are simply causing excessive GC pressure. On a rather modest system you should very easily see performance (in conservative terms) of 30+ MB/s and tens or hundreds of thousands of rows per second for a flow like the one described. The bottleneck points in the flow should be fairly easy to spot through the UI. Could you attach a screenshot of the running flow? One thing that is not easily spotted is when GC pressure is causing the issue. If you go to the summary page and then 'system diagnostics' you can view garbage collection info. What does it show?
09-08-2016
03:59 AM
Yep, what you describe with UpdateAttribute/MergeContent sounds perfectly fine. What you'll want there precisely will depend on how many relationships you have out of RouteText. As for concurrent tasks I'd say roughly:

- 1 for GetFile
- 1 for SplitFile
- 2 to 4 or 5 on RouteText (no need to go too high generally)
- 1 for MergeContent
- 1 to 2 for PutHDFS

You don't have to stress too much over those numbers out of the gate. You can run with minimal threads first, find any bottlenecks, and increase if necessary.
09-07-2016
02:08 PM
1 Kudo
Thea, if you look in the nifi-assembly/target folder, what do you see and how large are the files? It really just looks like an incomplete build at this point. Consider grabbing a convenience binary and using that so you can rule out local build issues. Thanks
09-07-2016
03:38 AM
5 Kudos
Hello, in Apache NiFi 1.0 there are no longer UI controls or API calls exposed to change whether or not a node is the primary node or cluster coordinator, because those roles are now automatically elected and maintained by the zero-master clustering model backed by ZooKeeper. Any node at any time should be capable of taking on those designations. This ensures we don't need any special nodes and that these valuable roles are always active from an HA perspective, which was not the case previously. Thanks
Joe
09-06-2016
03:49 PM
1 Kudo
I'm not sure I understand the "versus" framing as posed here. MirrorMaker can be used to replicate data from one Kafka broker to another. The NiFi site-to-site protocol can be used to replicate data from one NiFi cluster to another. Both support the appropriate security mechanisms. NiFi offers fine-grained provenance/lineage, but arguably Kafka's log replication/offset mechanism is sufficient for the replication case. As for tuning, again, both offer strong tuning/throughput mechanisms. I'd recommend using the facilities of each.