Config parameters for 10,000 events per second arriving as input to ListenUDP and MergeContent

Expert Contributor

Hi All,

Thanks

I am having a hard time trying permutations and combinations to tune the properties of both the ListenUDP and MergeContent processors.

Are there standard values for each of these properties for 10,000 events per second?

@dhieru singh

What are the average sizes of the flow files coming out of the ListenUDP processor?

Expert Contributor

Hi @Wynner, thanks. How do I figure out the size of the FlowFiles coming out of the ListenUDP processor?

Super Mentor

Expert Contributor

@Matt Clarke

Hey Matt, thanks for the link. It is an awesome link; however, I am still having trouble tuning the MergeContent processor.

Thanks

Dheeru

Super Mentor

@dhieru singh

I am assuming you are not having any issues with your ListenUDP processor? It is successfully keeping up with your 10,000 messages per second?

Is the real problem here how fast the MergeContent processor is merging those queued FlowFiles between ListenUDP and MergeContent?

I can tell you that trying to merge 127,000 FlowFiles at a time via the MergeContent processor is going to put a lot of pressure on your NiFi heap. I would not be surprised if you encountered Out-Of-Memory (OOM) errors. That pressure is caused by FlowFile attributes: the attributes of every FlowFile being merged by the MergeContent processor are held in heap.

To reduce that heap pressure, I suggest using two MergeContent processors in series. Have the first merge FlowFiles based on a min num entries of 10000 and a max num entries of 15000. Then feed success from that MergeContent to another MergeContent configured to merge again based on min and max bin size. The end result is better performance and less pressure on heap.
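As a rough illustration of why the two-stage merge eases heap pressure, the sketch below counts how many FlowFiles a single bin has to track at each stage. The 127,000 and 10,000 figures come from the post above; the function name and the simplification that every first-stage bin closes at exactly the minimum entry count are assumptions made only for this example.

```python
import math

def two_stage_merge(total_flowfiles, first_stage_min_entries):
    """Return (bundles produced by stage 1, most FlowFiles any one bin tracks)."""
    # Assumption: each first-stage bin closes as soon as it reaches the minimum.
    bundles = math.ceil(total_flowfiles / first_stage_min_entries)
    # A stage-1 bin tracks at most first_stage_min_entries FlowFiles;
    # a stage-2 bin only tracks the bundles that stage 1 produced.
    max_tracked = max(first_stage_min_entries, bundles)
    return bundles, max_tracked

bundles, max_tracked = two_stage_merge(127_000, 10_000)
print(bundles, max_tracked)  # 13 10000
```

Under these assumptions no single bin ever holds the attributes of more than 10,000 FlowFiles, compared with 127,000 for a single-stage merge.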

You also have the ability to set a higher number of concurrent tasks on your MergeContent processors. This allows the processor to execute multiple times simultaneously (if sufficient work exists). Each concurrent task can merge a different "bin" at the same time. The rule here is that the number of bins should always be at least the number of concurrent tasks + 1. For example: if MergeContent is configured for 7 bins, there should be no more than 6 concurrent tasks assigned to this processor.
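The bins-versus-tasks rule above can be written as a one-line check. This is only a sketch of the rule of thumb as stated in this thread; the function name is hypothetical, and returning 1 when fewer than two bins are configured is an assumption, since the rule itself does not cover that case.

```python
def max_concurrent_tasks(max_bins):
    """Upper limit on concurrent tasks so that bins >= tasks + 1."""
    if max_bins < 2:
        # Assumption: a processor always needs at least one task to run.
        return 1
    return max_bins - 1

print(max_concurrent_tasks(7))  # 6
```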

Once you make these changes, you will still need to keep an eye out for OOM errors. More concurrent tasks also means more heap usage by the MergeContent processors. You may find that you need to allocate more memory to your NiFi JVM to support your dataflow design.

Also make sure you have optimized the overall number of threads your NiFi instance is allowed to use. This is found under "Controller settings" in the hamburger menu. The default Max Timer Driven Thread count is only 10. (Don't worry about the Event Driven Thread count.) This means that all components on your canvas must share just these 10 threads. The Max Timer Driven Thread count should be set to 2 - 4 times the number of cores available on your NiFi server.
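To get a starting range for that setting, the 2 - 4 times rule can be applied to the core count of the server. The sketch below is an illustration only: the function name is made up, and it assumes it is run on the NiFi host itself when no core count is passed in.

```python
import os

def suggested_thread_range(cores=None):
    """Return (low, high) for Max Timer Driven Thread count as 2x to 4x cores."""
    # Fall back to the local machine's core count (assumed to be the NiFi host).
    cores = cores or os.cpu_count() or 1
    return 2 * cores, 4 * cores

low, high = suggested_thread_range(8)
print(low, high)  # 16 32
```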

Once you make changes to the concurrent tasks and max thread count settings, keep an eye on the CPU usage on your server to make sure you have not over-allocated, resulting in 100% CPU usage all the time.

Now, with the ListenUDP processor, you could increase the "Max Batch Size" so that more data is written to each FlowFile that is output from this processor.
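To see what a larger batch size buys you, the sketch below estimates how many FlowFiles per second ListenUDP would emit at 10,000 events per second. It assumes every batch fills completely, which is the best case; real batches may be smaller depending on how the datagrams arrive, so treat the result as a lower bound. The function name and the sample batch sizes are illustrative only.

```python
import math

def flowfiles_per_second(events_per_second, max_batch_size):
    """Best-case FlowFile rate, assuming every batch is completely full."""
    return math.ceil(events_per_second / max_batch_size)

for batch in (1, 100, 1000):
    # Prints 10000, 100 and 10 FlowFiles per second respectively.
    print(batch, flowfiles_per_second(10_000, batch))
```

Fewer, larger FlowFiles mean fewer attribute sets for the downstream MergeContent to hold in heap.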

Hope this helps.

Thank you,

Matt

Expert Contributor

@Matt Clarke

Thanks a lot, this helps. I appreciate your help. Is there any way I can find out the size of the FlowFiles coming out of the ListenUDP processor?

PS: How can I accept your answer?

Thanks

Super Mentor

@dhieru singh

FlowFiles generated by ListenUDP are placed on the outbound connection. One of the easiest ways to see the sizes of those FlowFiles is to right-click on that connection (while it has queued data) and select "List queue" from the context menu that is displayed. It will open a new UI that lists all FlowFiles queued on that connection along with their details.

Matt