Question Deleted-2

1 ACCEPTED SOLUTION

Re: Question Deleted-2

Master Guru

@Sravanthi Bellamkonda

In order for the MergeContent processor to create ~64 MB merged FlowFiles from 1 KB source FlowFiles, it would need to merge ~65,500 FlowFiles. While the MergeContent processor is merging FlowFiles in a "Bin", the FlowFile attributes (metadata) are held in NiFi's JVM heap memory.

This can commonly result in an Out Of Memory (OOM) condition.
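
The ~65,500 figure is simply the target merged size divided by the source FlowFile size. A minimal Python sketch of that arithmetic, assuming binary units (1 KB = 1024 bytes) and ignoring any per-FlowFile overhead; the function name is hypothetical and used only for illustration:

```python
KB = 1024
MB = 1024 * KB


def flowfiles_needed(target_bytes: int, source_bytes: int) -> int:
    """Number of source FlowFiles required to reach the target merged size."""
    # Ceiling division: a partially filled remainder still needs one more FlowFile.
    return -(-target_bytes // source_bytes)


# Single-stage merge: 1 KB source FlowFiles into one 64 MB merged FlowFile.
single_stage = flowfiles_needed(64 * MB, 1 * KB)
print(single_stage)  # 65536
```

Every one of those FlowFiles has its attributes held in heap while the bin is open, which is why a single-stage merge of this size is risky.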

A more common approach is to use two MergeContent processors in series to reduce the overall heap memory footprint of such a dataflow.

ListenTCP --> (success) --> MergeContent --> (merged) --> MergeContent --> (merged) --> PutHDFS

In your case, the first MergeContent processor could merge with a "Minimum Group Size" of perhaps 1024 KB and a "Maximum Group Size" of perhaps 1040 KB. This would merge roughly 1,000 FlowFiles per bin. These merged FlowFiles are then passed to a second MergeContent processor with a "Minimum Group Size" of perhaps 60 MB and a "Maximum Group Size" of perhaps 64 MB, which would merge roughly 60 FlowFiles per bin.
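
The per-bin counts for the two stages can be checked with the same kind of size arithmetic. This is only a sketch: it assumes binary units, assumes every stage 1 output is about 1 MB (the stage 1 minimum), and does not model when NiFi actually closes a bin. The helper name is hypothetical.

```python
KB = 1024
MB = 1024 * KB


def bin_range(min_group: int, max_group: int, input_size: int) -> tuple[int, int]:
    """Return (fewest, most) whole inputs a bin holds between min and max group size."""
    fewest = -(-min_group // input_size)  # ceil: inputs needed to reach the minimum
    most = max_group // input_size        # floor: inputs that fit under the maximum
    return fewest, most


# Stage 1: 1 KB source FlowFiles, 1024 KB minimum, 1040 KB maximum.
stage1 = bin_range(1024 * KB, 1040 * KB, 1 * KB)

# Stage 2: ~1 MB inputs (assumed), 60 MB minimum, 64 MB maximum.
stage2 = bin_range(60 * MB, 64 * MB, 1 * MB)

print(stage1)  # (1024, 1040)
print(stage2)  # (60, 64)
```

Neither stage holds anywhere near the ~65,500 FlowFiles per bin that a single MergeContent would need.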

I would set "Maximum number of Bins" on both of these MergeContent processors to 11. This would allow you to increase the "Concurrent tasks" on each MergeContent processor to improve performance. I would start with 3 to 5 concurrent tasks and see how that performs based on the incoming data rate. I would not go higher than 10. Just remember that more concurrent tasks on any single processor means more CPU usage, so always start low and increment slowly.

Generally we try to keep the number of FlowFiles merged per processor between 10,000 and 20,000 to minimize heap usage.
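
As a rough sanity check, the suggested settings can be compared against that guideline. The sketch below rests on an assumption that is mine, not something stated in this thread: that the worst case is every bin being full at the same time, so the count per processor is the number of bins times the largest bin. The per-bin maxima come from the group sizes suggested earlier (1040 KB / 1 KB and 64 MB / ~1 MB).

```python
GUIDELINE_MAX = 20_000  # upper end of the 10,000 to 20,000 range quoted above


def worst_case_binned(max_bins: int, max_flowfiles_per_bin: int) -> int:
    """FlowFiles held by one processor if every bin is full (assumed worst case)."""
    return max_bins * max_flowfiles_per_bin


stage1 = worst_case_binned(11, 1040)    # first MergeContent, 1 KB inputs
stage2 = worst_case_binned(11, 64)      # second MergeContent, ~1 MB inputs
single = worst_case_binned(1, 65_536)   # one MergeContent filling a single 64 MB bin

for name, count in [("stage 1", stage1), ("stage 2", stage2), ("single stage", single)]:
    status = "within" if count <= GUIDELINE_MAX else "over"
    print(f"{name}: {count} FlowFiles, {status} the guideline")
```

Under that assumption both stages of the two-processor flow stay below the upper bound, while a single-stage merge exceeds it with just one bin.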

Another useful article about tuning NiFi's Listen-based processors can be found here:

https://community.hortonworks.com/articles/30424/optimizing-performance-of-apache-nifis-network-lis....

Thanks,

Matt

3 REPLIES

Re: Question Deleted-2

Master Guru

@Sravanthi Bellamkonda

Was my explanation helpful in addressing this specific question? If so, please take a moment to mark this answer as accepted to close out this thread.

Thank you,

Matt

Re: Question Deleted-2

Hi @Matt Clarke,

Your explanation was useful for building my NiFi flow, but I am experiencing a data loss of 7 records. I have posted about this in the forum. Below is the link:

https://community.hortonworks.com/questions/138873/data-loss-found-with-tcp-and-mergecontent-process...

Can you help me figure out what mistake I am making in the configuration of the processors?

Currently, I am using PutFile instead of PutHDFS so that I can easily check the line counts of the merged content.

Sravanthi
