About MattWho

MattWho · ‎09-14-2017

@Sravanthi Bellamkonda In order for the MergeContent processor to create ~64 MB merged FlowFiles from 1 KB source FlowFiles, it would need to merge ~65,500 FlowFiles. While the MergeContent processor is merging FlowFiles in a "Bin", the FlowFile attributes (metadata) are held in NiFi's JVM heap memory. This can commonly result in an Out Of Memory (OOM) condition. A more common approach is to use two MergeContent processors in series to reduce the overall heap memory footprint for such a dataflow.

ListenTCP --> (success) --> MergeContent --> (merged) --> MergeContent --> (merged) --> PutHDFS

The first MergeContent processor would merge based on, in your case, perhaps a "Minimum Group Size" of 1024 KB and a "Maximum Group Size" of perhaps 1040 KB. This would merge roughly 1,000 FlowFiles per bin. These merged FlowFiles are then passed to another MergeContent processor that will merge based on a "Minimum Group Size" of 60 MB and a "Maximum Group Size" of perhaps 64 MB. This will result in merging ~60 FlowFiles per bin.

I would set "Maximum number of Bins" on both of these MergeContent processors to 11. This would allow you to increase the "Concurrent tasks" on each MergeContent processor to improve performance. I would start with 3 - 5 concurrent tasks and see how that performs based on the incoming data rate. I would not go higher than 10. Just remember that more concurrent tasks given to any single processor equates to more CPU usage, so always start low and slowly increment up. Generally we try to keep the number of FlowFiles merged per processor between 10,000 and 20,000 to minimize heap usage.

Another useful article about tuning NiFi's Listen-based processors can be found here: https://community.hortonworks.com/articles/30424/optimizing-performance-of-apache-nifis-network-lis.html

Thanks, Matt
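To make the arithmetic in the MergeContent answer above concrete, here is a minimal Python sketch of the FlowFile counts per bin for the one-stage and two-stage layouts. The function name `flowfiles_per_bin` is hypothetical and the sizes are the ones quoted in the answer; this is only a back-of-the-envelope calculation, not something NiFi itself runs.

```python
import math

KB = 1024
MB = 1024 * KB


def flowfiles_per_bin(target_bytes: int, source_bytes: int) -> int:
    """Number of source FlowFiles needed to reach the target merged size."""
    return math.ceil(target_bytes / source_bytes)


# Single-stage merge: 1 KB FlowFiles merged straight into a ~64 MB FlowFile.
single_stage = flowfiles_per_bin(64 * MB, 1 * KB)

# Two-stage merge: 1 KB -> ~1024 KB, then ~1024 KB -> ~60 MB.
stage_one = flowfiles_per_bin(1024 * KB, 1 * KB)
stage_two = flowfiles_per_bin(60 * MB, 1024 * KB)

print(single_stage, stage_one, stage_two)  # prints "65536 1024 60"
```

With two processors in series, no single bin has to hold attributes for more than about a thousand FlowFiles at a time, instead of tens of thousands.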

MattWho · ‎09-13-2017

@Juan Manuel Nieto You cannot specify both an "Initial Admin Identity" and a "Legacy Authorized Users File" in the authorizers.xml file. Try removing the "Legacy Authorized Users File" and restarting NiFi to see if the users.xml and authorizations.xml files get generated. Thanks, Matt

MattWho · ‎09-13-2017

@Jon Rodriguez Breton Glad this worked for you. As for your new question: the value written to the DistributedMapCache remains in the cache for a configured amount of time or until a configured number of entries exists. So you can compare many FlowFiles against this stored value, and any FlowFile that matches a stored value is considered a duplicate. It is not a one-time match of a single duplicate. It would be very expensive to build a NiFi processor that reads in large batches of queued FlowFiles from an inbound queue to do comparisons on FlowFile attributes (FlowFile attributes live in heap memory space, so the more FlowFiles you pull in to compare, the more likely you are to encounter Out Of Memory). And if you limit the size of the comparisons, how do you know a given batch contains the actual FlowFiles you want to compare? This is why the DetectDuplicate processor makes use of an external service and compares FlowFiles against a stored value one FlowFile at a time. Thanks, Matt
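The one-FlowFile-at-a-time comparison described above can be sketched in Python as follows. This is only an illustration of the idea, assuming a toy in-memory cache: the class `BoundedCache` and the function `route` are hypothetical names, not NiFi's implementation, and the real DistributedMapCache is an external service with its own eviction behavior.

```python
import time
from collections import OrderedDict


class BoundedCache:
    """Toy stand-in for a cache limited by entry count and by entry age."""

    def __init__(self, max_entries, max_age_seconds, clock=time.monotonic):
        self.max_entries = max_entries
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        self.entries = OrderedDict()  # key -> time the key was first stored

    def seen_before(self, key):
        """Return True if key is cached; otherwise store it and return False."""
        now = self.clock()
        # Age off expired entries, oldest first.
        while self.entries:
            _, stored_at = next(iter(self.entries.items()))
            if now - stored_at <= self.max_age_seconds:
                break
            self.entries.popitem(last=False)
        if key in self.entries:
            return True
        self.entries[key] = now
        # Enforce the entry limit by evicting the oldest entry.
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)
        return False


def route(cache, attribute_values):
    """Check each value against the cache, one at a time."""
    return [
        (value, "duplicate" if cache.seen_before(value) else "non-duplicate")
        for value in attribute_values
    ]


cache = BoundedCache(max_entries=1000, max_age_seconds=3600)
results = route(cache, ["a", "b", "a", "a"])
print(results)
```

Only the first "a" is routed as non-duplicate; every later "a" matches the stored value for as long as that entry stays in the cache, which is why this is not a one-time match.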

MattWho · ‎09-12-2017

@sally sally That is correct, adding 7,200,000 ms will increase your delay to 2 hours. If you found my answer addressed your question, please mark it as the accepted answer. Thanks, Matt

MattWho · ‎09-12-2017

@Juan Manuel Nieto NiFi must be configured to run securely over https using SSL before any user authentication can be used. Thanks, Matt

MattWho · ‎09-12-2017

@sally sally You could instead use an UpdateAttribute and a RouteOnAttribute processor to delay FlowFiles by two minutes before passing them to your next processor. The UpdateAttribute processor is used to create a new attribute whose value is the current epoch time in milliseconds:

currentTime = ${now():toNumber()}

The RouteOnAttribute processor then adds 2 minutes to that attribute and checks whether the result is less than or equal to the current time. If it is not, the FlowFile is sent to the unmatched relationship, which is looped back on the processor. FlowFiles continue to loop until 2 minutes have passed, at which time they are routed to the matched (2mins) relationship, created using the following property:

2mins = ${currentTime:plus(120000):le(${now()})}

Thanks, Matt
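The routing decision in that delay loop can be mirrored in a few lines of Python. This is a sketch of the logic only, with hypothetical function names (`stamp`, `route`) and explicit timestamps in place of NiFi's `now()`; it is not how NiFi evaluates Expression Language internally.

```python
DELAY_MS = 2 * 60 * 1000  # 2 minutes = 120000 ms (2 hours would be 7200000 ms)


def stamp(now_ms):
    """UpdateAttribute step: record the arrival time as an attribute."""
    return {"currentTime": now_ms}


def route(attributes, now_ms, delay_ms=DELAY_MS):
    """RouteOnAttribute step: matched once currentTime + delay <= now."""
    if attributes["currentTime"] + delay_ms <= now_ms:
        return "2mins"
    return "unmatched"  # looped back to the same processor


flowfile = stamp(now_ms=0)
print(route(flowfile, now_ms=60_000))   # prints "unmatched"
print(route(flowfile, now_ms=120_000))  # prints "2mins"
```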

MattWho · ‎09-12-2017

@Jon Rodriguez Breton Are you trying to see if all attributes from both FlowFiles match exactly, or is there a specific attribute from each FlowFile you want to compare? My initial thought would be to use the DetectDuplicate processor. You could write the unique attribute to the DistributedMapCache service, then compare new FlowFiles against that stored value and delete any duplicates. That way only the first FlowFile would get passed on. Thanks, Matt

MattWho · ‎09-01-2017

Issue was browser version related. Switching to a newer version of the browser resolved this issue.

MattWho · ‎09-01-2017

@Kiem Nguyen I highly recommend starting a new question in Hortonworks Community Connection for this. Diagnosing what caused your node to disconnect and how to resolve it is a different topic from how to stop a processor with a disconnected node. It would also be helpful to explain what you mean by "overloaded queue" and what makes you feel the size of your queue triggered your node to disconnect. What error did you see in the nifi-app.log on the node that disconnected? Thanks, Matt

MattWho · ‎08-31-2017

@sally sally The user who is logged in and building out a dataflow has no correlation to who that dataflow is running as. All the processors on the canvas are executed by the user who owns the NiFi process itself. So when you set up an SSL Context Service to use a specific keystore and truststore, it is the PrivateKeyEntry in that keystore that will be used as the user for authentication and authorization during any established connection. The TrustedCertEntry(s) in the truststore provided in the SSL Context Service will be used to establish trust of the server certificates passed by the endpoint (in your case the certs being passed from your NiFi nodes) during the two-way TLS handshake. Now this is a little different than when you log in to the UI via the browser. Two-way TLS is not enforced by your browser like it is by NiFi's processors. Your browser likely did not trust the cert presented by your NiFi nodes, and you added an exception the first time you connected, saying that you would like to trust that unknown cert coming from the NiFi node. Within NiFi and the SSL Context Service, there is no way to add such an exception, so trust must work in both directions. This means the truststore you use in your SSL Context Service must be able to trust the certificates being passed by each of your NiFi nodes. Thanks, Matt
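As a rough analogy for the keystore/truststore roles described above, here is a minimal sketch using Python's standard `ssl` module. It is not NiFi configuration: NiFi's SSL Context Service takes keystore and truststore files, whereas this sketch assumes PEM files, and the function name and its parameters are hypothetical.

```python
import ssl


def make_mutual_tls_client_context(ca_file=None, cert_file=None, key_file=None):
    """Build a client-side TLS context that always verifies the server.

    ca_file   plays the role of the truststore (certificates we trust).
    cert_file and key_file play the role of the keystore (our own identity).
    All three paths are assumptions supplied by the caller.
    """
    # PROTOCOL_TLS_CLIENT enables certificate and hostname verification
    # by default, so there is no "add an exception" path as in a browser.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_file:
        context.load_verify_locations(cafile=ca_file)
    if cert_file:
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


context = make_mutual_tls_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # prints "True"
```

If the trusted CA file does not cover the certificate a server presents, the handshake fails rather than prompting, which matches the behavior described for NiFi's processors.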

Last Visited	‎01-31-2026 11:55 AM

Member Since	‎07-30-2019 10:41 AM
Posts	3,427
Kudos received	1628

Cloudera Community

Re: Setting TTL per key when writing to redis

Re: Best Practice for configuring registry flows

Re: Nifi 2.7.2 Start Problem

Re: Error importing NiFi workflow template from ve...

Re: Error importing NiFi workflow template from ve...

Re: Question Deleted-2

Re: Nifi several issues trying to resolve Untruste...

Re: Compare attributes of different flowfiles

Re: Nifi:Manipulatig Cron Scheduling

Re: Nifi don't show any login screen with ldap-pro...

Re: Nifi:Manipulatig Cron Scheduling

Re: Compare attributes of different flowfiles

Re: Access Policies are not showing in NIFI UI

Re: Can not stop processor in cluster when a node ...

Re: Using ssl cert file for autentification in...