Member since: 07-30-2019
Posts: 3400
Kudos Received: 1621
Solutions: 1003
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 88 | 12-05-2025 08:25 AM |
| | 231 | 12-03-2025 10:21 AM |
| | 518 | 11-05-2025 11:01 AM |
| | 394 | 11-05-2025 08:01 AM |
| | 685 | 11-04-2025 10:16 AM |
09-18-2017
06:24 PM
@Sravanthi Bellamkonda Was my explanation helpful in addressing this specific question? If so, please take a moment to mark this answer as accepted to close out this thread. Thank you, Matt
09-15-2017
08:39 PM
1 Kudo
In order to have the ListHDFS listing start over again, you would need to do the following:
1. Open the "Component State" UI by right-clicking on the ListHDFS processor and selecting "View state".
2. Within that UI you will see a blue "Clear state" link, which will clear the currently retained state.
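To illustrate why clearing state makes the listing start over, here is a minimal Python sketch of a lister that remembers only the newest modification time it has already emitted. This is a hypothetical, simplified model (ListingState, SimpleLister and the sample paths are made up for illustration); the real ListHDFS processor keeps more detailed state than this.

```python
from dataclasses import dataclass, field


@dataclass
class ListingState:
    # Hypothetical stand-in for a lister's retained component state:
    # the newest modification time it has already emitted.
    latest_timestamp: int = -1


@dataclass
class SimpleLister:
    state: ListingState = field(default_factory=ListingState)

    def list_new(self, files: dict[str, int]) -> list[str]:
        """Return paths modified after the stored timestamp, then advance the state."""
        new = sorted(p for p, ts in files.items() if ts > self.state.latest_timestamp)
        if new:
            self.state.latest_timestamp = max(files[p] for p in new)
        return new

    def clear_state(self) -> None:
        """Rough equivalent of clicking "Clear state": forget everything listed so far."""
        self.state = ListingState()


lister = SimpleLister()
files = {"/data/a.txt": 100, "/data/b.txt": 200}
print(lister.list_new(files))  # both files are listed
print(lister.list_new(files))  # nothing new, state says they were already listed
lister.clear_state()
print(lister.list_new(files))  # both files are listed again
```

Because the only thing preventing a re-listing is the stored state, wiping it makes every existing file look new again.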
09-15-2017
01:42 PM
1 Kudo
@Jon Rodriguez Breton There are no dedicated processors for removing cached entries from the distributed map cache. You can try using the "Age Off Duration" property in the DetectDuplicate processor, or use a scripting processor in NiFi to execute a script that clears the cache. The following Jira covers this missing processor and also provides a sample template: https://issues.apache.org/jira/browse/NIFI-4173
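A toy Python model of the two options above: an entry older than the age-off duration stops counting as a duplicate, and a script can remove a key outright. AgeOffCache, detect_duplicate and remove are hypothetical names for illustration, not NiFi's DistributedMapCacheClient API, and the exact age-off semantics shown here are an assumption.

```python
from typing import Optional


class AgeOffCache:
    """Toy in-memory stand-in for a distributed map cache (hypothetical, not NiFi's API)."""

    def __init__(self) -> None:
        # cache key -> epoch millis at which the entry was stored
        self._stored_at: dict[str, int] = {}

    def detect_duplicate(self, key: str, now_ms: int, age_off_ms: Optional[int] = None) -> bool:
        """Return True for a duplicate; otherwise store the key and return False."""
        stored = self._stored_at.get(key)
        if stored is not None and (age_off_ms is None or now_ms - stored < age_off_ms):
            return True
        # New key, or the old entry has aged off: store it as a fresh first occurrence.
        self._stored_at[key] = now_ms
        return False

    def remove(self, key: str) -> bool:
        """What a cache-clearing script would do for one key; True if an entry was removed."""
        return self._stored_at.pop(key, None) is not None


cache = AgeOffCache()
print(cache.detect_duplicate("abc", now_ms=0, age_off_ms=1000))     # first sighting
print(cache.detect_duplicate("abc", now_ms=500, age_off_ms=1000))   # duplicate
print(cache.detect_duplicate("abc", now_ms=1500, age_off_ms=1000))  # aged off, stored again
```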
09-14-2017
12:47 PM
3 Kudos
@Sravanthi Bellamkonda In order for the MergeContent processor to create ~64 MB merged FlowFiles from 1 KB source FlowFiles, it would need to merge ~65,500 FlowFiles. While the MergeContent processor is merging FlowFiles in a "Bin", the FlowFile Attributes (metadata) are held in NiFi's JVM heap memory. This can commonly result in an Out Of Memory (OOM) condition. A more common approach is to use two MergeContent processors in series to reduce the overall heap memory footprint of such a dataflow:

ListenTCP --> (success) --> MergeContent --> (merged) --> MergeContent --> (merged) --> PutHDFS

In your case, the first MergeContent processor could merge based on a "Minimum Group Size" of 1024 KB and a "Maximum Group Size" of perhaps 1040 KB. This would merge roughly ~1,000 FlowFiles per bin. These merged FlowFiles are then passed to a second MergeContent processor that merges based on a "Minimum Group Size" of 60 MB and a "Maximum Group Size" of perhaps 64 MB. This results in merging ~60 FlowFiles per bin.

I would set "Maximum number of Bins" on both of these MergeContent processors to 11. This allows you to increase the "Concurrent tasks" on each MergeContent processor to improve performance. I would start with 3 - 5 concurrent tasks and see how that performs based on your incoming data rate. I would not go higher than 10. Just remember that more concurrent tasks given to any single processor equates to more CPU usage, so always start low and slowly increment up.

Generally we try to keep the number of FlowFiles merged per processor between 10,000 and 20,000 to minimize heap usage.

Another useful article about tuning NiFi's Listen-based processors can be found here: https://community.hortonworks.com/articles/30424/optimizing-performance-of-apache-nifis-network-lis.html Thanks, Matt
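The bin arithmetic behind this recommendation, as a small Python sketch. It assumes every source FlowFile is exactly 1 KiB, treats KB/MB as binary units, assumes each first-stage output is exactly 1024 KiB, and estimates the worst case in which every bin is full at the same time; the helper names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def flowfiles_per_bin(min_group_size: int, input_size: int) -> int:
    """How many input FlowFiles must be binned before Minimum Group Size is reached."""
    return -(-min_group_size // input_size)  # ceiling division


def peak_flowfiles_in_heap(max_bins: int, per_bin: int) -> int:
    """Upper bound on FlowFiles whose attributes sit in heap if every bin fills at once."""
    return max_bins * per_bin


single_stage = flowfiles_per_bin(64 * MIB, 1 * KIB)   # one MergeContent straight to 64 MB
stage_one = flowfiles_per_bin(1024 * KIB, 1 * KIB)    # first MergeContent, 1 KiB inputs
stage_two = flowfiles_per_bin(60 * MIB, 1024 * KIB)   # second MergeContent, ~1 MiB inputs

print(single_stage, stage_one, stage_two)
print(peak_flowfiles_in_heap(11, stage_one), peak_flowfiles_in_heap(11, stage_two))
```

Under these assumptions the first stage holds at most about 11,264 FlowFiles across 11 bins, which sits inside the 10,000 to 20,000 guideline, whereas a single-stage merge needs 65,536 FlowFiles in just one bin.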
09-13-2017
04:21 PM
@Juan Manuel Nieto You cannot specify both an "Initial Admin Identity" and a "Legacy Authorized Users File" in the authorizers.xml file. Try removing the "Legacy Authorized Users File" setting and restarting NiFi to see if the users.xml and authorizations.xml files get generated. Thanks, Matt
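A quick way to spot this misconfiguration is to scan authorizers.xml for an authorizer that sets both properties. This is a minimal Python sketch using only the standard library; the sample XML is assumed to mirror the NiFi 1.x file-based authorizer layout, the legacy file path is just an example, and property names should be checked against your NiFi version.

```python
import xml.etree.ElementTree as ET


def conflicting_authorizers(xml_text: str) -> list[str]:
    """Return identifiers of <authorizer> entries that set both conflicting properties."""
    root = ET.fromstring(xml_text.strip())
    conflicts = []
    for authorizer in root.iter("authorizer"):
        props = {
            prop.get("name"): (prop.text or "").strip()
            for prop in authorizer.findall("property")
        }
        # An empty property value counts as "not set".
        if props.get("Initial Admin Identity") and props.get("Legacy Authorized Users File"):
            conflicts.append(authorizer.findtext("identifier", default="?").strip())
    return conflicts


SAMPLE = """
<authorizers>
    <authorizer>
        <identifier>file-provider</identifier>
        <class>org.apache.nifi.authorization.FileAuthorizer</class>
        <property name="Authorizations File">./conf/authorizations.xml</property>
        <property name="Users File">./conf/users.xml</property>
        <property name="Initial Admin Identity">CN=admin, OU=NIFI</property>
        <property name="Legacy Authorized Users File">./conf/authorized-users.xml</property>
    </authorizer>
</authorizers>
"""

print(conflicting_authorizers(SAMPLE))  # the file-provider entry sets both
```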
09-13-2017
12:48 PM
1 Kudo
@Jon Rodriguez Breton Glad this worked for you. As for your new question: the value written to the DistributedMapCache remains in the cache for a configured amount of time or until a configured number of entries exist, so you can compare many FlowFiles against that stored value. Any FlowFile that matches a stored value is considered a duplicate; it is not a one-time match of a single duplicate. It would be very expensive to build a NiFi processor that reads in large batches of queued FlowFiles from an inbound queue to compare their FlowFile Attributes (FlowFile attributes live in heap memory, so the more FlowFiles you pull in for a comparison, the more likely you are to encounter Out Of Memory). And if you limit the size of the comparisons, how do you know a given batch contains the FlowFiles you actually want to compare? This is why DetectDuplicate makes use of an external service and compares FlowFiles against a stored value one FlowFile at a time. Thanks, Matt
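The "one stored value, many matches" behavior can be sketched in Python with a bounded cache that checks one FlowFile key at a time. BoundedDuplicateCache is hypothetical, and it evicts oldest-first purely for simplicity; the real cache service's entry limit and eviction policy are configured on the service itself.

```python
from collections import OrderedDict


class BoundedDuplicateCache:
    """Hypothetical cache keeping at most max_entries keys, evicting the oldest first."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self._entries = OrderedDict()  # insertion order doubles as age order

    def is_duplicate(self, key: str) -> bool:
        """Check a single FlowFile's key; store it if it has not been seen."""
        if key in self._entries:
            return True  # the stored value keeps matching every later FlowFile
        self._entries[key] = None
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # drop the oldest entry
        return False


cache = BoundedDuplicateCache(max_entries=2)
print([cache.is_duplicate(k) for k in ["abc", "abc", "abc"]])  # first passes, rest are duplicates
```

Only one key is held per unique value, so memory depends on the number of distinct values kept in the cache rather than on how many FlowFiles are compared.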
09-12-2017
02:06 PM
@sally sally That is correct, adding 7,200,000 ms will increase your delay to 2 hours. If you found my answer addressed your question, please mark it as the accepted answer. Thanks, Matt
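A small Python helper to double-check the conversion (2 h × 3,600 s × 1,000 ms = 7,200,000 ms) and to build the RouteOnAttribute property value for any delay. delay_ms and delay_route_expression are hypothetical helpers, and the attribute name currentTime is assumed from the UpdateAttribute/RouteOnAttribute setup in this thread.

```python
from datetime import timedelta


def delay_ms(delay: timedelta) -> int:
    """Convert a delay to whole milliseconds, the unit of the epoch-time attribute."""
    return int(delay / timedelta(milliseconds=1))


def delay_route_expression(delay: timedelta, attribute: str = "currentTime") -> str:
    """Build a RouteOnAttribute value: true once attribute + delay <= now()."""
    return f"${{{attribute}:plus({delay_ms(delay)}):le(${{now()}})}}"


print(delay_ms(timedelta(hours=2)))
print(delay_route_expression(timedelta(hours=2)))
```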
09-12-2017
01:43 PM
@Juan Manuel Nieto NiFi must be configured to run securely over https using SSL before any user authentication can be used. Thanks, Matt
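As a rough pre-check before configuring user authentication, here is a Python sketch that reads nifi.properties and confirms an HTTPS port and a keystore are set. It assumes the NiFi 1.x property names nifi.web.https.port and nifi.security.keystore, uses a deliberately minimal properties parser, and is not a complete validation: passwords, the truststore, and other TLS settings also matter.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Minimal key=value parser (skips comments and blank lines; no escapes or continuations)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def looks_secured(props: dict[str, str]) -> bool:
    """Necessary-but-not-sufficient check: an HTTPS port and a keystore are configured."""
    return bool(props.get("nifi.web.https.port")) and bool(props.get("nifi.security.keystore"))


SAMPLE = """
# web properties
nifi.web.http.port=
nifi.web.https.port=9443
nifi.security.keystore=./conf/keystore.jks
"""

print(looks_secured(parse_properties(SAMPLE)))
```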
09-12-2017
01:28 PM
@sally sally You could instead use an UpdateAttribute and a RouteOnAttribute processor to delay FlowFiles by two minutes before passing them to your next processor. The UpdateAttribute processor is used to create a new attribute whose value is the current epoch time in milliseconds:

currentTime = ${now():toNumber()}

The RouteOnAttribute processor then adds 2 minutes (120,000 ms) to that attribute and checks whether the result is less than or equal to the current time. If it is not, the FlowFile is routed to the unmatched relationship, which is looped back to the same processor. FlowFiles continue to loop until 2 minutes have passed, at which point they are routed to the "2mins" relationship, created using the following property:

2mins = ${currentTime:plus(120000):le(${now()})}

Thanks, Matt
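The routing rule can be simulated in Python to see how long a FlowFile actually waits. route_on_delay and simulate_wait are hypothetical; poll_interval_ms stands in for how often the looped FlowFile gets re-evaluated, which in NiFi depends on scheduling and queue behavior.

```python
def route_on_delay(attributes: dict[str, str], now_ms: int, delay_ms: int = 120_000) -> str:
    """Mimic the "2mins" rule: matched once currentTime + delay <= now."""
    # FlowFile attributes are strings, so convert before doing arithmetic.
    current_time = int(attributes["currentTime"])
    return "2mins" if current_time + delay_ms <= now_ms else "unmatched"


def simulate_wait(stamped_at_ms: int, poll_interval_ms: int, delay_ms: int = 120_000) -> int:
    """Re-evaluate a looped FlowFile every poll interval; return when it routes to 2mins."""
    attributes = {"currentTime": str(stamped_at_ms)}  # what UpdateAttribute would have written
    now = stamped_at_ms
    while route_on_delay(attributes, now, delay_ms) == "unmatched":
        now += poll_interval_ms  # the FlowFile goes around the unmatched loop again
    return now


print(simulate_wait(0, poll_interval_ms=30_000))
print(simulate_wait(0, poll_interval_ms=50_000))
```

Because the check only happens when the FlowFile is re-evaluated, the real delay is at least two minutes and can overshoot by up to one re-evaluation interval.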
09-12-2017
12:45 PM
@Jon Rodriguez Breton Are you trying to see whether all attributes from both FlowFiles match exactly, or is there a specific attribute from each FlowFile you want to compare? My initial thought would be to use the DetectDuplicate processor. You could write the unique attribute to the DistributedMapCache service, then compare new FlowFiles against that stored value and delete any duplicates. That way only the first FlowFile would get passed on. Thanks, Matt
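To show why the choice of attribute matters, here is a Python sketch that builds a cache key from either one attribute or all attributes and keeps only the first FlowFile per key. The names are hypothetical; in DetectDuplicate the key comes from its "Cache Entry Identifier" property. Note that every NiFi FlowFile carries its own unique uuid attribute, so comparing all attributes as-is will rarely find a match.

```python
from collections.abc import Hashable, Iterable
from typing import Optional


def cache_key(attributes: dict[str, str], attribute: Optional[str] = None) -> Hashable:
    """Key on one attribute if given, otherwise on the full (sorted) attribute set."""
    if attribute is not None:
        return attributes[attribute]
    return tuple(sorted(attributes.items()))


def keep_first(flowfiles: Iterable[dict[str, str]], attribute: Optional[str] = None) -> list[dict[str, str]]:
    """Pass the first FlowFile per key; later matches are treated as duplicates."""
    seen = set()
    kept = []
    for ff in flowfiles:
        key = cache_key(ff, attribute)
        if key in seen:
            continue  # duplicate: would be routed away / deleted
        seen.add(key)
        kept.append(ff)
    return kept


flowfiles = [
    {"uuid": "a", "id": "1"},
    {"uuid": "b", "id": "1"},
    {"uuid": "c", "id": "2"},
]
print(keep_first(flowfiles, "id"))  # keyed on "id": the second FlowFile is dropped
print(keep_first(flowfiles))        # keyed on all attributes: unique uuids keep all three
```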