Member since: 07-30-2019
Posts: 3396
Kudos Received: 1619
Solutions: 1001
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 422 | 11-05-2025 11:01 AM |
| | 327 | 11-05-2025 08:01 AM |
| | 463 | 11-04-2025 10:16 AM |
| | 680 | 10-20-2025 06:29 AM |
| | 820 | 10-10-2025 08:03 AM |
06-06-2019
12:53 AM
Hi @Matt Clarke, does the table you shared above still hold true today? The Apache NiFi Crash Course video at https://www.youtube.com/watch?v=fblkgr1PJ0o says the same thing: a cluster should preferably have a single-digit number of nodes, and if you really need more you can instead run 2 separate clusters with 10 nodes each and establish a sync between them. All I am trying to understand is whether this still holds true with the latest version, and whether 10 nodes are still enough to handle hundreds of thousands of events per second. Thanks in advance!
10-19-2017
08:53 PM
@Abdelkrim Hadjidj Perfect! Much clearer now. Thanks.
10-03-2017
05:45 PM
@Matt Clarke Thanks for the quick response. I have raised the JIRA request; here is the enhancement request: https://issues.apache.org/jira/browse/NIFI-4458
11-21-2018
03:45 PM
@Matt Clarke @Shu Thanks for your quick response, Matt. Yes, you are correct. It's working now. However, I did not understand why, unless I create an output port in PG1, I am unable to create a connection at the root level to another process group (PG2) via an output port. Any idea why that is so?
09-27-2017
06:47 PM
@pawan soni
Did you resolve your invalid state by starting your "From File" input port? Your screenshot shows the RPG as "Enable transmission" and the input port as "stopped". Thanks, Matt
09-22-2017
05:56 PM
@Matt Clarke, I am running this on Windows. How do I find out which service it is running under?
09-19-2017
02:09 PM
@sally sally By setting your minimums (Min Num Entries and Min Group Size) to some large value, FlowFiles that are added to a bin will not qualify for merging right away. You should then set "Max Bin Age" to the amount of time you are willing to let a bin hang around before it is merged, regardless of the number of entries in that bin or that bin's size. As far as the number of bins goes, a new bin will be created for each unique filename found in the incoming queue. Should the MergeContent processor encounter more unique filenames than there are bins, it will force merging of the oldest bin to free a bin for the new filename. So it is important to have enough bins to accommodate the number of unique filenames you expect to pass through this processor during the configured "Max Bin Age" duration; otherwise, you could still end up with 1 FlowFile per merge. Thanks, Matt
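The effect of having too few bins can be illustrated with a small toy model. This is only a sketch of the behavior described above, not NiFi's actual implementation: the function name `simulate_bins`, the `max_bins` parameter, and the sample filenames are all hypothetical, the minimums are assumed to be so large that they never trigger a merge, and the expiry of "Max Bin Age" is modeled simply as "merge whatever is left at the end".

```python
from collections import OrderedDict


def simulate_bins(filenames, max_bins):
    """Toy model: one bin per unique filename, oldest bin force-merged when full.

    Returns a list of (filename, flowfile_count) tuples in merge order.
    """
    bins = OrderedDict()  # insertion order stands in for bin age
    merged = []
    for name in filenames:
        if name not in bins:
            if len(bins) >= max_bins:
                # No free bin: force-merge the oldest bin to make room.
                oldest_name, count = bins.popitem(last=False)
                merged.append((oldest_name, count))
            bins[name] = 0
        bins[name] += 1
    # Stand-in for "Max Bin Age" expiring: merge every remaining bin.
    merged.extend(bins.items())
    return merged


# Hypothetical incoming queue with three unique filenames.
incoming = ["a", "b", "c", "a", "b", "c"]

# Too few bins: every merge ends up containing a single FlowFile.
print(simulate_bins(incoming, max_bins=2))

# Enough bins: FlowFiles with the same filename are merged together.
print(simulate_bins(incoming, max_bins=3))
```

With two bins and three rotating filenames, each new filename evicts a bin that holds only one FlowFile, which is the "1 FlowFile per merge" outcome. With three bins, each filename keeps its bin until the age limit.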
09-15-2017
08:46 PM
2 Kudos
Hi @sally sally,

The ListHDFS processor is designed to store its last state.

1. When you configure the ListHDFS processor, you specify the directory name in its properties. Once the processor has listed all the files that existed in that directory at that time, it stores as its state the maximum file timestamp, i.e. the time the newest file was stored in HDFS. You can view the state info by clicking the view state button. If you want to clear the state, go into view state and click clear state.

2. Once the state is saved in the ListHDFS processor, each scheduled run (cron or timer driven) checks only for new files after the state timestamp.

Note: ListHDFS runs on the primary node only, but the state value is stored across all the nodes of the NiFi cluster, so if the primary node changes there won't be any issues with duplicates.

Example:

hadoop fs -ls /user/yashu/test/
Found 1 items
-rw-r--r-- 3 yash hdfs 3 2017-09-15 16:16 /user/yashu/test/part1.txt

When I configure the ListHDFS processor to list all the files in the above directory, the state of the processor should match the time part1.txt was stored in HDFS, in our case 2017-09-15 16:16. The state is kept as Unix time in milliseconds; converting it to a date-time format gives:

Unix time in milliseconds: 1505506613479
Timestamp: 2017-09-15 16:16:53

So the processor has stored the state. When it runs again, it lists only the new files that were stored in the directory after the state timestamp, and it updates the state with the new state time (i.e. the maximum file timestamp in the Hadoop directory).
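As a rough illustration of the conversion and the "only newer files" check, here is a minimal Python sketch. It is not how ListHDFS is implemented: the helper names `millis_to_datetime` and `new_files` are hypothetical, the second file `part2.txt` and its timestamp are made up for the example, and the UTC-4 offset is an assumption inferred from the fact that the state value corresponds to 20:16:53 UTC while the listing shows 16:16.

```python
from datetime import datetime, timedelta, timezone

# State value taken from the example above (Unix time in milliseconds).
STATE_MILLIS = 1505506613479

# Assumption: the cluster's local time zone was UTC-4 at that date.
LOCAL_TZ = timezone(timedelta(hours=-4))


def millis_to_datetime(millis, tz=timezone.utc):
    """Convert Unix time in milliseconds to a timezone-aware datetime."""
    seconds, remainder = divmod(millis, 1000)
    return datetime.fromtimestamp(seconds, tz=tz) + timedelta(milliseconds=remainder)


def new_files(listing, state_millis):
    """Keep only the paths whose modification time is after the stored state."""
    return [path for path, mtime in listing if mtime > state_millis]


print(millis_to_datetime(STATE_MILLIS).strftime("%Y-%m-%d %H:%M:%S"))
print(millis_to_datetime(STATE_MILLIS, LOCAL_TZ).strftime("%Y-%m-%d %H:%M:%S"))

# Hypothetical directory listing: (path, modification time in milliseconds).
listing = [
    ("/user/yashu/test/part1.txt", STATE_MILLIS),           # already listed
    ("/user/yashu/test/part2.txt", STATE_MILLIS + 60_000),  # arrived a minute later
]
print(new_files(listing, STATE_MILLIS))
```

On the next run only `part2.txt` passes the check, and the stored state would then advance to its timestamp.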