Member since
07-30-2019
3471
Posts
1642
Kudos Received
1020
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 150 | 06-03-2026 06:06 PM |
| | 460 | 05-06-2026 09:16 AM |
| | 830 | 05-04-2026 05:20 AM |
| | 498 | 05-01-2026 10:15 AM |
| | 623 | 03-23-2026 05:44 AM |
04-08-2019
01:37 PM
@Abhinav Joshi What was the reason given for why the UpdateAttribute processors were now invalid?
04-01-2019
03:56 PM
@Isha Tiwari

In order to merge FlowFiles that exist on multiple nodes in your cluster, you are going to need to move all FlowFiles to one node. Apache NiFi 1.9.x versions introduced a new "Load Balanced" configuration option on dataflow connections. One of the options for the configurable "Load Balance Strategy" is "Single node". Setting this strategy will route all queued FlowFiles to one node in the cluster. You could set this on the connection feeding your Merge processor.

In Apache NiFi 1.8 and older you would need to use the PostHTTP processor (configured to send as FlowFile) to send all FlowFiles to a ListenHTTP processor running at the URL of one of your nodes (the processor will run on all nodes, but your PostHTTP will only be configured with the URL for one node). The problem with this solution is that if the target URL server goes down, your dataflow will stop working.

Thank you,
Matt
03-29-2019
06:13 PM
@Isha Tiwari

Did you change your Max Bin Age setting to a value higher than 1 minute? Try setting it to 10 minutes.

Is your NiFi a standalone instance or a NiFi cluster? Keep in mind that each node in a NiFi cluster runs its own copy of the flow.xml.gz and works on its own set of FlowFiles. So the Merge processor can only bin and merge the FlowFiles local to each node.

Thanks,
Matt
03-28-2019
05:05 PM
3 Kudos
@Isha Tiwari

The "Max" configuration properties do not govern how long a bin waits to be merged. The Merge based processors work as follows:

1. The processor executes based upon the configured run schedule.
2. At the exact time of execution, the Merge processor looks at which FlowFiles exist on the inbound connection that have not already been allocated to a Merge processor bin.
3. Those FlowFiles are then allocated to one or more bins. The Max Bin Size and Max Number of Records settings create a ceiling for how many FlowFiles can be allocated to a bin. If a bin has reached one of these max values, additional FlowFiles in the current execution start getting allocated to a new bin.
4. Once all FlowFiles from the current execution have been allocated to one or more bins, those bins are evaluated to see if they are eligible to be merged. (The thread does not keep checking for new FlowFiles arriving on the inbound connection; those new FlowFiles are handled by the next execution.) In order to be eligible, a bin must meet both minimum settings for size and number of records, or the max bin age must have been reached. In your case, a bin could be merged with only 20 records and 20 KB of content, or once it has existed for at least 1 minute.

If you find you are consistently merging small bins, changing the run schedule on your Merge processor should help. This would allow more time between executions for FlowFiles to queue on the inbound connection.

IMPORTANT: Keep in mind that all FlowFiles allocated to bins are being held in heap memory (swapping does not occur with bins). Specifically, the FlowFile attributes/metadata are the portion of the FlowFile held in heap memory. Your max records setting of 100,000 could result in considerable heap pressure. Using two Merge processors in series could achieve the same result with lower heap usage.
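The binning steps described above can be sketched roughly as follows. This is a simplified simulation of a single execution, not NiFi's actual implementation: the `Bin` class, the function names, the way a FlowFile is modeled as a `(records, size)` pair, and the 1 GB size ceiling in the usage lines are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    created_at: float  # time the bin was created, in seconds
    records: int = 0   # total records allocated to this bin
    size: int = 0      # total content size in bytes


def allocate(queued, bins, now, max_records, max_size):
    """Step 3: place each queued (records, size) FlowFile into a bin,
    opening a new bin whenever a ceiling would be exceeded."""
    for records, size in queued:
        current = bins[-1] if bins else None
        if (current is None
                or current.records + records > max_records
                or current.size + size > max_size):
            current = Bin(created_at=now)
            bins.append(current)
        current.records += records
        current.size += size
    return bins


def is_eligible(b, now, min_records, min_size, max_bin_age):
    """Step 4: eligible when both minimums are met OR the bin is old enough."""
    meets_minimums = b.records >= min_records and b.size >= min_size
    too_old = now - b.created_at >= max_bin_age
    return meets_minimums or too_old


# Three small FlowFiles (5 records, 4 KB each) are queued at execution time.
bins = allocate([(5, 4096)] * 3, [], now=0.0, max_records=100_000, max_size=10**9)
print(is_eligible(bins[0], 0.0, 20, 20 * 1024, 60))   # False: only 15 records
print(is_eligible(bins[0], 60.0, 20, 20 * 1024, 60))  # True: max bin age reached
```

With the minimums from the question (20 records, 20 KB), the bin stays open until either more FlowFiles arrive in a later execution or the 1 minute bin age forces the merge.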
I use MergeContent as an example in the following article about connection queues: https://community.hortonworks.com/articles/184990/dissecting-the-nifi-connection-heap-usage-and-perf.html

Thank you,
Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
03-22-2019
08:58 PM
@Mario Tigua

The File Filter property in the ListFile processor does not support NiFi Expression Language. If you hover your cursor over the question mark icon to the right of a property name, a pop-up window will tell you whether that property supports NiFi Expression Language.

This property expects a Java regular expression instead.

Thank you,
Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
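As an illustration of what a regular expression for the File Filter property might look like, the sketch below checks a few file names against a pattern. The pattern and the file names are made up for this example. Python's `re` module stands in for Java's regex engine here (this simple pattern is written the same way in both), and matching against the whole file name is an assumption.

```python
import re

# Hypothetical pattern: CSV files named "sales_" followed by an 8-digit date.
FILE_FILTER = r"sales_\d{8}\.csv"


def matches_filter(name, pattern=FILE_FILTER):
    """Return True when the whole file name matches the pattern."""
    return re.fullmatch(pattern, name) is not None


for name in ["sales_20190322.csv", "sales_2019.csv", "sales_20190322.csv.bak"]:
    print(name, matches_filter(name))
```

Only the first name matches; the other two fail on the date length and the trailing `.bak` extension.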
03-22-2019
12:36 PM
@sri chaturvedi

The question here is: what is the use case for needing a sequence number across the entire cluster? Why not generate a sequential number per node to keep track of per-node batches? Maybe use the NiFi hostname in the sequence number identifier?

Maybe you can share some more details on the full use case for these sequence numbers: why you are generating them and what they are being used for.

If you use the DistributedMapCache, you could keep three different cached sequence numbers (each node has its own sequence number stored in a cache entry keyed by hostname).

You could then build a flow that fetches all three values and adds them together for you on an hourly/daily/weekly schedule.

Thank you,
Matt
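The per-node idea above can be sketched as follows. A plain dictionary stands in for the Distributed Map Cache, and the host names, key format, and function names are hypothetical.

```python
cache = {}  # stand-in for the Distributed Map Cache server


def next_sequence(hostname):
    """Increment and return the sequence number owned by one node."""
    key = f"seq.{hostname}"
    cache[key] = cache.get(key, 0) + 1
    return cache[key]


def cluster_total(hostnames):
    """Add the per-node counters together, e.g. on an hourly schedule."""
    return sum(cache.get(f"seq.{h}", 0) for h in hostnames)


nodes = ["nifi-node1", "nifi-node2", "nifi-node3"]
for host in ["nifi-node1", "nifi-node1", "nifi-node2", "nifi-node3"]:
    next_sequence(host)
print(cluster_total(nodes))  # 4
```

Because each cache entry is only ever written by the node that owns it, no node can overwrite another node's count.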
03-22-2019
12:18 PM
1 Kudo
@Shu @sri chaturvedi

The issue with this distributed cache solution is timing. You have a bit of a race condition to consider here. Each of the nodes in a NiFi cluster runs its own copy of the flow.xml.gz and processes its own set of FlowFiles. While the distributed cache allows all your NiFi nodes to read from and write to the same Distributed Map Cache server, they are doing it at their own pace. This means that multiple nodes may end up pulling the same cache value, incrementing it, and writing it back to the cache server, so you do not end up with a true count. You may also have a condition where node 1 fetches value X from the cache server and for some reason the flow on node 1 is delayed in getting to PutDistributedMapCache to update the value. Meanwhile, some other node has already fetched and updated the cache multiple times. Now node 1 puts its value and overwrites the newer value with an older count.

Dealing with such a "race" condition is going to be difficult here because of how NiFi clustering works.
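Both failure modes described above can be reproduced with a small, deterministic sketch. A dictionary stands in for the shared cache entry, and the `fetch`/`put` helpers are hypothetical names for the fetch and put steps of the flow.

```python
cache = {"count": 0}  # stand-in for one shared cache entry


def fetch():
    return cache["count"]


def put(value):
    cache["count"] = value


# Lost update: both nodes read 0 before either one writes.
node1_value = fetch()
node2_value = fetch()
put(node1_value + 1)
put(node2_value + 1)
lost_update_result = cache["count"]  # 1, although two increments happened

# Stale overwrite: node 1 fetches, is delayed, and node 2 updates three times.
cache["count"] = 0
stale = fetch()
for _ in range(3):
    put(fetch() + 1)
put(stale + 1)
stale_overwrite_result = cache["count"]  # 1, the newer value 3 was overwritten

print(lost_update_result, stale_overwrite_result)
```

In both cases the final count is lower than the number of increments that actually took place, because the fetch and the put are separate steps with nothing tying them together.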
03-21-2019
02:00 PM
@TRACEY JACKSON

NiFi processors are configurable to run using a Timer Driven or Cron Driven scheduling strategy. This scheduling can be configured uniquely in each processor's configuration. Processors are then started and operate based on the configured scheduling from that point forward.

The NiFi API can be used to perform any action a user can perform directly from the UI. The best way to learn the REST API calls is to use the "Developer tools" available in most browsers.

While focused on the tab where the NiFi UI is open, launch the developer tools (Chrome is the example browser here). Then perform the action via the NiFi UI and you will see the call displayed in the "Network" list. You can right click on that call and select "Copy as curl". Now you have an example of how to execute that same request via the command line.

For more info on the NiFi REST API, you can look in "Help", found within the NiFi UI Global menu (upper right corner). The REST API documentation can be found in the "Developer" section at the bottom of the list on the far left side of the help UI.

The danger with trying to "run" dataflows in NiFi via the command line is that you may end up stopping processors in the dataflow chain, which results in FlowFiles being left unprocessed and sitting on connection queues between processors.

Thank you,
Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
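Once a call has been captured with "Copy as curl", it can be replayed from a script. The sketch below only builds such a request and does not send it. The base URL, processor id, endpoint path, and JSON body shape are assumptions for illustration; take the real values from the request your own browser shows, and note that a secured NiFi will also need authentication.

```python
import json
import urllib.request

# Hypothetical values; substitute what "Copy as curl" shows for your instance.
BASE_URL = "http://nifi.example.com:8080/nifi-api"
PROCESSOR_ID = "00000000-0000-0000-0000-000000000000"


def build_state_request(processor_id, revision_version, state):
    """Build (but do not send) a PUT request that changes a processor's state."""
    body = {
        "revision": {"version": revision_version},
        "component": {"id": processor_id, "state": state},
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/processors/{processor_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_state_request(PROCESSOR_ID, revision_version=0, state="STOPPED")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Keeping the request construction separate from sending makes it easy to compare the generated URL and body against the captured curl command before anything touches a running flow.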
03-20-2019
12:32 PM
1 Kudo
@sri chaturvedi

The UpdateAttribute processor state capability can only store state locally on each node in the cluster. Other nodes in your cluster have no idea what local state value has been stored on any other node.

So I suspect one or both of the following is occurring:

1. The upstream dataflow data originates on the Primary node only (for example, the source ingest processor runs with "primary node" execution). On NiFi restart, a different node in your 3 node cluster is elected as primary node, and now the FlowFiles traversing this dataflow are on a different node where there is no previous state, so it appears as if state started over. This would explain the appearance of a reset on cluster restart.
2. This UpdateAttribute processor is receiving inbound FlowFiles on all three nodes. Since each node stores its own state locally for this processor, each would be incrementing its own count independently of the others. This would explain the duplicate state values seen on some FlowFiles.

Thank you,
Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
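The second scenario can be illustrated with a short sketch in which each node keeps its own counter. The node names, the arrival order, and the `stamp` helper are hypothetical; the dictionary only models node-local state and is not how NiFi stores it.

```python
from collections import Counter

# Each node keeps its own counter; nothing is shared between nodes.
local_state = {"node1": 0, "node2": 0, "node3": 0}


def stamp(node):
    """Increment the node-local counter and return the value given to the FlowFile."""
    local_state[node] += 1
    return local_state[node]


# FlowFiles arrive on all three nodes.
arrivals = ["node1", "node2", "node3", "node1", "node2"]
values = [stamp(node) for node in arrivals]
duplicates = sorted(v for v, n in Counter(values).items() if n > 1)
print(values, duplicates)
```

Every node starts counting from the same point, so the same values are handed out once per node.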
03-19-2019
07:36 PM
@Sean Dockery A Jira has been filed to comment out the G1GC line in the NiFi bootstrap.conf in the next Apache release: https://issues.apache.org/jira/browse/NIFI-6132