11-21-2016
08:39 PM
@Philippe Marseille Apache NiFi 1.1 should be going up for vote very soon.
11-21-2016
07:36 PM
1 Kudo
@Philippe Marseille The content size displayed in the UI will not map exactly to disk utilization, since NiFi stores multiple FlowFiles in a single claim in the content repository. A claim cannot be deleted until every FlowFile it contains has reached a point of termination in your dataflow, so with 450,000 queued FlowFiles it is possible you are still holding on to a large number of claims. Try clearing out some of this backlog and see if disk usage drops. Setting backpressure thresholds on connections is a good way to prevent your queues from getting so large. Another possibility is that you are running into https://issues.apache.org/jira/browse/NIFI-2925 . This bug has been addressed for the next releases, Apache NiFi 1.1 and HDF 2.1. Thanks, Matt
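To illustrate why the queued size and the disk usage can diverge, here is a minimal sketch of the claim behavior described above. The `Claim` class and both helper functions are hypothetical names made up for this example; this is a toy model, not NiFi's actual content repository code.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """Hypothetical model of one content claim holding several FlowFiles."""
    size_bytes: int = 0
    active_flowfiles: set = field(default_factory=set)


def disk_usage(claims):
    """Bytes still held on disk: a claim counts in full while any FlowFile in it is active."""
    return sum(c.size_bytes for c in claims if c.active_flowfiles)


def queued_size(claims, flowfile_sizes):
    """Bytes a queue listing would report: only the FlowFiles that are still queued."""
    return sum(flowfile_sizes[f] for c in claims for f in c.active_flowfiles)


# Two claims, three 1 KB FlowFiles each (made-up numbers).
flowfile_sizes = {f"ff{i}": 1024 for i in range(6)}
claims = [
    Claim(3 * 1024, {"ff0", "ff1", "ff2"}),
    Claim(3 * 1024, {"ff3", "ff4", "ff5"}),
]

# Terminate two FlowFiles in each claim; one per claim is still queued.
for claim, done in zip(claims, [{"ff0", "ff1"}, {"ff3", "ff4"}]):
    claim.active_flowfiles -= done

print(queued_size(claims, flowfile_sizes))  # 2048
print(disk_usage(claims))                   # 6144
```

In this toy model a single queued FlowFile per claim keeps the whole claim on disk, which is why clearing the backlog should release space.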
11-14-2016
09:49 PM
I believe the process you have is spot on and keeps the number of processors to a minimum. Matt
11-14-2016
08:06 PM
Also recommend against putting the quotes around your folder names ('MS1' should be just MS1).
11-14-2016
08:04 PM
1 Kudo
@Saikrishna Tarapareddy Sounds like your Conditional EL statements are not resulting in a boolean true in your UpdateAttribute processor.
After some FlowFiles get routed through the UpdateAttribute processor, let them queue on the outbound connection (stop the next processor). Right-click on the connection and select "List queue". Click on the "view details" icon to the far left of a FlowFile and look at the attributes on that FlowFile. Do you see the expected "Folder" attribute? Is it set to the correct value? If it does not exist, does the filename exactly match one of the provided strings in your EL condition statements? Thanks, Matt
11-14-2016
01:09 PM
1 Kudo
@Lucas Alvarez The SplitJson processor splits an incoming JSON into multiple output JSON messages. You should use the EvaluateJsonPath processor to extract the URL from each of your splits and assign it to a FlowFile attribute that you can then use in your InvokeHTTP processor. Thanks, Matt
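As a rough illustration of that two-step idea (split first, then pull the URL out of each split into an attribute), here is a small Python sketch. The input document, the "url" field name, and the "target.url" attribute name are all assumptions made for the example; in NiFi the extraction itself would be a JSONPath expression configured on EvaluateJsonPath.

```python
import json

# Hypothetical incoming content: a JSON array whose elements each carry a "url" field.
incoming = '[{"id": 1, "url": "http://example.com/a"}, {"id": 2, "url": "http://example.com/b"}]'


def split_json(content):
    """Roughly the splitting step: one output document per array element."""
    return [json.dumps(element) for element in json.loads(content)]


def extract_url(split_content, field="url"):
    """Roughly the extraction step: read one field and store it as an attribute."""
    return {"target.url": json.loads(split_content)[field]}


# Each split becomes its own FlowFile-like record with content plus attributes.
flowfiles = [{"content": s, "attributes": extract_url(s)} for s in split_json(incoming)]
for ff in flowfiles:
    print(ff["attributes"]["target.url"])
```

Once the value lives in an attribute, the downstream processor can reference it through Expression Language (for example `${target.url}`, using the hypothetical attribute name from this sketch).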
11-02-2016
07:54 PM
1 Kudo
@Paul Yang
1. There is an existing open Jira for being able to adjust the batch size of Site-to-Site (https://issues.apache.org/jira/browse/NIFI-1202).
2. NiFi does not restrict how many RPGs can be added to the canvas. What is important to understand is that NiFi nodes do not know about one another. Each runs the dataflow. When using RPGs to pull data from an output port, every node is running that RPG and every node is requesting FlowFiles. When one of those nodes connects, the cluster informs the connecting instance that x number of FlowFiles are currently queued to that output port, and that node will pull them all. So you get much better load-balancing behavior from a push to an input port (yet still done in batches of 100).
3. Two suggestions come to mind:
a. Reduce the configured "partition size" value in your GenerateTableFetch processor so that more FlowFiles are generated, which should then get better load balanced across your nodes.
b. Instead of using S2S, build a load-balanced dataflow that is hard-coded to deliver data to each node as follows:
11-02-2016
06:26 PM
1 Kudo
@apsaltis I might suggest we make a few changes to this article:
1. The link you have for installing HDF talks about installing HDF 2.0. HDF 2.0 is based on Apache NiFi 1.0. Since MiNiFi is built from Apache NiFi 0.6.1, the dataflows built and templated for conversion into MiNiFi YAML files must also be built using an Apache NiFi 0.6 based install. (I see in your example above you did just that, but this needs to be made clear.)
2. I would never recommend setting nifi.remote.input.socket.host= to "localhost". When a NiFi or MiNiFi connects to another NiFi via S2S, the destination NiFi will return the value set for this property along with the value set for nifi.remote.input.socket.port=. In your example that means the source MiNiFi would then try to send FlowFiles to localhost:10000. This is ONLY going to work if the destination NiFi is located on the same server as MiNiFi.
3. You should also explain why you are changing nifi.remote.input.secure= from true to false. Changing this is not a requirement of MiNiFi; it is simply a matter of preference. (If set to true, both MiNiFi (source) and NiFi (destination) must be set up to run securely over https.) In your example you are working with http only.
4. While doable, one should never route the "success" relationship from any processor back on to itself. If you have reached the end of your dataflow, you should auto-terminate the "success" relationship.
5. I am not clear what you are telling me to do based on this line under step 5:
Start the From MiNiFi Input Port
6. When using the GenerateFlowFile processor in an example flow, it is important to recommend that users set a run schedule other than "0 sec". Since MiNiFi is Apache NiFi 0.6.1 based, there is no default backpressure on connections, and with a run schedule of "0 sec" it is very likely this processor will produce FlowFiles much faster than they can be sent across S2S. This will eventually fill the hard drive of the system running MiNiFi. An even better recommendation would be to make sure they set backpressure between the GenerateFlowFile processor and the Remote Process Group (RPG). That way, even if someone stops the NiFi and not the MiNiFi, they don't fill their MiNiFi hard drive.
Thanks, Matt
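For points 2 and 3, a quick sanity check of the Site-to-Site settings could look something like the sketch below. The property names are the ones quoted above; the sample file contents, the function names, and the warning texts are made up for illustration, and this is not a tool that ships with NiFi.

```python
def parse_properties(text):
    """Parse simple key=value lines, ignoring blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def s2s_warnings(props):
    """Return human-readable warnings for the Site-to-Site settings discussed above."""
    warnings = []
    host = props.get("nifi.remote.input.socket.host", "")
    if host in ("localhost", "127.0.0.1"):
        warnings.append("socket.host is loopback; a MiNiFi on another server cannot reach it")
    if props.get("nifi.remote.input.secure", "").lower() == "true":
        warnings.append("secure=true; both source and destination must run over https")
    return warnings


# Hypothetical excerpt of nifi.properties on the destination NiFi.
sample = """
nifi.remote.input.socket.host=localhost
nifi.remote.input.socket.port=10000
nifi.remote.input.secure=false
"""

for warning in s2s_warnings(parse_properties(sample)):
    print(warning)
```

With the sample above only the loopback warning is reported, which matches the concern in point 2: the source would be told to send to localhost:10000.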
10-26-2016
02:17 PM
Is user2@domain.net part of your "Admin NiFi" user group?
Did you grant "Admin Group" the "modify the data" policy? You can set DEBUG in your logback.xml file for the following line to get more output in your nifi-users.log: <logger name="org.apache.nifi.web.api.config" level="INFO" additivity="false"> No NiFi restarts are needed for any changes to the logback.xml file to take effect. Matt
10-26-2016
12:58 PM
@Paul Yang What you have here is a very light dataflow based on the picture shown. The NiFi RPG will send data in batches of up to 100 for efficiency. So if the input queue has fewer than 100 files in it when it runs, all of those FlowFiles will be routed to a single node. On the next run, the next batch would go to a different node. Over time, if the dataflow rate is constant, the data should be balanced across your nodes. If I am understanding what you have here, you are feeding the RPG that feeds an input port. That input port feeds an output port. Then you can use various RPGs anywhere in your flow to pull data from that output port. Correct? The problem with this is that the RPG runs on every node, so when a node connects it will try to pull all the files it sees on that connection. A node is not aware of how many nodes exist in its cluster and will not say "I should only pull x amount so the other nodes can pull the same." Each node acts in a vacuum and pulls as much data as fast as it can from the output port. I would suggest instead having your remote input port (root level input port) feed its success relationship multiple times into the various sub process groups owned by your various departments. Not only will this provide a better load-balanced delivery of data in the cluster, but it will also improve performance. Thanks, Matt
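The batching effect on a push can be pictured with a small simulation. This is only a sketch: the batch size of 100 comes from the post, while the round-robin choice of node, the node names, and the queue sizes are assumptions made for illustration and not how NiFi necessarily picks a destination.

```python
from collections import Counter
from itertools import cycle

BATCH_SIZE = 100  # batch size stated in the post


def distribute(queue_sizes_per_run, nodes):
    """Send each run's queue in batches of up to BATCH_SIZE, one node per batch.

    Round-robin node selection is an assumption made for illustration only.
    """
    received = Counter({n: 0 for n in nodes})
    next_node = cycle(nodes)
    for queued in queue_sizes_per_run:
        while queued > 0:
            batch = min(queued, BATCH_SIZE)
            received[next(next_node)] += batch
            queued -= batch
    return dict(received)


nodes = ["node1", "node2", "node3"]
print(distribute([60], nodes))       # a single light run lands on one node
print(distribute([60] * 30, nodes))  # many runs even out over time
```

A single run with 60 queued FlowFiles sends everything to one node, while thirty such runs spread evenly in this model, which is the "balanced over time" behavior described above.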