Member since 07-30-2019 | 3391 Posts | 1618 Kudos Received | 1000 Solutions
06-13-2017
04:31 PM
@forest lin

NiFi at its core has no issues working with very large files. Often, when you run into OOM, it is because of what you are trying to do with those very large files after they are in NiFi. In the majority of cases, OOM can be avoided through dataflow design and tweaks to the heap size allocated to the NiFi JVM.

The content of a FlowFile does not live in heap memory space, but the FlowFile attributes do (except when swapped out to disk in large queues). So avoid extracting large amounts of the content into FlowFile attributes, avoid splitting very large files into large numbers of small FlowFiles using a single processor, avoid merging a very large number of FlowFiles into a single FlowFile, etc.

You can still do these types of things, but you may need to do them in two stages rather than one. For example, split large files by every 5000 lines first, and then split the 5000-line FlowFiles by every line (huge difference in heap usage).

If you found this answer addressed your question, please mark it as accepted to close out this thread.

Thanks,
Matt
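
To make the two-stage point concrete, here is a rough back-of-the-envelope sketch in Python. It is not NiFi code; it assumes a simplified model in which a splitting processor holds every FlowFile it creates from one input until it has finished that input. The function names are made up for illustration.

```python
import math


def peak_single_stage(total_lines: int) -> int:
    """Simplified model: one processor splits the file line by line,
    so every resulting FlowFile exists at once before any is released."""
    return total_lines


def peak_two_stage(total_lines: int, chunk: int = 5000) -> int:
    """Simplified model: stage 1 emits chunk-sized FlowFiles, stage 2 splits
    one chunk at a time. The peak is the larger of the two stages."""
    stage_one = math.ceil(total_lines / chunk)  # FlowFiles created by stage 1
    stage_two = min(chunk, total_lines)         # FlowFiles created per stage 2 run
    return max(stage_one, stage_two)


if __name__ == "__main__":
    lines = 1_000_000
    print("single stage peak:", peak_single_stage(lines))
    print("two stage peak:   ", peak_two_stage(lines))
```

Under this model the number of FlowFiles (and therefore attribute maps) created in one pass is bounded by the chunk size rather than by the size of the original file.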
06-13-2017
03:21 PM
@Oleksandr Solomko

You can see where these files are queued via the "Summary" UI. Once the Summary UI opens, select the "CONNECTIONS" tab. You can sort on any column by clicking that column. Once you have found the row for your queued connection, click on the "view connection details" icon on the far right side of the row. This will open a new UI that shows the queue breakdown per node in the cluster. This will help you identify whether you are having a cluster-wide issue or whether it is localized to one specific node.

If it is just one node with all this queued data, you could manually disconnect this node from your cluster. Then go directly to the URL for that disconnected node and see if you can empty the queue then.

Check for ERROR or WARN logs specifically in that node's nifi-app.log, nifi-user.log, and nifi-bootstrap.log. Also, what OS and Java version are you running?

Thanks,
Matt
06-13-2017
12:49 PM
1 Kudo
@forest lin

Backpressure is not used to control data rate in your dataflow. The intent of the backpressure setting on connections is to control the amount of allowed queued data. Both backpressure settings are "soft" limits. Once backpressure kicks in on a connection, the processor feeding that connection will no longer be allowed to run.

So in your case above, you have backpressure set to 5 objects (FlowFiles) or 5 KB of content. Since your queue was empty, no backpressure was being applied when the 37.05 MB FlowFile arrived at your ConvertCSVToAvro processor, so that processor was allowed to run. That one FlowFile was processed and placed on the outbound connection. It is at that time that backpressure kicked in, because you exceeded one of your backpressure settings. The ConvertCSVToAvro processor will now be prevented from running until the queue drops below 5 FlowFiles or 5 KB of queued data again.

If all your processors are processing FlowFiles rapidly, backpressure will be applied very sparsely.

Also keep in mind that, for efficiency, some processors work on batches of FlowFiles. With a backpressure object threshold of 5, for example, you may see a queue with more than 5 FlowFiles. The whole batch of FlowFiles is placed on the outbound queue. The processor that did the batch processing will then not be allowed to run again until that outbound connection drops below 5 FlowFiles again.

The ControlRate processor allows you to actually control the throughput of a dataflow. It does not slow the processing. The ControlRate processor will allow data to queue on its input side and, based on its configured settings, only allow x number of FlowFiles through over y amount of time. Let's say it is configured to let 5 KB of data through every 1 minute. If you feed it a 37 MB file, it does not transfer just pieces of that FlowFile. It will feed through the entire 37 MB FlowFile and then not allow another FlowFile through until the average data per 1 minute is 5 KB.

Because of how the above works, data could continue to queue in front of ControlRate. This is where backpressure settings become important to stop upstream processors from running. You can set backpressure all the way upstream to your data ingest processors so they stop accepting new FlowFiles.

Thanks,
Matt
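
The "soft limit" behavior described above can be sketched with a toy model in Python. This is only an illustration, not how NiFi is implemented; the `Connection` class and `run_processor` function are hypothetical names, and only the object-count threshold is modeled.

```python
from collections import deque


class Connection:
    """Toy model of a connection with a backpressure object threshold."""

    def __init__(self, object_threshold: int):
        self.object_threshold = object_threshold
        self.queue = deque()

    def backpressure_applied(self) -> bool:
        # Backpressure is on once the queue has reached the threshold.
        return len(self.queue) >= self.object_threshold


def run_processor(outbound: Connection, batch: list) -> bool:
    """Run the upstream processor once, if it is allowed to run.

    The threshold is only checked before the run. The whole batch is
    enqueued even if that pushes the queue past the threshold.
    """
    if outbound.backpressure_applied():
        return False  # processor is not scheduled
    outbound.queue.extend(batch)
    return True


if __name__ == "__main__":
    conn = Connection(object_threshold=5)
    print(run_processor(conn, ["flowfile"] * 8), len(conn.queue))
    print(run_processor(conn, ["flowfile"]), len(conn.queue))
```

In this model the first run succeeds and leaves 8 items in a queue whose threshold is 5, and the second run is refused until something downstream drains the queue below the threshold.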
06-12-2017
02:12 PM
@Justin R.

Is this a NiFi cluster installation with multiple nodes running on the same host? If that is the case, whichever node manages to bind to the port first wins, and all other nodes on the same host will report that the port is already in use.

Matt
06-12-2017
01:18 PM
@Ahmad Mehr

When you start NiFi, the UI does not become available until the application has completed loading.

/bin/nifi.sh status

The above command simply shows that the application is running, but does not indicate that the UI is available yet. To verify that NiFi has completed the startup process and the UI is now available, you will need to look in the nifi-app.log for the following lines:

2017-06-12 09:16:16,029 INFO [main] org.apache.nifi.web.server.JettyServer NiFi has started. The UI is available at the following URLs:
2017-06-12 09:16:16,029 INFO [main] org.apache.nifi.web.server.JettyServer http://<HOSTNAME>:8075/nifi
2017-06-12 09:16:16,031 INFO [main] org.apache.nifi.BootstrapListener Successfully initiated communication with Bootstrap
2017-06-12 09:16:16,031 INFO [main] org.apache.nifi.NiFi Controller initialization took 14617467433 nanoseconds.

Until you see these log lines, the UI will not be accessible. You can also run the following Linux command to see if "something" is listening on port 8075 yet:

netstat -ant|grep LISTEN|grep 8075

Thank you,
Matt
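
If you would rather script these two checks, something along the following lines could work. It is a minimal sketch in Python; the function names are made up, the log path in the usage example is an assumption about your install layout, and port 8075 is simply the port from the log lines above.

```python
import socket

# Text of the JettyServer log line that marks the end of startup.
STARTED_MARKER = "NiFi has started. The UI is available at the following URLs:"


def ui_started(log_lines) -> bool:
    """Return True if any log line contains the startup marker."""
    return any(STARTED_MARKER in line for line in log_lines)


def port_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed location of the log; adjust to your installation.
    try:
        with open("logs/nifi-app.log", encoding="utf-8", errors="replace") as log:
            print("startup marker found:", ui_started(log))
    except FileNotFoundError:
        print("log file not found at the assumed path")
    print("port 8075 listening:", port_listening("localhost", 8075))
```

Note that the marker check looks at the whole log file, so an old startup message from a previous run would also match; checking the port as well helps rule that out.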
06-09-2017
03:51 PM
3 Kudos
@Eric Lloyd

Input and output ports are designed to send data to or receive data from one level up. When an input or output port is added at the root canvas level, "one level up" is another system outside of this NiFi. You will also notice that ports added to the root canvas are rendered a little differently.

There is an open Apache Jira on this subject. Feel free to add your comments and use case to it: https://issues.apache.org/jira/browse/NIFI-2933

The current feeling is that adding remote input and output ports should be left to the system administrator. This is because, in a secured environment, the admin must add the connecting systems as new users and authorize them to access these ports. Users are not typically granted this level of access.

Thanks,
Matt
06-08-2017
04:08 PM
@Daniel Frank

If you use @Matt Clarke in your response, I do not get an email notification.

I am not following how you use the filename and path of file (B) to parse a totally different file (C) from the filesystem. Have you looked at the FetchFile processor? It accepts a FlowFile as input and uses attributes set on the incoming FlowFile to specify what file to fetch and from where.

So you could GetFile (B), then extract what you need from file (B) into attributes that FetchFile can use to get file (C). FetchFile will stream the content of file (C) into the FlowFile originally belonging to file (B); however, the resulting FlowFile will retain all the FlowFile attributes that already existed on FlowFile (B).

Thanks,
Matt

If you found this answer addressed your question, please mark it as accepted to close out this thread in the community.
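
As a rough picture of what happens to the FlowFile in that flow, here is a toy model in Python. It is not NiFi code; the `FlowFile` class, the `fetch_file` function, and the `target.path` attribute name are all hypothetical and only illustrate "content is replaced, attributes are kept".

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class FlowFile:
    """Toy FlowFile: some content plus a map of attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


def fetch_file(flowfile: FlowFile, path_attribute: str = "target.path") -> FlowFile:
    """Replace the content with the file named by an attribute.

    'target.path' is a made-up attribute name for this sketch. All
    attributes of the incoming FlowFile are carried over unchanged.
    """
    target = Path(flowfile.attributes[path_attribute])
    return FlowFile(content=target.read_bytes(),
                    attributes=dict(flowfile.attributes))
```

Here the FlowFile that started out with the content of file (B) ends up with the content of file (C), while anything you extracted from (B) into attributes is still available downstream.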
06-08-2017
02:18 PM
@Daniel Frank

What format is your data in (text?)? Is all the information you need in the content of these files?

The GetFile processor already writes a set of attributes on every FlowFile it creates (see the "Writes Attributes" section of the processor's documentation). You could use the ExtractText processor to read the FlowFile content and extract bits of it into FlowFile attributes.

Thanks,
Matt
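
To illustrate the idea of pulling bits of content into attributes with regular expressions, here is a small Python sketch. It is a simplification and not the ExtractText implementation: NiFi uses Java regular expressions, and the sample content, the `target.path` attribute name, and the function name are invented for this example.

```python
import re


def extract_to_attributes(content: str, patterns: dict) -> dict:
    """Map attribute names to the first capture group of each pattern.

    Each pattern must contain at least one capture group. Patterns that
    do not match simply produce no attribute.
    """
    attributes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, content)
        if match:
            attributes[name] = match.group(1)
    return attributes


if __name__ == "__main__":
    # Hypothetical content of a small text file.
    content = "target_file=/data/in/c.csv\nowner=daniel\n"
    print(extract_to_attributes(content, {"target.path": r"target_file=(\S+)"}))
```

Keep the extracted values small, since attributes are held in heap memory.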
06-08-2017
02:05 PM
@Anthony Murphy

NiFi is designed to be resilient. It is designed to restore processors to their last known state on startup (that state may be enabled, disabled, started, or stopped). Are you sure these processors were not stopped before the abrupt shutdown/restart of the server occurred?

This is odd, since you say it only happens occasionally. And I will be honest, this is the first time I have heard of this issue.

Is it always the same processors that fail to start? Are the processors that fail to start configured to use any NiFi controller services? If so, are those controller services failing to start also?

Check the nifi-app.log during startup to see if there were any logged ERROR or WARN messages related to these processors or controller services.

Thanks,
Matt
06-08-2017
12:50 PM
1 Kudo
@Mahmoud Shash

There was a bug identified in the Controller Services UI of HDF 2.1.3. This bug affected users' ability to modify, enable, disable, and delete controller services. The HDF 2.1.3 release was pulled down, and the bug was addressed in HDF 2.1.4. If you upgrade to HDF 2.1.4, you will be able to successfully access the controller services in the Controller Services UI.

Thanks,
Matt