06-14-2017
12:30 PM
3 Kudos
@Thierry Vernhet The ListFile processor lists every non-hidden file it sees in the target directory. It then records the latest timestamp from the batch of files it listed in state management. That timestamp is used to determine which files are new in the next run. Since the file's timestamp changed while it was still being written, the same file gets listed again. A few suggestions, in preferred order:

1. Change how files are written to this directory. The ListFile processor ignores hidden files, so a file written as ".myfile.txt" is ignored until it is renamed to just "myfile.txt" (see the sketch below).
2. Raise the "Minimum File Age" setting on the processor to a value high enough to allow the source system to complete its writes to this directory.
3. Add a DetectDuplicate processor after your ListFile processor to detect duplicate listed files and remove them from your dataflow before the FetchFile processor.

Thanks, Matt
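As an illustration of suggestion 1, here is a minimal sketch of the write-then-rename pattern on the producing side, assuming a POSIX filesystem; the directory, filename, and helper name are made up for the example:

```python
import os

def write_then_rename(directory: str, filename: str, data: bytes) -> None:
    """Write to a dot-prefixed (hidden) name, then rename into place.

    ListFile ignores hidden files, so the file only becomes visible to the
    listing once the write has fully completed and the rename has happened.
    """
    hidden_path = os.path.join(directory, "." + filename)
    final_path = os.path.join(directory, filename)
    with open(hidden_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ensure bytes are on disk before the rename
    os.rename(hidden_path, final_path)  # atomic on POSIX filesystems

# Hypothetical usage:
# write_then_rename("/data/landing", "myfile.txt", b"hello\n")
```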
06-14-2017
12:09 PM
@estefania rabadan There is no processor configuration option to turn off the attributes a processor writes to the FlowFiles it processes. However, you can use the UpdateAttribute processor to remove attributes from FlowFiles (see the sketch below). Thanks, Matt
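UpdateAttribute's "Delete Attributes Expression" property takes a regular expression; any attribute whose name matches it is removed from the FlowFile. As a rough illustration of how that matching behaves, here is a stand-alone Python sketch with made-up attribute names (in NiFi itself you would just set the property, not write code):

```python
import re

# Hypothetical FlowFile attributes
attributes = {
    "filename": "myfile.txt",
    "path": "/data/landing",
    "custom.header": "abc",
    "custom.trace": "xyz",
}

# Regex of attribute names to drop, mirroring a "Delete Attributes Expression"
delete_expression = re.compile(r"custom\..*")

kept = {k: v for k, v in attributes.items() if not delete_expression.fullmatch(k)}
print(kept)  # {'filename': 'myfile.txt', 'path': '/data/landing'}
```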
06-13-2017
09:25 PM
4 Kudos
@Prakash Ravi You have 9 NiFi nodes, all running a ConsumeKafka processor configured with 3 concurrent tasks. That totals 27 consumers. Does the Kafka topic you are consuming from have 27 partitions? There can only be one consumer per partition on a topic. If you have more consumers than partitions, some of those consumers will never get any data, which likely explains the load distribution you are seeing. Whenever a new consumer is added or an existing consumer is removed, a rebalance is triggered. You will achieve your best performance when the number of partitions equals the number of consumers. Thanks, Matt
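One quick way to confirm the partition count is a small client script. A sketch assuming the kafka-python package is installed; "broker:9092" and "mytopic" are placeholders for your broker address and topic name:

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers="broker:9092")
partitions = consumer.partitions_for_topic("mytopic") or set()
print(f"mytopic has {len(partitions)} partitions")
consumer.close()

# 9 NiFi nodes x 3 concurrent tasks = 27 consumers, so for even load
# you would want the topic to have 27 partitions.
```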
06-13-2017
05:42 PM
@Anoop Shet Sorry for the late response, but I don't get pinged unless you tag me in your response. The ListSFTP processor retains state on files that have been listed. My guess here is that this state is preventing your new filter from returning anything. Try clearing the state and see if it then lists the files based on your new filter, or add a new ListSFTP processor using the different file filter. You can right-click on the processor and select "View state". In the state UI for this processor you will see a link to "Clear state" (the same can be done through the REST API; see the sketch below). If you found my answer addressed your original question, please mark it as accepted to close out this thread. Thanks, Matt
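For reference, here is how viewing and clearing processor state could be scripted against the REST API. A sketch assuming an unsecured NiFi 1.x instance at localhost:8080; the processor id is a placeholder, and a secured cluster would also need authentication:

```python
import requests

processor_id = "01234567-89ab-cdef-0123-456789abcdef"  # hypothetical id
base = "http://localhost:8080/nifi-api"

# View the currently stored state entries for the processor
state = requests.get(f"{base}/processors/{processor_id}/state").json()
print(state)

# Clear the stored state (equivalent to "Clear state" in the UI);
# the processor must be stopped first.
requests.post(f"{base}/processors/{processor_id}/state/clear-requests")
```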
06-13-2017
04:52 PM
1 Kudo
@Mahmoud Shash HDF 2.1.3 is a bad release. You are running into the exact Controller Service UI bug that resulted in HDF 2.1.3 being pulled and replaced with HDF 2.1.4. You can upgrade your HDF 2.1.3 to HDF 2.1.4 to fix this issue. Then you will be able to enable, disable, configure, and delete the HiveConnectionPool controller service. Matt
06-13-2017
04:36 PM
@Narasimma varman
Sorry for the late response, but I don't get pinged unless you add a comment to my response or tag me in your new answer. The dynamic properties expect the "value" to be a valid NiFi Expression Language (EL) statement; otherwise it is treated as a literal value. So I expect what you are seeing is that exact string passed in the header, or some kind of session rollback, etc. (see the example below). Also, I am not sure how you are pulling data using a "POST" method; shouldn't you be using "GET"? Thanks,
Matt
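Assuming this is the InvokeHTTP processor: each dynamic property becomes an HTTP request header, where the property name is the header name and the property value is evaluated as EL. A small illustration, with "X-Custom-Header" as a made-up header name:

```
X-Custom-Header -> ${filename}     (valid EL: resolves to the FlowFile's filename attribute)
X-Custom-Header -> myliteraltext   (not an EL statement: sent as the literal string "myliteraltext")
```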
06-13-2017
04:31 PM
@forest lin NiFi at its core has no issues working with very large files. Often, when you run into OOM errors, it is because of what you are trying to do with those very large files after they are in NiFi. In the majority of cases, OOM can be avoided through dataflow design and tweaks to the heap size allocated to the NiFi JVM. The content of a FlowFile does not live in heap memory space, but the FlowFile attributes do (except when swapped out to disk in large queues). So avoid extracting large amounts of content into FlowFile attributes, avoid splitting very large files into large numbers of small FlowFiles using a single processor, avoid merging a very large number of FlowFiles into a single FlowFile, etc. You can still do these types of things, but you may need to do them in two stages rather than one. For example, split large files every 5,000 lines first and then split the 5,000-line FlowFiles by every line; the difference in heap usage is huge (see the rough numbers below). If you found this answer addressed your question, please mark it as accepted to close out this thread. Thanks,
Matt
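Some back-of-the-envelope numbers for the two-stage split, assuming a hypothetical 1,000,000-line file:

```python
total_lines = 1_000_000
stage1_lines_per_split = 5_000

# One-stage split by every line: a single session produces 1,000,000
# FlowFiles, all tracked in heap at once.
one_stage_batch = total_lines

# Two-stage split: stage 1 produces 200 FlowFiles of 5,000 lines each;
# stage 2 then splits each of those into 5,000 single-line FlowFiles.
stage1_batch = total_lines // stage1_lines_per_split   # 200
stage2_batch = stage1_lines_per_split                  # 5,000

print(one_stage_batch, stage1_batch, stage2_batch)     # 1000000 200 5000
# The largest single batch held in heap drops from 1,000,000 to 5,000 FlowFiles.
```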
06-13-2017
03:21 PM
@Oleksandr Solomko You can see where these files are queued via the Summary UI. Once the Summary UI opens, select the "CONNECTIONS" tab. You can sort on any column by clicking that column. Once you have found the row for your queued connection, click on the "view connection details" icon on the far right side of the row. This will pop open a new UI that shows the queue breakdown per node in the cluster, which will help you identify whether you are having a cluster-wide issue or it is localized to one specific node (this breakdown is also available via the REST API; see the sketch below). If it is just one node with all this queued data, you could manually disconnect that node from your cluster, then go directly to the URL for the disconnected node and see if you can empty the queue there. Check for ERROR or WARN logs specifically in that node's nifi-app.log, nifi-user.log, and nifi-bootstrap.log. Also, what OS and Java version are you running? Thanks, Matt
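The same per-node breakdown can be pulled from the REST API. A sketch assuming an unsecured NiFi 1.x instance at localhost:8080; the connection id is a placeholder, and the exact response field names should be verified against your NiFi version:

```python
import requests

connection_id = "01234567-89ab-cdef-0123-456789abcdef"  # hypothetical id
url = f"http://localhost:8080/nifi-api/flow/connections/{connection_id}/status"

# nodewise=true asks for per-node snapshots rather than the cluster aggregate
status = requests.get(url, params={"nodewise": "true"}).json()
for node in status["connectionStatus"].get("nodeSnapshots", []):
    snap = node["statusSnapshot"]
    print(node["address"], snap["queuedCount"], snap["queuedSize"])
```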
06-13-2017
12:49 PM
1 Kudo
@forest lin Backpressure is not used to control the data rate in your dataflow. The intent of the backpressure settings on connections is to control the amount of allowed queued data, and both backpressure settings are "soft" limits. Once backpressure kicks in on a connection, the processor feeding that connection is no longer allowed to run.

So in your case above, you have backpressure set to 5 objects (FlowFiles) or 5 KB of content. Since your queue was empty, no backpressure was being applied when the 37.05 MB FlowFile arrived at your ConvertCSVToAvro processor, so that processor was allowed to run. That one FlowFile was processed through and placed on the outbound connection. At that point backpressure kicked in, because you exceeded one of your backpressure settings. The ConvertCSVToAvro processor will now be prevented from running until the queue drops back below 5 FlowFiles or 5 KB of queued data. If all your processors are processing FlowFiles rapidly, backpressure will be applied only sparsely.

Also keep in mind that, for efficiency, some processors work on batches of FlowFiles. With a backpressure object threshold of 5, you may therefore see a queue with more than 5 FlowFiles: the batch of FlowFiles is placed on the outbound queue, and the processor that did the batch processing is then not allowed to run again until that outbound connection drops back below 5 FlowFiles.

The ControlRate processor is what actually lets you control the throughput of a dataflow. It does not slow the processing of an individual FlowFile; it lets data queue on its input side and, based on its configured settings, only allows x number of FlowFiles through over y amount of time. Let's say it is configured to let 5 KB of data through every 1 minute. If you feed it a 37 MB file, it does not transfer just pieces of that FlowFile; it will feed through the entire 37 MB FlowFile and then not allow another FlowFile through until the average data per 1 minute is back down to 5 KB (see the rough math below).

Because of how the above works, data can continue to queue in front of ControlRate. This is where the backpressure settings become important, to stop upstream processors from running. You can set backpressure all the way upstream to your data ingest processors so they stop accepting new FlowFiles. Thanks, Matt
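Rough math for that 37 MB example, assuming 1 MB = 1,024 KB:

```python
flowfile_kb = 37 * 1024      # ~37,888 KB for the 37 MB FlowFile
rate_kb_per_min = 5          # ControlRate setting: 5 KB per minute

wait_minutes = flowfile_kb / rate_kb_per_min
print(f"~{wait_minutes:,.0f} minutes (~{wait_minutes / 60 / 24:.1f} days)")
# ~7,578 minutes (~5.3 days) before the next FlowFile is released --
# which is why upstream backpressure matters so much in front of ControlRate.
```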
06-12-2017
02:12 PM
@Justin R. Is this a NiFi cluster installation with multiple nodes running on the same host? If that is the case, whichever node manages to bind to the port first wins; all other nodes on the same host will report that the port is already in use. Matt