Member since 07-30-2019
3406 Posts
1621 Kudos Received
1006 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 21 | 12-17-2025 05:55 AM |
| | 82 | 12-15-2025 01:29 PM |
| | 42 | 12-15-2025 06:50 AM |
| | 198 | 12-05-2025 08:25 AM |
| | 338 | 12-03-2025 10:21 AM |
05-25-2017
06:51 PM
1 Kudo
@Anil Reddy Keep in mind that the settings you apply are per node, not per cluster. Setting concurrent tasks to 100 means that this processor on every node in your cluster can request up to 100 concurrent threads. Does each node in your NiFi cluster have that many CPU cores?
The concurrent tasks assigned to your processors pull threads from the "Maximum Timer Driven Thread Count" pool. As you can see, the default resource pool is rather low. So even if you do have the cores, if NiFi is not allowed to use them, this processor will never see 100 threads per node. Keep in mind that these settings are also per node. For example, if you have a server with 16 cores, this setting should be set somewhere between 32 and 64, and the "concurrent tasks" set on a processor should not be set higher than this.
Also keep in mind that if a processor does consume all available threads from the pool, none of your other processors will be able to run until that processor releases a thread back to the pool. Thanks, Matt
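The sizing arithmetic above can be sketched in a few lines of Python. This is only an illustration: the 2x and 4x multipliers are inferred from the 16-core example (32 to 64) and are an assumption, and the function names are hypothetical, not part of NiFi.

```python
def thread_pool_range(cores, low_factor=2, high_factor=4):
    """Suggested range for "Maximum Timer Driven Thread Count" on ONE node.

    The 2x/4x factors are an assumption inferred from the 16-core example.
    """
    return cores * low_factor, cores * high_factor


def effective_concurrent_tasks(requested, pool_size):
    """A processor can never run more threads than the node's pool holds."""
    return min(requested, pool_size)


low, high = thread_pool_range(16)
print(low, high)                              # 32 64
print(effective_concurrent_tasks(100, high))  # 64
```

Both values apply per node, so a 3-node cluster with these settings could run up to 3 x 64 threads for that processor in total.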
05-25-2017
05:47 PM
@Anoop Shet The ListSFTP and FetchSFTP processors will be your best choice here. You can still use the Java regular expression I provided to filter only on files ending in years 2017 - 2099.
This is your best option because the ListSFTP processor retains state. When it runs today, it will list all files ending in 2017. It will then record state for the most recent file listed, in the form of the lastModified timestamp on that file. When the processor runs again, it will only list files with a newer timestamp than what was previously recorded in state management (while still applying the regex).
The ListSFTP processor produces one 0 byte FlowFile for each file listed from the SFTP server. These 0 byte FlowFiles have numerous FlowFile attributes created on them, some of which are used by default by the FetchSFTP processor to actually retrieve the content from the SFTP server and insert it into that FlowFile.
As for an alternative to EL, does the Java regular expression I provided not work for you? There are plenty of resources on the web for writing and even testing Java regular expressions. Once data is ingested by NiFi as FlowFiles, you can use NiFi's EL to evaluate and route FlowFiles. Thank you, Matt
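The state-based listing described above can be pictured with a small Python sketch. It is a simplified model, not ListSFTP's actual code: `RemoteFile`, `list_new_files`, and the integer timestamps are hypothetical names and values chosen for illustration.

```python
import re
from typing import NamedTuple


class RemoteFile(NamedTuple):
    name: str
    last_modified: int  # e.g. epoch milliseconds


def list_new_files(files, pattern, last_seen):
    """Return (listed files, new state) for one run of a List-style processor.

    Only files newer than the stored timestamp AND matching the filter
    regex are listed; the newest listed timestamp becomes the new state.
    """
    regex = re.compile(pattern)
    listed = [f for f in files
              if f.last_modified > last_seen and regex.match(f.name)]
    new_state = max((f.last_modified for f in listed), default=last_seen)
    return listed, new_state


files = [RemoteFile("a_2017", 100), RemoteFile("b_2016", 150),
         RemoteFile("c_2017", 200)]
listed, state = list_new_files(files, r"^.*2017$", 0)
print([f.name for f in listed], state)   # ['a_2017', 'c_2017'] 200

files.append(RemoteFile("d_2017", 300))
listed, state = list_new_files(files, r"^.*2017$", state)
print([f.name for f in listed], state)   # ['d_2017'] 300
```

The second run lists only the file that arrived after the recorded timestamp, which is why the same file is not fetched twice.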
05-25-2017
05:19 PM
@Anoop Shet I am confused...
The "File Filter Regex" property works the same way in both the GetSFTP and ListSFTP processors and does not support NiFi Expression Language. The List type processors are used in conjunction with the corresponding Fetch processor to pull data. The FetchSFTP processor is designed to fetch the content of one file at a time and insert that data into the FlowFile that triggered the Fetch processor to run. While you can certainly use ListSFTP to fetch a listing of all files on your SFTP server and then use a RouteOnAttribute processor to pick out only those with the current year in the name, the Java regular expression I provided should work as well. Here is an article on GetSFTP vs List/FetchSFTP processors: https://community.hortonworks.com/articles/97773/how-to-retrieve-files-from-a-sftp-server-using-nif.html Thanks, Matt
05-25-2017
05:00 PM
@Anoop Shet The following Java regular expression would match all years from 2017 to 2099:
^.*20([1-9]{1}[7-9]{1}|[2-9]{1}[0-9]{1})$
Thanks, Matt
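A quick way to sanity-check the pattern is to run it against a few sample names. The sketch below uses Python's `re` module purely as a convenient stand-in; the pattern relies only on constructs that behave the same way in Java regular expressions. The sample filenames are made up.

```python
import re

# Same pattern as above, unchanged.
YEAR_SUFFIX = re.compile(r"^.*20([1-9]{1}[7-9]{1}|[2-9]{1}[0-9]{1})$")


def ends_in_2017_to_2099(filename):
    """True when the name ends in a four-digit year from 2017 to 2099."""
    return YEAR_SUFFIX.match(filename) is not None


for name in ["report_2017", "report_2020", "report_2099",
             "report_2016", "report_2010", "report_2100"]:
    print(name, ends_in_2017_to_2099(name))
# report_2017 True
# report_2020 True
# report_2099 True
# report_2016 False
# report_2010 False
# report_2100 False
```

Because the pattern is anchored with `$`, the year has to be the last thing in the name: `report_2017.csv` would not match.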
05-25-2017
04:51 PM
@Anoop Shet That particular property (File Filter Regex) does not support NiFi Expression Language (EL).
If you hover your cursor over the "?" displayed next to a processor property, you will see a line that says whether EL is supported or not. This particular property only supports Java regular expressions as input. Thanks, Matt
05-25-2017
04:10 PM
@Max Evers There was a recent change to ListFile that addresses this exact behavior: https://issues.apache.org/jira/browse/NIFI-3213 An Apache Jira could be opened asking that the same change be applied to ListHDFS as well. Thanks, Matt
05-25-2017
02:55 PM
1 Kudo
@Max Evers Does the 1 file that is being left behind have the most recent timestamp of all files consumed? NiFi records state based on the timestamp of the most recent file listed. The problem that can occur is that if multiple files are being written into the target location at the same time, they may not all make it into the listing being performed. If NiFi then recorded that timestamp in state, on the next run those other files would not be listed and would never get fetched.
So the idea is to list all files except those with the latest timestamp. In most cases, this means only 1 or 2 files are not listed. What ends up being listed is every file except those sharing the most recent timestamp. This ensures that, even when time differs between your NiFi server and target HDFS servers, all files get listed on the next processor execution. Please let us know if you are seeing different behavior. Thanks, Matt
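The "list everything except the newest timestamp" idea can be sketched as follows. This is a simplified Python model of the behavior described above, not ListHDFS's real implementation; the function name and the (name, timestamp) tuples are hypothetical.

```python
def list_except_latest(files, last_seen):
    """files: iterable of (name, last_modified) tuples.

    Returns (listed, new_state). Files sharing the newest timestamp are
    held back, because more files with that timestamp may still be arriving.
    """
    candidates = [f for f in files if f[1] > last_seen]
    if not candidates:
        return [], last_seen
    latest = max(ts for _, ts in candidates)
    listed = [f for f in candidates if f[1] < latest]
    new_state = max((ts for _, ts in listed), default=last_seen)
    return listed, new_state


files = [("a", 100), ("b", 200), ("c", 300), ("d", 300)]
listed, state = list_except_latest(files, 0)
print(listed, state)   # [('a', 100), ('b', 200)] 200

files.append(("e", 400))
listed, state = list_except_latest(files, state)
print(listed, state)   # [('c', 300), ('d', 300)] 300
```

In this model the held-back files are only picked up once something newer shows up, which matches the "1 file left behind" symptom when no further files arrive.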
05-25-2017
12:46 PM
@Tinkle Mahendru How do you identify your files as containing CSV data without looking at each file's content? Does the filename indicate that it contains CSV data? Assuming all your CSV files have a .csv filename extension, you could use the RouteOnAttribute processor to route files whose filename ends in .csv to your MergeContent processor. All other FlowFiles with a filename not ending in .csv could then be routed elsewhere in your dataflow. You would add a new custom property as follows to the RouteOnAttribute processor: Each added dynamic property becomes a new relationship for this processor. Let's say there is no extension; you may be able to use the RouteOnContent processor to look at the content of each FlowFile for an indicator that it is CSV data and route that way. Of course, reading content is a more expensive operation in terms of resources than evaluating attributes.
The MergeContent processor has virtual bins where it groups incoming FlowFiles before merging all the FlowFiles assigned to that bin. The Correlation Attribute property provides a way for you to control which FlowFiles are put in which bin. FlowFiles are made up of FlowFile attributes (key/value pairs, basically metadata) and FlowFile content (your actual data). You can use various processors (e.g. UpdateAttribute) to add and manipulate FlowFile attributes on a FlowFile. If you configure your MergeContent processor to use a Correlation Attribute, NiFi will look for the attribute key you specify and bin files with the exact same value into the same virtual bin. I do not believe this is what you are looking for here to solve your use case.
While there are scripting processors available in NiFi that can be used to execute your own script against a FlowFile, they are designed to operate against one FlowFile at a time. You could maybe use a PutFile processor to write your CSV files to disk and then use one of the scripting processors to merge them.
Another option is to write your own custom NiFi processor that is specifically designed to merge CSV files. https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html If you feel we have successfully answered your question, please mark an answer as accepted. Thank you, Matt
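To make the routing and binning ideas above concrete, here is a small Python model. It is only a sketch of the concepts: FlowFiles are represented as plain dicts of attributes, and `route_on_filename`, `bin_by_attribute`, and the `source` attribute are hypothetical names, not NiFi APIs.

```python
from collections import defaultdict


def route_on_filename(flowfiles, suffix=".csv"):
    """Split FlowFiles into (matched, unmatched) by filename suffix,
    mirroring a RouteOnAttribute rule on the filename attribute."""
    matched, unmatched = [], []
    for ff in flowfiles:
        (matched if ff["filename"].endswith(suffix) else unmatched).append(ff)
    return matched, unmatched


def bin_by_attribute(flowfiles, key):
    """Group FlowFiles that share the exact same value for one attribute,
    the way a correlation attribute assigns them to virtual bins.
    In this sketch, FlowFiles lacking the attribute share a None bin."""
    bins = defaultdict(list)
    for ff in flowfiles:
        bins[ff.get(key)].append(ff)
    return dict(bins)


flowfiles = [
    {"filename": "a.csv", "source": "east"},
    {"filename": "b.txt", "source": "east"},
    {"filename": "c.csv", "source": "west"},
    {"filename": "d.csv", "source": "east"},
]
csv_files, other = route_on_filename(flowfiles)
bins = bin_by_attribute(csv_files, "source")
print({k: [ff["filename"] for ff in v] for k, v in bins.items()})
# {'east': ['a.csv', 'd.csv'], 'west': ['c.csv']}
```

Routing looks only at attributes, so no content is read; each bin would then be merged on its own.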
05-25-2017
12:05 PM
2 Kudos
@J. D. Bacolod Anything you can do within the NiFi UI, you can also do via NiFi REST API calls. So you could issue a REST API call to stop specific processors before the batch job is started and then issue another REST API call to start the processors again after the batch job completes. https://nifi.apache.org/docs/nifi-docs/rest-api/index.html Thanks, Matt
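As a rough sketch of what such a call could look like, the Python below builds (but does not send) a stop or start request. The `/processors/{id}/run-status` path and the request body shape are assumptions based on newer NiFi releases and should be checked against the REST API documentation linked above for your version; the base URL and processor id are placeholders.

```python
import json
import urllib.request


def build_run_status_request(base_url, processor_id, revision_version, state):
    """Build a PUT request that asks NiFi to start or stop one processor.

    Assumed endpoint: PUT {base_url}/processors/{id}/run-status
    revision_version must be the processor's current revision, which you
    would first read with a GET on the processor.
    """
    if state not in ("RUNNING", "STOPPED"):
        raise ValueError("state must be 'RUNNING' or 'STOPPED'")
    body = {"revision": {"version": revision_version}, "state": state}
    return urllib.request.Request(
        url=f"{base_url}/processors/{processor_id}/run-status",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder values for illustration only.
req = build_run_status_request(
    "http://localhost:8080/nifi-api", "abc-123", 3, "STOPPED")
print(req.get_method(), req.full_url)
# PUT http://localhost:8080/nifi-api/processors/abc-123/run-status
```

Sending it would be `urllib.request.urlopen(req)` before the batch job, then the same call with `"RUNNING"` (and the updated revision) afterwards. A secured NiFi would also need authentication, which is not shown here.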
05-24-2017
01:52 PM
@Bhushan Babar Glad I was able to help resolve your issue. Could you please click "accept" on the answer I provided to close out this question in the community? Thank you,
Matt