How can we work around the known bug in file filtering during recursive listing in ListHDFS?
We are trying to pick out specific files within the subdirectories returned by a recursive directory listing.
We have reproduced the bug in NiFi 1.5 and 1.7.
We are using ListHDFS to find any file added anywhere under the parentdir directory tree and keep only the *.CSV files.
A NiFi bug prevents the filter from being applied within subdirectories. Has anyone managed to do this?
Goal: get the list of all *.csv files under the parentdir directory.
Since you are using the `ListHDFS` processor, note that it adds a `filename` attribute to every flowfile it emits.
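To see why the `filename` attribute is enough, here is a plain-Python model (not NiFi code) of how a recursive listing maps onto per-flowfile attributes. It assumes that `filename` holds only the base name and `path` holds the parent directory, mirroring what ListHDFS writes; the function name and sample paths are hypothetical.

```python
import posixpath


def listing_to_attributes(hdfs_paths):
    """Model each listed file as a flowfile attribute dict.

    Assumption: 'filename' is the base name and 'path' is the parent
    directory, mirroring the attributes ListHDFS writes per file.
    """
    flowfiles = []
    for full_path in hdfs_paths:
        directory, name = posixpath.split(full_path)
        flowfiles.append({"filename": name, "path": directory})
    return flowfiles


if __name__ == "__main__":
    # Hypothetical listing of the parentdir tree, including nested subdirectories.
    listing = [
        "/parentdir/a.csv",
        "/parentdir/sub1/b.CSV",
        "/parentdir/sub1/sub2/c.json",
    ]
    for attrs in listing_to_attributes(listing):
        print(attrs)
```

Because `filename` never contains the directory part, a condition written against it behaves the same at every depth of the tree.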
Add a RouteOnAttribute processor after ListHDFS and check the `filename` attribute there.
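In RouteOnAttribute, add a new property named `Csvfiles` whose value tests the file name, for example `${filename:toLower():endsWith('.csv')}` (the `toLower()` call makes the check also match upper-case extensions such as .CSV).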
Connect the `Csvfiles` relationship to a FetchHDFS processor. Files that do not match go to the `unmatched` relationship, which can be auto-terminated.
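The routing step can be modelled in plain Python to show how the `Csvfiles` property becomes a relationship. This is a sketch, not NiFi code: it assumes RouteOnAttribute's "Route to Property name" strategy, where each dynamic property is a relationship and a flowfile goes to every property whose condition is true, or to `unmatched` if none is. The lambda is a hypothetical stand-in for the Expression Language condition.

```python
def route_on_attribute(flowfile, properties):
    """Return the relationships a flowfile would be routed to.

    Models the "Route to Property name" strategy: every property whose
    condition holds is a target; if none holds, the target is "unmatched".
    """
    matched = [name for name, condition in properties.items() if condition(flowfile)]
    return matched or ["unmatched"]


# Python stand-in for an Expression Language check on the filename attribute.
properties = {
    "Csvfiles": lambda ff: ff["filename"].lower().endswith(".csv"),
}

if __name__ == "__main__":
    for ff in [{"filename": "a.csv"}, {"filename": "b.CSV"}, {"filename": "c.json"}]:
        print(ff["filename"], "->", route_on_attribute(ff, properties))
```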
1. ListHDFS: lists all files in the directories recursively
2. RouteOnAttribute: keeps only the CSV files and filters out everything else
3. FetchHDFS: fetches the CSV files from HDFS
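The three-step flow can be sketched end to end in plain Python. This is only a local model: a temporary directory on the local filesystem stands in for HDFS, and `list_recursive`, `is_csv`, `fetch` and `run_flow` are hypothetical names, not NiFi or Hadoop APIs.

```python
import tempfile
from pathlib import Path


def list_recursive(root):
    """Stage 1 (stand-in for ListHDFS): yield attributes for every file under root."""
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            yield {"filename": p.name, "path": str(p.parent)}


def is_csv(attrs):
    """Stage 2 (stand-in for the 'Csvfiles' property): case-insensitive .csv check."""
    return attrs["filename"].lower().endswith(".csv")


def fetch(attrs):
    """Stage 3 (stand-in for FetchHDFS): read the file at path/filename."""
    return (Path(attrs["path"]) / attrs["filename"]).read_bytes()


def run_flow(root):
    """Chain the stages and return (full path, content) pairs for CSV files only."""
    results = []
    for attrs in list_recursive(root):
        if is_csv(attrs):
            full_path = str(Path(attrs["path"]) / attrs["filename"])
            results.append((full_path, fetch(attrs)))
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as parentdir:
        nested = Path(parentdir, "sub1", "sub2")
        nested.mkdir(parents=True)
        Path(parentdir, "a.csv").write_text("x,y\n")
        (nested / "b.CSV").write_text("1,2\n")
        (nested / "c.json").write_text("{}")
        print([Path(p).name for p, _ in run_flow(parentdir)])
```

Returning the full path alongside the content keeps same-named files from different subdirectories distinguishable.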
With this flow, RouteOnAttribute filters out files of every other format, so FetchHDFS fetches only the required CSV files from the HDFS directories, regardless of how deeply they are nested.