
ListFile/FetchFile in a dynamic folder structure based on dates using NiFi

New Contributor

Hello All,

I am creating a flow in NiFi that lists files in a folder structure as follows:

 

../2019/january/

....

../2019/november/

../2019/december/

 

Each folder stores the files created during the year and month in its path: files from November 2019 are in 2019/november/ and files from December 2019 are in 2019/december/.

 

How can I use ListFile to get all files from the current folder without scanning the full structure?

I tried using the now() function in the ListFile Input Directory property to build the path, and it works for the current month, but I would like to understand what will happen on the first day of the next month.

 

Thanks for helping me on this.

 

1 REPLY

Super Collaborator

I'm not sure if this is what you want, but by combining these documented ListFile properties you could get quite creative about which folders you listen to.

 

Input Directory: The input directory from which to pull files. Supports Expression Language: true
Recurse Subdirectories (default: true): Indicates whether to list files from subdirectories of the directory. Allowable values:
  • true
  • false
Input Directory Location (default: Local): Specifies where the Input Directory is located. This is used to determine whether state should be stored locally or across the cluster. Allowable values:
  • Local: Input Directory is located on a local disk. State will be stored locally on each node in the cluster.
  • Remote: Input Directory is located on a remote system. State will be stored across the cluster so that the listing can be performed on Primary Node Only and another node can pick up where the last node left off, if the Primary Node changes.
File Filter (default: [^\.].*): Only files whose names match the given regular expression will be picked up.
Path Filter: When Recurse Subdirectories is true, then only subdirectories whose path matches the given regular expression will be scanned.
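
To make the Path Filter idea a bit more concrete, here is a rough Python sketch (not NiFi configuration) of a regular expression that matches only the current and previous month folders. The helper names are made up for illustration, and it assumes the filter is compared against paths such as 2019/november relative to the Input Directory. The documentation quoted above only mentions Expression Language support for Input Directory, so I am assuming the Path Filter value would have to be generated or updated outside NiFi.

```python
import re
from datetime import date

# Lowercase English month names, matching the folder names in the question.
MONTHS = ["january", "february", "march", "april", "may", "june",
          "july", "august", "september", "october", "november", "december"]


def month_folder(d: date) -> str:
    """Return the relative folder for a date, e.g. 2019/november."""
    return f"{d.year}/{MONTHS[d.month - 1]}"


def previous_month(d: date) -> date:
    """Return the first day of the month before d."""
    if d.month == 1:
        return date(d.year - 1, 12, 1)
    return date(d.year, d.month - 1, 1)


def path_filter_for(d: date) -> str:
    """Build a regex that matches only the previous and current month folders."""
    folders = [month_folder(previous_month(d)), month_folder(d)]
    return "(" + "|".join(re.escape(f) for f in folders) + ")"


if __name__ == "__main__":
    pattern = re.compile(path_filter_for(date(2019, 12, 1)))
    for candidate in ["2019/january", "2019/november", "2019/december"]:
        print(candidate, bool(pattern.fullmatch(candidate)))
```

Including the previous month is a deliberate choice here: it keeps late files from the month that just ended in scope for a while after the rollover.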

 

Honestly, if you just want to skip past months, it may be easier to create one processor for the current month and set up further processors in advance to listen for files from the next year(s).
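
On the question of what happens on the first day of the next month: below is a small Python sketch of how a date-based Input Directory would resolve just before and just after a month boundary. It is only a model, assuming the expression is re-evaluated on every run of the processor; the function name and the ".." base path are placeholders taken from the question, and in NiFi itself something like ${now():format('yyyy/MMMM'):toLower()} is my guess at the kind of expression involved.

```python
from datetime import datetime

# Lowercase English month names, matching the folder names in the question.
MONTHS = ["january", "february", "march", "april", "may", "june",
          "july", "august", "september", "october", "november", "december"]


def input_directory(base: str, now: datetime) -> str:
    """Model of a date-based Input Directory evaluated at time `now`."""
    return f"{base}/{now.year}/{MONTHS[now.month - 1]}"


if __name__ == "__main__":
    last_run_in_november = datetime(2019, 11, 30, 23, 59)
    first_run_in_december = datetime(2019, 12, 1, 0, 0)
    # The resolved folder changes as soon as the month changes.
    print(input_directory("..", last_run_in_november))
    print(input_directory("..", first_run_in_december))
```

Under that assumption the listing simply moves to the new month folder at the rollover. The thing to watch is any file that lands in the old month folder after its last listing, since that folder would no longer be visited.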


- Dennis Jaheruddin

If this answer helped, please mark it as 'solved' and/or if it is valuable for future readers please apply 'kudos'. Also check out my technical portfolio at https://portfolio.jaheruddin.nl