Created 07-13-2017 08:38 AM
Hi All,
I want to fetch data that is stored in HDFS using the FetchHDFS processor.
The folder structure we use to store our data is /MajorData/Location/Year/Month/Day/file1.txt (for example, /MajorData/Location/2017/01/01/file1.txt). As the day changes, the path changes too, for example to /MajorData/Location/2017/01/02/file2.txt
How can I write a NiFi expression that will traverse all the folders and fetch the data into NiFi?
Created 07-13-2017 12:42 PM
The ListHDFS processor records state so that only new files are listed. The processor also has a configuration option for recursing subdirectories. You could set the directory to only /MajorData/Location/ and let it list all files from the subdirectories. As new subdirectories are created, the files within those new directories will get listed.
If that does not work for you, the NiFi Expression Language (EL) statement that you are looking for would look something like this for the directory:
/MajorData/Location/${now():format('yyyy/MM/dd')}
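To make it concrete what that expression resolves to, here is a rough Python sketch of the same date formatting. This is only an illustration: NiFi evaluates the EL itself using a Java-style date pattern, and the `resolve_directory` helper and the sample timestamps below are made up for this example.

```python
from datetime import datetime

# Base path taken from the question above.
BASE_DIR = "/MajorData/Location"


def resolve_directory(now: datetime) -> str:
    """Mimic /MajorData/Location/${now():format('yyyy/MM/dd')} for a given time.

    Hypothetical helper: '%Y/%m/%d' is the strftime equivalent of 'yyyy/MM/dd'.
    """
    return f"{BASE_DIR}/{now.strftime('%Y/%m/%d')}"


# Sample timestamps (assumed for illustration only).
print(resolve_directory(datetime(2017, 1, 1, 8, 38)))  # /MajorData/Location/2017/01/01
print(resolve_directory(datetime(2017, 1, 2, 0, 5)))   # /MajorData/Location/2017/01/02
```

So the directory value changes at midnight, and each run of the processor only looks at the directory for the day on which that run happens.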
The above would cause NiFi to look only in the current day's directory for files until the day changes. I am not sure of the rate at which files are written into these target directories, but be mindful that if a file is added between runs of the ListHDFS processor and the day changes between those runs, that file will not get listed using the above EL statement.
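A small simulation may help show the day-boundary gap. It is a simplified model, not how ListHDFS is actually implemented: the run times, the `late.txt` file, and both helper functions are assumptions made up for the example, and each simulated run simply lists the files already present in that run's date directory.

```python
from datetime import datetime


def day_directory(ts: datetime) -> str:
    """Directory the EL statement would resolve to at time ts (hypothetical helper)."""
    return "/MajorData/Location/" + ts.strftime("%Y/%m/%d")


def files_seen(run_times, files):
    """Return the paths that at least one simulated listing run would pick up.

    files maps each path to the time it was written. A run only sees files
    that live under its own date directory and already exist when it runs.
    """
    seen = set()
    for run in run_times:
        target = day_directory(run) + "/"
        for path, written in files.items():
            if path.startswith(target) and written <= run:
                seen.add(path)
    return seen


# Assumed schedule: one run just before midnight, the next just after.
runs = [datetime(2017, 1, 1, 23, 55), datetime(2017, 1, 2, 0, 5)]

# Assumed files; late.txt lands in the old directory between the two runs.
files = {
    "/MajorData/Location/2017/01/01/file1.txt": datetime(2017, 1, 1, 23, 50),
    "/MajorData/Location/2017/01/01/late.txt": datetime(2017, 1, 1, 23, 58),
    "/MajorData/Location/2017/01/02/file2.txt": datetime(2017, 1, 2, 0, 1),
}

missed = sorted(set(files) - files_seen(runs, files))
print(missed)  # ['/MajorData/Location/2017/01/01/late.txt']
```

Pointing the directory at /MajorData/Location/ with subdirectory recursion enabled avoids this, since the listing is then not tied to the current date.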
Thanks,
Matt
Created 07-16-2017 06:36 PM
Thank you Matt, ListHDFS was a good hint. I was able to accomplish my task with your input.