Created on 10-15-2018 08:56 AM - edited 08-17-2019 08:55 PM
Hello, I need to use the FetchHDFS processor in the middle of a flow. I first use UpdateAttribute before FetchHDFS to set the path and filename attributes, but my problem is that I need to fetch all the files in a folder and I don't know how to do it.
Created on 10-15-2018 12:16 PM - edited 08-17-2019 08:55 PM
Use the GetHDFSFileInfo processor and set the Full path property to `<directory>`. This processor is stateless, so every run lists all the files in the directory.
GetHDFSFileInfo Configs:
Here we list all files in the /tmp directory recursively, with Destination set to Attributes, so each flowfile carries all the attributes the processor writes as flowfile attributes.
This processor was added in NiFi 1.7. If you are using an earlier version of NiFi, you need to run a script that lists all the files in the directory, then extract the path and use the extracted attribute in the FetchHDFS processor.
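For the pre-1.7 fallback, a minimal sketch of such a listing script is below. It assumes the `hdfs` command-line client is on the PATH and that `hdfs dfs -ls -R` prints its usual eight-column layout (permissions, replication, owner, group, size, date, time, path). The function names `parse_ls_output` and `list_hdfs_files` are hypothetical, and how the script is wired into the flow is left open.

```python
import posixpath
import subprocess


def parse_ls_output(text):
    """Return (directory, filename) pairs for regular files in `hdfs dfs -ls -R` output."""
    files = []
    for line in text.splitlines():
        # Split into at most 8 fields so a path containing spaces stays intact.
        parts = line.split(None, 7)
        # Skip summary lines such as "Found N items" and directory entries ("d...").
        if len(parts) < 8 or parts[0].startswith("d"):
            continue
        directory, filename = posixpath.split(parts[7])
        files.append((directory, filename))
    return files


def list_hdfs_files(directory):
    """Run the HDFS CLI (assumed to be installed) and parse its recursive listing."""
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", "-R", directory],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ls_output(result.stdout)
```

Calling `list_hdfs_files("/tmp")` would return one pair per file. Each pair can then be turned into the path and filename attributes that FetchHDFS reads.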
Created 10-15-2018 01:00 PM
Please check my `updated answer`. You don't need to run any command in the processor: it is designed so that you only configure the directory, and the processor itself runs all the commands.
Created 10-16-2018 12:48 PM
Could you check the scheduling of the GetHDFSFileInfo processor? By default this processor is scheduled to run every 0 sec (always running), and I think that is what is causing the 10,000 flowfiles.
GetHDFSFileInfo doesn't store state, so it lists all the files in the directory on every run.
Change the Run Schedule to something like 1 hr. The processor will then run once per hour, and each run will produce only as many flowfiles as there are files in the directory.
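To make the arithmetic concrete, here is a small simulation of a stateless lister. It is only an illustration of the behaviour described above, not NiFi code; `stateless_listing` and `total_flowfiles` are hypothetical names, and the 46 file paths are made up to match the count reported in this thread.

```python
def stateless_listing(directory_contents):
    """Return every file on every run; nothing is remembered between runs."""
    return list(directory_contents)


def total_flowfiles(directory_contents, runs):
    """Count the flowfiles emitted after the lister has been triggered `runs` times."""
    return sum(len(stateless_listing(directory_contents)) for _ in range(runs))


# Hypothetical directory holding 46 files.
files = [f"/tmp/file_{i}.csv" for i in range(46)]

one_run = total_flowfiles(files, 1)      # a single trigger: one flowfile per file
one_day = total_flowfiles(files, 24)     # triggered once per hour for a day
```

With a single run the count equals the number of files (46). Every additional run adds another 46, which is why an always-running schedule fills the queue so quickly.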
Created 10-15-2018 12:39 PM
I don't know how to use this command in the GetHDFSFileInfo processor and then in the FetchHDFS processor after it. Sorry.
Created on 10-16-2018 07:04 AM - edited 08-17-2019 08:55 PM
The problem now is that when I use this processor, the number of flowfiles increases up to 10,000, but there are only 46 files in the folder. I don't know whether this is a problem or not.
How should I configure the FetchHDFS processor that comes after it?
Thank you
Created 02-09-2023 08:16 PM
If the processor generates more flowfiles than there are files, change the processor configuration: set Group Results from None to All.
Created 02-13-2023 06:45 AM
@jricogar
Why not use the ListHDFS processor?
It retains state, so the same HDFS files do not get listed multiple times.
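The idea of a state-retaining lister can be sketched as follows. This is a simplified illustration that remembers paths in a set. It does not reproduce how ListHDFS actually stores its state, and `StatefulLister` is a hypothetical name.

```python
class StatefulLister:
    """Toy lister that remembers which paths it has already emitted."""

    def __init__(self):
        self._seen = set()

    def list_new(self, directory_contents):
        """Return only paths not emitted by an earlier call, then remember them."""
        new_files = [path for path in directory_contents if path not in self._seen]
        self._seen.update(new_files)
        return new_files


lister = StatefulLister()
first_run = lister.list_new(["/tmp/a.csv", "/tmp/b.csv"])
second_run = lister.list_new(["/tmp/a.csv", "/tmp/b.csv"])
third_run = lister.list_new(["/tmp/a.csv", "/tmp/b.csv", "/tmp/c.csv"])
```

Here the first run returns both files, the second returns nothing, and the third returns only the newly added `/tmp/c.csv`, so repeated runs do not create duplicate flowfiles.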
I'm just trying to understand your use case for using FetchHDFS without the ListHDFS processor.
Thanks,
Matt