Hello, I need to use the FetchHDFS processor in the middle of my flow. I currently use an UpdateAttribute processor before FetchHDFS to create the path and filename attributes, but my problem is that I need to fetch all the files in a folder and I don't know how to do it.
Use the GetHDFSFileInfo processor and configure the Full path property as `<directory>`. This processor is stateless, so every run will list all the files in the directory.
In this example we list all files in the /tmp directory recursively and configure Destination as Attributes, so each outgoing flowfile carries the attributes the processor writes (file metadata) as flowfile attributes.
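Based on the description above, the GetHDFSFileInfo configuration might look like the following sketch (property names as of NiFi 1.7; the directory value is illustrative):

```
GetHDFSFileInfo
  Full path:               /tmp
  Recurse Subdirectories:  true
  Destination:             Attributes
```

With Destination set to Attributes, downstream processors such as FetchHDFS can reference the listed path via expression language instead of a manually maintained UpdateAttribute.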
This processor was added in NiFi 1.7. If you are using an earlier version of NiFi, you need to run a script that lists all the files in the directory, extract each path into an attribute, and use the extracted attribute in the FetchHDFS processor.
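For older NiFi versions, the listing step could be sketched with the `hdfs dfs -ls -R` CLI, for example invoked from an ExecuteProcess or ExecuteStreamCommand processor. This is only a minimal sketch: it assumes the `hdfs` client is on the PATH, and the `extract_paths` helper name is hypothetical.

```shell
# Parse `hdfs dfs -ls -R` output: skip directory entries (mode string
# starts with 'd') and print the 8th whitespace-separated field, which
# is the absolute path of the file.
extract_paths() {
  awk '$1 !~ /^d/ && NF >= 8 { print $8 }'
}

# Typical usage (requires the hdfs CLI; directory is illustrative):
#   hdfs dfs -ls -R /tmp | extract_paths
# Each emitted line is one file path, which can then be extracted into
# a flowfile attribute and fed to FetchHDFS.
```

Each output line can be split into a flowfile (e.g. SplitText) and routed into an attribute with ExtractText before FetchHDFS.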
Please check my `updated answer`. We don't need to run any command in the processor: you only configure the directory, and the processor runs the necessary operations itself.
Could you check the scheduling of the GetHDFSFileInfo processor? By default it is scheduled with a Run Schedule of 0 sec (i.e., always running), and I think that is what is producing the 10,000 flowfiles.
The GetHDFSFileInfo processor does not store state, so every time it runs it lists the files in the directory again.
Change the Run Schedule to something like 1 hr; the processor will then run once per hour, and you will get only as many flowfiles as there are files in the directory.
The problem now is that when I try to use this processor, the flowfile count increases up to 10,000 flowfiles, but there are only 46 files in the folder. I don't know whether this is a problem or not.
How should I configure the flow after the FetchHDFS processor?