NiFi - fetchHDFS without ListHDFS

New Contributor

Hello, I need to use the FetchHDFS processor in the middle of a flow. I put an UpdateAttribute processor before FetchHDFS to set the path and filename attributes, but my problem is that I need to fetch all the files in a folder and I don't know how to do it.

92827-updateattribute.jpg

5 REPLIES

Re: NiFi - fetchHDFS without ListHDFS

Super Guru

@Pepelu Rico

Use the GetHDFSFileInfo processor and set the Full Path property to `<directory>`. This processor is stateless, so every run lists all the files in the directory.

GetHDFSFileInfo Configs:

92839-info.png

Here we list all files in the /tmp directory recursively, with Destination set to Attributes, so each flowfile carries all of the processor's write attributes as flowfile attributes.

92840-wa.png
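
For the FetchHDFS step that follows, the file to fetch can be assembled from those attributes. The sketch below is only an illustration in Python. It assumes the attribute names `hdfs.path` (parent directory), `hdfs.objectName` (file or directory name) and `hdfs.type`, as I recall them from the GetHDFSFileInfo documentation. The sample values and the helper `full_hdfs_path` are hypothetical.

```python
import posixpath


def full_hdfs_path(attributes):
    """Return the absolute HDFS path for a listed file, or None for non-files.

    `attributes` is a plain dict standing in for flowfile attributes.
    """
    # Directory entries are listed too, but only files can be fetched.
    if attributes.get("hdfs.type") != "file":
        return None
    # hdfs.path is assumed to hold the parent directory of the object.
    return posixpath.join(attributes["hdfs.path"], attributes["hdfs.objectName"])


# Hypothetical flowfile attributes, as a recursive listing of /tmp might produce.
sample_flowfiles = [
    {"hdfs.path": "/", "hdfs.objectName": "tmp", "hdfs.type": "directory"},
    {"hdfs.path": "/tmp", "hdfs.objectName": "a.csv", "hdfs.type": "file"},
]

paths = [p for p in map(full_hdfs_path, sample_flowfiles) if p is not None]
print(paths)
```

In NiFi itself the same join would presumably go into the HDFS Filename property of FetchHDFS as `${hdfs.path}/${hdfs.objectName}`, with a routing step in between to drop directory entries.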

This processor was added in NiFi 1.7. If you are using an earlier version of NiFi, you need to run a script that lists all the files in the directory, then extract the path from its output and use the extracted attribute in the FetchHDFS processor.
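
A minimal sketch of such a listing script, in Python, assuming the `hdfs` command-line client is on the PATH and that `hdfs dfs -ls -R` prints its usual eight-column layout (permissions, replication, owner, group, size, date, time, path). The function names and the sample listing are hypothetical.

```python
import subprocess


def parse_ls_output(output):
    """Extract file paths from `hdfs dfs -ls` style output, skipping directories."""
    paths = []
    for line in output.splitlines():
        # Split into at most 8 fields so a path containing spaces stays intact.
        fields = line.split(None, 7)
        # Skip "Found N items" headers, blank lines, and directory entries.
        if len(fields) < 8 or fields[0].startswith("d"):
            continue
        paths.append(fields[7])
    return paths


def list_hdfs_files(directory):
    """Recursively list the files under `directory` using the hdfs CLI."""
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", "-R", directory],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ls_output(result.stdout)


# Hypothetical listing, used here so the parser can be tried without a cluster.
sample_listing = (
    "Found 2 items\n"
    "drwxr-xr-x   - hdfs supergroup          0 2018-10-01 12:00 /tmp/sub\n"
    "-rw-r--r--   3 hdfs supergroup       1024 2018-10-01 12:00 /tmp/a.csv\n"
)
print(parse_ls_output(sample_listing))
```

Calling `list_hdfs_files("/tmp")` on a machine with HDFS access returns one path per file. Each path can then be split into a parent directory and a file name for FetchHDFS.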

Re: NiFi - fetchHDFS without ListHDFS

Super Guru

@Pepelu Rico

Please check my `updated answer`. You don't need to run any command in the processor: it is designed so that you only configure the directory, and the processor itself runs all the necessary commands.

Re: NiFi - fetchHDFS without ListHDFS

Super Guru
@Pepelu Rico

Could you check the scheduling of the GetHDFSFileInfo processor? By default this processor is scheduled to run every 0 sec (always running), and I think that is what is causing the 10,000 flowfiles.

The GetHDFSFileInfo processor doesn't store state, so it lists all the files in the directory every time it runs.

Change the Run Schedule to something like 1 hr. The processor will then run once per hour, and each run will produce only as many flowfiles as there are files in the directory.
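
To see how 46 files could turn into roughly 10,000 flowfiles, here is a rough Python simulation. It assumes the outgoing connection uses a back pressure object threshold of 10,000 (NiFi's default, though not confirmed for this flow) and that the processor stops being scheduled once the queue reaches that threshold. The function name is hypothetical.

```python
def flowfiles_until_back_pressure(files_in_directory, threshold=10_000):
    """Simulate a stateless lister that re-emits every file on each run.

    Returns (runs, queued): the number of runs before the queue reaches the
    assumed back pressure threshold, and the queue size at that point.
    """
    if files_in_directory <= 0:
        raise ValueError("files_in_directory must be positive")
    queued = 0
    runs = 0
    # The threshold is checked before each run, so the last run may overshoot it.
    while queued < threshold:
        queued += files_in_directory
        runs += 1
    return runs, queued


runs, queued = flowfiles_until_back_pressure(46)
print(runs, queued)
```

With a Run Schedule of 1 hr, the same lister would add only 46 flowfiles per hour, which downstream processors can normally drain before the next run.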

Re: NiFi - fetchHDFS without ListHDFS

New Contributor

I don't know how to use this command in the GetHDFSFileInfo processor, or how to set up the FetchHDFS processor after it. Sorry.

Re: NiFi - fetchHDFS without ListHDFS

New Contributor

@Shu

The problem now is that when I use this processor, the number of flowfiles increases up to 10,000, but there are only 46 files in the folder. I don't know whether this is a problem or not.

How should I configure the FetchHDFS processor that comes after it?

92880-gethdfsfileinfo.jpg

Thank you