
NiFi 1.4 queue shows millions of files and 0 MB

Hi everyone,

For the last few days I have been facing a problem with a NiFi flow that uses the ListHDFS and FetchHDFS processors.

The queue between them shows more than one million flow files with a total size of 0 MB.

This is very confusing. When I inspect the queue I can list the flow files, and if I click the info button I can check the file size, but the flow files seem to be empty. Back pressure is set to 100K, so I cannot understand how the queue holds this many files.

I tried restarting NiFi and dropping the flow files, but the problem comes back.

Attached is a screenshot of part of the flow. Any ideas would be appreciated.

[Screenshot: 77418-nifi-non-empty-queue.jpg]

Best regards,

Paul

Accepted Solutions

Re: NiFi 1.4 queue shows millions of files and 0 MB

Super Guru

ListHDFS emits empty (0-byte) flow files that carry attributes such as filename and path (see the processor documentation for the full list). In this case FetchHDFS is running much more slowly than ListHDFS (it takes longer to retrieve a file than to list that it exists), which is why the queue backs up. Also, using "Back Pressure Data Size Threshold" as the back pressure trigger won't work here, because the flow files are 0 bytes. Try setting "Back Pressure Object Threshold" (the maximum number of objects) instead.
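To see why the queue reports 0 MB while holding a huge number of flow files, it may help to model a connection's two back pressure checks. This is a simplified Python sketch, not NiFi code: the `FlowFile` and `Connection` classes, their fields, and the `/data/in` path are hypothetical. Each listed flow file carries only attributes, so the summed content size stays at 0 and only the object-count threshold can ever trip.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    # Hypothetical model: attributes are metadata, content_size is the payload in bytes.
    attributes: dict
    content_size: int = 0


@dataclass
class Connection:
    # Hypothetical stand-ins for the connection's back pressure settings.
    object_threshold: int
    size_threshold_bytes: int
    queue: list = field(default_factory=list)

    def queued_count(self) -> int:
        return len(self.queue)

    def queued_bytes(self) -> int:
        # Queued size is the sum of content sizes; attributes are not counted.
        return sum(ff.content_size for ff in self.queue)

    def object_limit_hit(self) -> bool:
        return self.queued_count() >= self.object_threshold

    def size_limit_hit(self) -> bool:
        return self.queued_bytes() >= self.size_threshold_bytes


conn = Connection(object_threshold=100_000, size_threshold_bytes=1024**3)
for i in range(150_000):
    # Like a lister's output: filename/path attributes, no content.
    conn.queue.append(FlowFile({"filename": f"part-{i:06d}", "path": "/data/in"}))

print(conn.queued_count(), conn.queued_bytes())        # 150000 0
print(conn.size_limit_hit(), conn.object_limit_hit())  # False True
```

Whatever size threshold is configured, the model's queued size never moves off zero, which matches the "millions of files and 0 MB" symptom.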


Re: NiFi 1.4 queue shows millions of files and 0 MB

Master Guru

@Paul Hernandez

Just to add to the correct response above:

The back pressure thresholds, for both data size and number of FlowFiles, are soft limits. When a processor is eligible to run, it runs that thread to completion. ListHDFS, for example, will list all files newer than the timestamp recorded in its state from the previous run. Even if "Back Pressure Object Threshold" is set to 10,000, that will not stop ListHDFS from listing 1,000,000 files in a single execution. Once those 1,000,000 FlowFiles are placed on the connection, back pressure starts being applied, and ListHDFS will not be eligible to run again until the queue drops back below the threshold of 10,000.
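The "soft limit" behaviour can be sketched as a threshold that is checked only before a processor is scheduled, never while it runs. The Python model below is hypothetical (`Connection`, `run_lister`, and `run_fetcher` are illustrative names, not NiFi APIs) and ignores scheduling details; it only shows why a queue can end up far above its object threshold.

```python
from collections import deque
from typing import Iterable


class Connection:
    """Hypothetical model of a connection with an object-count threshold."""

    def __init__(self, object_threshold: int):
        self.object_threshold = object_threshold
        self.queue = deque()

    def is_full(self) -> bool:
        # Back pressure is applied once the queue reaches the threshold.
        return len(self.queue) >= self.object_threshold


def run_lister(conn: Connection, new_files: Iterable) -> bool:
    """Run the hypothetical lister once, if it is eligible.

    The threshold is checked only *before* the run; once running,
    the lister enqueues everything it found, however many files that is.
    """
    if conn.is_full():
        return False  # not scheduled: back pressure applied
    conn.queue.extend(new_files)
    return True


def run_fetcher(conn: Connection, batch: int) -> None:
    # Downstream consumer drains up to `batch` flow files per run.
    for _ in range(min(batch, len(conn.queue))):
        conn.queue.popleft()


conn = Connection(object_threshold=10_000)
ran = run_lister(conn, range(1_000_000))
print(ran, len(conn.queue))        # True 1000000
print(run_lister(conn, ["late"]))  # False: queue is above the threshold
run_fetcher(conn, 995_000)         # drain until 5,000 remain
print(run_lister(conn, ["late"]))  # True: below the threshold again
```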

"Back Pressure Data Size Threshold" works in a similar manner. Size in NiFi is always a measure of the content associated with a FlowFile, not of the FlowFile's attributes, so 0-byte FlowFiles add nothing to a connection's queued size.
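The same soft-limit idea applies to the size threshold, with size measured from content only. In this hypothetical Python sketch (`queued_bytes` and `accept_run` are illustrative names, and each queued item is represented just by its content size in bytes), a single producer run can push the queue well past a 1 GiB threshold, while any number of 0-byte items never registers at all.

```python
GIB = 1024 ** 3


def queued_bytes(content_sizes):
    # Queued size is the sum of content sizes; attributes contribute nothing.
    return sum(content_sizes)


def accept_run(queue, threshold_bytes, batch):
    """Hypothetical producer run gated only by a pre-run size check."""
    if queued_bytes(queue) >= threshold_bytes:
        return False
    queue.extend(batch)  # the whole batch is enqueued, even if it overshoots
    return True


queue = []
print(accept_run(queue, 1 * GIB, [2 * GIB, 1 * GIB]))  # True
print(queued_bytes(queue) / GIB)                       # 3.0
print(accept_run(queue, 1 * GIB, [512]))               # False
```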

Thanks,

Matt

Re: NiFi 1.4 queue shows millions of files and 0 MB

Super Collaborator

@Matt Burgess

But ListHDFS keeps state and is only supposed to pull files that have changed, right?
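State-based listing can be illustrated with a deliberately simplified model: remember the newest modification time already emitted and, on each run, emit only files modified after it. This Python sketch is hypothetical (`list_new_files` and the in-memory listing are illustrative) and is not how ListHDFS is implemented internally. It shows that state limits *later* runs, but a first run (or a run after state is cleared) still emits every existing file at once.

```python
from typing import Dict, List, Optional, Tuple

Listing = Dict[str, int]  # hypothetical: path -> modification time (ms)


def list_new_files(listing: Listing, state: Optional[int]) -> Tuple[List[str], Optional[int]]:
    """Return paths modified after `state`, plus the updated state.

    `state` is None on the very first run (or after state is cleared),
    so every existing file is emitted in that single run.
    """
    new = sorted(p for p, mtime in listing.items() if state is None or mtime > state)
    if not new:
        return [], state
    latest = max(listing[p] for p in new)
    return new, latest


hdfs = {f"/data/in/part-{i}": 1_000 + i for i in range(5)}
first, state = list_new_files(hdfs, None)
print(len(first), state)  # 5 1004
hdfs["/data/in/part-new"] = 2_000
second, state = list_new_files(hdfs, state)
print(second, state)      # ['/data/in/part-new'] 2000
```

A strict `>` comparison can miss a file that lands later with the same timestamp as the last one listed, which is one reason a production lister needs more careful bookkeeping than this sketch.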

@Paul Hernandez, what are the properties of your ListHDFS processor?

Re: NiFi 1.4 queue shows millions of files and 0 MB

Hi guys,

Thanks so much for the fast support, and thanks to the Matts team, @Matt Burgess and @Matt Clarke!

I finally understand how the processor works: it emits a flow file with no content, and its attributes carry the file details such as path and filename. FetchHDFS then uses those attributes to fetch the corresponding files.
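The list-then-fetch split described above can be sketched in a few lines: the listed flow file holds only `path` and `filename` attributes, and the fetch step joins them into a full path (mirroring what we understand to be FetchHDFS's default `${path}/${filename}` expression) and attaches the file's bytes as content. The `FlowFile` class, the `fetch` function, and the dictionary standing in for HDFS are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class FlowFile:
    attributes: Dict[str, str]
    content: bytes = b""  # empty when emitted by the lister


def fetch(ff: FlowFile, read_file: Callable[[str], bytes]) -> FlowFile:
    """Hypothetical fetch step: resolve the file from attributes, attach content."""
    full_path = f"{ff.attributes['path']}/{ff.attributes['filename']}"
    return FlowFile(dict(ff.attributes), read_file(full_path))


fake_hdfs = {"/data/in/a.csv": b"id,value\n1,42\n"}  # hypothetical store
listed = FlowFile({"path": "/data/in", "filename": "a.csv"})
fetched = fetch(listed, fake_hdfs.__getitem__)
print(len(listed.content), len(fetched.content))  # 0 14
```

Keeping the listing step cheap and attribute-only is what lets a slow fetch step fall behind, so the queue between the two fills with 0-byte flow files.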

Kind regards,

Paul