NiFi 1.4 queue shows millions of files and 0 MB

Rising Star

Hi everyone,

For the last few days I have been facing a problem with a NiFi flow that uses the ListHDFS and FetchHDFS processors.

The queue between them shows more than one million flow files with a total size of 0 MB.

This is very confusing. When I inspect the queue I can list the flow files, and when I click the info button I can see the file size, but the content appears to be empty. Back pressure is set to 100K, so I cannot understand how the queue holds that many files.

I tried restarting NiFi and dropping the files but the problem returns again.

I have attached a screenshot of part of the flow. Any ideas would be appreciated.

77418-nifi-non-empty-queue.jpg

Best regards,

Paul

1 ACCEPTED SOLUTION

Master Guru

ListHDFS emits empty (0-byte) flow files that carry attributes such as filename and path (see the processor documentation for details). In this case FetchHDFS is running far more slowly than ListHDFS, because retrieving a file takes longer than listing it, which is why the queue backs up. Setting Max Size as the back pressure trigger won't work here either, since the flow files are 0 bytes. Try setting the maximum number of objects for back pressure instead.
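To make the point about the two thresholds concrete, here is a minimal toy model in Python. It is not NiFi code: the `FlowFile` and `Connection` classes, the threshold values, and the example path are all made up for illustration. It only shows why a size-based trigger never fires when every queued flow file has empty content, while a count-based trigger does.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Toy stand-in for a flow file: attributes plus (possibly empty) content."""
    attributes: dict
    content: bytes = b""


@dataclass
class Connection:
    """Toy queue with a count threshold and a size threshold (hypothetical names)."""
    max_objects: int
    max_bytes: int
    queue: list = field(default_factory=list)

    def queued_bytes(self) -> int:
        # Only content counts toward the size; attributes are ignored.
        return sum(len(ff.content) for ff in self.queue)

    def size_threshold_reached(self) -> bool:
        return self.queued_bytes() >= self.max_bytes

    def object_threshold_reached(self) -> bool:
        return len(self.queue) >= self.max_objects


conn = Connection(max_objects=10_000, max_bytes=1024 ** 3)

# A lister emits content-less flow files that only carry path and filename.
for i in range(50_000):
    conn.queue.append(FlowFile({"path": "/data/in", "filename": f"part-{i}.csv"}))

print(len(conn.queue), conn.queued_bytes())   # 50000 0
print(conn.size_threshold_reached())          # False
print(conn.object_threshold_reached())        # True
```

The queue reports tens of thousands of entries and zero bytes, which matches what the question describes at a larger scale.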


4 REPLIES 4

Master Mentor

@Paul Hernandez

Just to add to the above correct response...
The back pressure threshold settings for both size and number of FlowFiles are soft limits. When a processor is eligible to run, it runs that thread to completion. ListHDFS, for example, lists all files newer than the state recorded by its last run. Even if "Back Pressure Object Threshold" is set to 10,000, that will not stop ListHDFS from emitting 1,000,000 FlowFiles in a single execution. Once those 1,000,000 FlowFiles are placed on the connection, back pressure is applied, and ListHDFS will not be eligible to run again until the queue drops back below the threshold of 10,000.
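The soft-limit behaviour can be sketched with a small simulation. This is only an illustration of the scheduling rule described above, not NiFi's implementation; `Connection`, `run_lister_once`, and the file names are hypothetical. The threshold is checked before a run starts, never during it.

```python
class Connection:
    """Toy connection with an object-count back pressure threshold."""

    def __init__(self, object_threshold):
        self.object_threshold = object_threshold
        self.queue = []

    def back_pressure_applied(self):
        return len(self.queue) >= self.object_threshold


def run_lister_once(connection, new_listings):
    """Simulate one scheduled run of a lister.

    Back pressure only decides whether the run may START. Once it has
    started, the run completes and emits every pending listing.
    Returns True if the run happened, False if it was held back.
    """
    if connection.back_pressure_applied():
        return False
    connection.queue.extend(new_listings)
    return True


conn = Connection(object_threshold=10_000)

# First run: the queue is empty, so the lister runs and emits everything.
first = run_lister_once(conn, [f"file-{i}" for i in range(1_000_000)])
print(first, len(conn.queue))    # True 1000000

# Second run: the queue is over the threshold, so the lister is not scheduled.
second = run_lister_once(conn, ["late-file"])
print(second, len(conn.queue))   # False 1000000

# Downstream drains the queue to just below the threshold.
del conn.queue[: len(conn.queue) - 9_999]
third = run_lister_once(conn, ["late-file"])
print(third, len(conn.queue))    # True 10000
```

The first run overshoots the threshold by two orders of magnitude, and the lister stays idle until the downstream processor has worked the queue back under 10,000.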

"Back Pressure Data Size Threshold" works in a similar manner. Size in NiFi is always a measure of the size of the content associated with a FlowFile, not the size of the FlowFile itself, which is why a queue of content-less FlowFiles reports 0 MB.

Thanks,

Matt

Super Collaborator

@Matt Burgess

But ListHDFS keeps state and is only supposed to pick up new or changed files, right?

@Paul Hernandez what are the properties of your ListHDFS processor?

Rising Star

Hi guys,

Thanks so much for the fast support, and thanks to the Matts team, @Matt Burgess and @Matt Clarke.

I finally understand how the processor works. It emits a flow file with no payload, and the file details such as path and filename are carried in the attributes. FetchHDFS then uses those attributes to fetch the corresponding files.
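For anyone who finds this thread later, the list-then-fetch pattern can be sketched as below. This is an assumption-laden toy: it uses a local temporary directory in place of HDFS, plain dictionaries in place of flow files, and the helper names `list_files` and `fetch_file` are invented for the example.

```python
import tempfile
from pathlib import Path


def list_files(directory):
    """Emit one content-less record per file, carrying only attributes."""
    return [
        {"attributes": {"path": str(p.parent), "filename": p.name}, "content": b""}
        for p in sorted(Path(directory).iterdir())
        if p.is_file()
    ]


def fetch_file(flow_file):
    """Use the path and filename attributes to load the actual content."""
    attrs = flow_file["attributes"]
    content = (Path(attrs["path"]) / attrs["filename"]).read_bytes()
    return {"attributes": attrs, "content": content}


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_bytes(b"hello")
    (Path(tmp) / "b.txt").write_bytes(b"world!")

    listed = list_files(tmp)                     # cheap: metadata only
    fetched = [fetch_file(ff) for ff in listed]  # expensive: reads content

print([len(ff["content"]) for ff in listed])    # [0, 0]
print([len(ff["content"]) for ff in fetched])   # [5, 6]
```

Listing only touches metadata, so it finishes much faster than fetching, and everything waiting between the two steps has zero bytes of content.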

Kind regards,

Paul