Support Questions

samrathal · 10-18-2023

I need to retrieve all flowfiles in a specific queue in NiFi, but the API only returns up to 100 results at a time. My queue contains 358 flowfiles, so I need a way to retrieve all of them.

Below is the API call I am using to get the flowfiles, followed by the response it returns.

API: https://myserver.com:myport/nifi-api/flowfile-queues/ba619122-3c65-3279-a2ba-b3ad89f8a429/listing-re...

{
  "listingRequest": {
    "id": "903d485f-d4aa-102a-0000-0112e4a15ea",
    "uri": "https://myserver.com:myport/nifi-api/flowfile-queues/ba619122-3c65-3279-a2ba-b3ad89f8a429/listing-re...",
    "submissionTime": "10/18/2023 17:07:57.632 IST",
    "lastUpdated": "17:07:57 IST",
    "percentCompleted": 1,
    "finished": true,
    "maxResults": 100,
    "state": "Completed successfully",
    "queueSize": {
      "byteCount": 3792,
      "objectCount": 350
    },
    "flowFileSummaries": [
      {
        "HERE IS THE FLOW FILES TILL 100TH POSITIONS": 100
      }
    ]
  }
}
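To see from a response like the one above that the listing was capped, you can compare the number of entries in `flowFileSummaries` with `queueSize.objectCount`. Here is a minimal Python sketch that reads only fields shown in that response; the function name `summarize_listing` is hypothetical. A connection's count can also include FlowFiles already allocated to the downstream component, which a listing never returns, so `truncated` is a hint rather than an exact count of missing entries.

```python
def summarize_listing(response: dict) -> dict:
    """Report how much of the queue a NiFi listing response actually covers.

    Hypothetical helper: it only reads maxResults, queueSize.objectCount and
    flowFileSummaries from the "listingRequest" object.
    """
    listing = response["listingRequest"]
    returned = len(listing.get("flowFileSummaries", []))
    queued = listing["queueSize"]["objectCount"]
    return {
        "returned": returned,
        "queued": queued,
        "max_results": listing["maxResults"],
        # True when the connection reports more FlowFiles than were listed.
        "truncated": queued > returned,
    }
```

With 100 summaries and an `objectCount` of 350, as in the question, this reports `truncated` as true. That matches the cap described in the reply below.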

MattWho · 11-02-2023

@samrathal
Apache NiFi has the listing return size hardcoded to 100:
https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-...

I am sure this was originally done for performance and NiFi JVM heap usage reasons.
The first 100 returned should be the oldest 100 in the queue. Keep in mind that a connection's count includes both the FlowFiles pending processing by the downstream processor and those currently allocated to the downstream component. The listing only returns the pending FlowFiles, not those already owned by the downstream component.

What is the use case for needing to list more? Ideally, the contents of a queue should be changing rapidly, so each listing request is expected to return something different. Listing a queue does not stop NiFi processing, and the intent is not for NiFi to hold FlowFiles in any connection. So using the API to poll a connection for FlowFile listings seems odd to me: what is returned by a listing could be inaccurate milliseconds later.

Also be careful with your API requests. When a listing is performed through the browser, three different requests are made:

1. First, a listing-request is made and replicated to all nodes to get the result sets.
2. The response to the step 1 request gives the ID of the generated listing request, which is held in heap memory. That ID is used to fetch the results for that specific listing.
3. A DELETE request is made to remove the listing with that ID from NiFi.

*** When using the API, if only steps 1 and 2 are executed, the listing request(s) will stay in heap memory.
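The three-step sequence above, including the cleanup, can be sketched in Python as follows. This is a minimal sketch under several assumptions. `send(method, url)` is a hypothetical transport hook that returns parsed JSON. `list_queue`, `make_urllib_send`, `poll_interval`, `max_polls` and the bearer `token` are illustrative names, not NiFi API. Steps 2 and 3 reuse the `uri` field of the `listingRequest`, as seen in the response in the question. TLS setup for a secured NiFi with a self-signed certificate is left out. The DELETE sits in a `finally` block, so the listing is removed from heap memory even if polling fails or times out. This does not get past the 100-result cap.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Hypothetical transport hook: send(method, url) -> parsed JSON body.
Send = Callable[[str, str], dict]


def list_queue(send: Send, listing_url: str,
               poll_interval: float = 0.5, max_polls: int = 20) -> dict:
    """Create a listing request, poll it until finished, then always DELETE it."""
    # Step 1: create the listing request on the queue's listing endpoint.
    request = send("POST", listing_url)["listingRequest"]
    uri = request["uri"]  # URI of this specific listing request
    try:
        # Step 2: fetch the listing by its own URI until NiFi marks it finished.
        for _ in range(max_polls):
            request = send("GET", uri)["listingRequest"]
            if request.get("finished"):
                return request
            time.sleep(poll_interval)
        raise TimeoutError(f"listing request {request.get('id')} did not finish")
    finally:
        # Step 3: remove the listing request so it does not linger in heap memory.
        send("DELETE", uri)


def make_urllib_send(token: Optional[str] = None) -> Send:
    """Build a send() backed by urllib; the bearer token header is an assumption."""
    def send(method: str, url: str) -> dict:
        headers = {"Accept": "application/json"}
        if token:
            headers["Authorization"] = f"Bearer {token}"
        req = urllib.request.Request(url, method=method, headers=headers)
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
        return json.loads(body) if body else {}
    return send
```

Usage would look like `list_queue(make_urllib_send(token), listing_url)`, where `listing_url` is the queue's listing endpoint from the question. Passing `send` in as a parameter keeps the create/fetch/delete logic separate from HTTP and authentication details.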

Thank you,
Matt

Cloudera Community

How to retrieve a complete list of flowfiles in a specific queue in NiFi using the API or UI