Support Questions

jonay__reyes · ‎04-25-2022

Hi. I haven't found any documentation on the internals of the InvokeHTTP processor in NiFi. My requirement is to throttle requests to an endpoint (which we also control) that will accept, let's say, 5 simultaneous connections on one address and port, but I need to wait for each response before letting another flowfile be sent, so:

1. Does this translate to simply using 1 InvokeHTTP processor configured with 5 "Concurrent Tasks" and that's it?
2. Will the processor wait for the remote endpoint's response before sending the next request?
3. How does the "Run Schedule" work together with the previous settings (e.g., if I set it to 1 sec)?
4. It has been proposed that I split the incoming queue and put 5 InvokeHTTP processors in parallel, each one handling 1/5 of the incoming flowfiles (I'd do the pre-partitioning beforehand with some RouteOnAttribute trick), but I think the outcome is exactly the same as in 1. above. Is it?

This all boils down to not knowing exactly how this processor works under the hood. Any insight would be much appreciated, thanks in advance!

steven-matison · ‎04-26-2022

@jonay__reyes I think by default you will see the result you are expecting; however, the expected limit of 5 concurrent connections may be a challenge. Let's address your questions first:

1. Does this translate to simply using 1 InvokeHTTP processor configured with 5 "Concurrent Tasks" and that's it? One processor with 5 concurrent tasks gives you what is in effect 5 instance copies, and they can run more than 5 requests each if there are ample flowfiles queued up. So, NO. For your use case, I would recommend that you set it to 1 and control the number of flowfiles upstream.
2. Will the processor wait for the remote endpoint's response before sending the next one? YES if Concurrent Tasks is set to 1. NO if it is set higher (2+): the tasks will execute in parallel.
3. How does the "Run Schedule" work together with the previous settings (e.g., 1 sec)? Run Schedule sets how long a process will operate before a new instance is necessary. If the request/response times are low, this setting will allow you to push more data through each instance without creating separate processes for each. If the request/response time is high, you can use this to help with long execution. Experiment carefully here.
4. Is splitting the incoming queue across 5 parallel InvokeHTTP processors (pre-partitioned with some RouteOnAttribute trick) exactly the same outcome as 1. above? Correct, there is no reason to do this; avoid duplicating processors.

For concurrent tasks and run schedule adjustments, you should always experiment in small increments, changing one setting at a time, evaluating, and repeating until you find the right balance. I suspect that you will not need 5 long executing request/responses in parallel, and that even with default settings, your queued flowfiles will execute fast enough to appear "simultaneous".
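The YES/NO answer to question 2 can be illustrated with a toy model. The sketch below is plain Python, not NiFi code, and it assumes that each task blocks until its simulated response arrives. The names `peak_in_flight` and `fake_request` and the 0.02 s response time are hypothetical, chosen only for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_in_flight(workers: int, requests: int, response_time_s: float = 0.02) -> int:
    """Run `requests` simulated calls on `workers` threads; return the peak number in flight."""
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def fake_request(_: int) -> None:
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(response_time_s)  # stands in for waiting on the HTTP response
        with lock:
            in_flight -= 1

    # The pool plays the role of "Concurrent Tasks": at most `workers` calls run at once.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_request, range(requests)))
    return peak


if __name__ == "__main__":
    print("1 worker, peak in flight:", peak_in_flight(1, 5))
    print("5 workers, peak in flight:", peak_in_flight(5, 20))
```

With one worker the calls are strictly sequential, so the peak is 1. With five workers the peak can rise above 1 but never exceeds 5 in this model.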

jonay__reyes · ‎04-26-2022

Thanks for your replies @steven-matison!!! Great help indeed.

Anyway, a little more detail on the 3rd point please: "Run Schedule sets how long a process will operate before a new instance is necessary".

In other processors I use this setting to mark the frequency of "hey, start working!" for each processor. In this InvokeHTTP case, with the specific configuration of "5 concurrent tasks" and "run schedule 0.2 sec", my question is: will it wait 0.2s between one request and the next, on each thread?

Since you just explained that >1 task means "don't even wait", I guess that this setting will make the processor send off 5 requests every 0.2s, so 25 requests per second, without caring whether the remote server has replied or not. Is this the case?

Otherwise, if concurrent tasks were set to 1, would the processor wait for a response, THEN wait 0.2s, and THEN send the next one?

My confusion comes from your "how long a process will operate" versus "how long it will wait after processing the previous flowfile".

Thanks again!
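The two readings in this follow-up can be put into numbers. The sketch below is plain arithmetic in Python and makes no claim about which reading NiFi actually implements. The function names and the 0.3 s response time are hypothetical.

```python
def rate_without_waiting(concurrent_tasks: int, run_schedule_s: float) -> float:
    """Reading 1: every task fires once per schedule interval, ignoring responses."""
    return concurrent_tasks / run_schedule_s


def rate_with_waiting(concurrent_tasks: int, run_schedule_s: float,
                      response_time_s: float) -> float:
    """Reading 2: each task sends, waits for the response, then waits one interval."""
    return concurrent_tasks / (response_time_s + run_schedule_s)


if __name__ == "__main__":
    # Requests per second under reading 1 (5 tasks, 0.2 s schedule).
    print(rate_without_waiting(5, 0.2))
    # Requests per second under reading 2, assuming a hypothetical 0.3 s response time.
    print(rate_with_waiting(1, 0.2, 0.3))
```

Under reading 1, 5 tasks at 0.2 s work out to the 25 requests per second mentioned above. Under reading 2 the rate also depends on how long the remote server takes to answer.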

steven-matison · ‎04-28-2022

Do not think of the number of processor instances (concurrency) or the run schedule for that processor as relating to request/response timing. For InvokeHTTP specifically, the request/response time can range from almost instant to however long the other end takes to respond. Concurrency is used to get a higher number of unique instances running against that processor, usually to help drain a huge queue of flowfiles (1000s, 10000s, 1000000s, etc.). Run schedule is how long that one instance stays active (able to process more than 1 flowfile in sequence).

Hope this helps,

Steven

softgeek · ‎01-12-2023

Hi, I have the same problem. I want to process 5 flowfiles at a time and send the next one only once one of the 5 gets a response, using InvokeHTTP. Can someone share a config?

jonay__reyes · ‎01-13-2023

Even though you could send 5 at a time, you cannot wait for any of them (e.g., sequentially) before allowing the next batch to be sent, at least as far as I know, using only this processor. I'd play with the idea of routing every response of this processor (retry, failure, success?) to a RouteOnAttribute processor that evaluates a flag governing the InvokeHTTP, or better yet, use the Wait/Notify processors as explained in https://community.cloudera.com/t5/Support-Questions/Retrieve-Value-of-Signal-Counter-in-Wait-Notify-...
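The gating idea (hold a flowfile until one of the 5 slots frees up) can be sketched outside NiFi with a counting semaphore. This is only a rough analogy to a Wait/Notify arrangement, written in plain Python. It is not a NiFi configuration, and `run_gated`, `send` and the 0.01 s response time are hypothetical.

```python
import threading
import time

MAX_IN_FLIGHT = 5  # the limit from the question; adjust as needed


def run_gated(total: int, response_time_s: float = 0.01) -> int:
    """Send `total` simulated requests through a 5-permit gate; return the peak in flight."""
    gate = threading.BoundedSemaphore(MAX_IN_FLIGHT)
    lock = threading.Lock()
    state = {"in_flight": 0, "peak": 0}

    def send(_id: int) -> None:
        gate.acquire()  # "wait": block until one of the 5 slots is free
        try:
            with lock:
                state["in_flight"] += 1
                state["peak"] = max(state["peak"], state["in_flight"])
            time.sleep(response_time_s)  # placeholder for the request/response round trip
            with lock:
                state["in_flight"] -= 1
        finally:
            gate.release()  # "notify": a response arrived, so the slot is free again

    threads = [threading.Thread(target=send, args=(i,)) for i in range(total)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


if __name__ == "__main__":
    print("peak in flight:", run_gated(20))
```

Unlike sending in fixed batches of 5, this gate lets a new request start as soon as any single response comes back, which is the behavior asked for in the previous post.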
