I'm very new to NiFi. I'm wondering how I can start one process group from another process group automatically. Let's say I have two process groups, PG1 and PG2. Based on the output from PG1, PG2 should start automatically. Is it possible to do that? If so, what is required?
At a high level, anything you can do via the NiFi UI can also be done via REST API calls.
So while no processor exists specifically for triggering another component to enable and start, you could use one or more InvokeHTTP processors to make the REST API calls needed to start other components.
You can find the REST API calls you need with the developer tools available in most browsers: capture the calls as you perform the actions in the NiFi UI. Most developer tools also let you copy a captured request as cURL, which gives you the exact REST API endpoint called, all the headers, and any other input the endpoint expects (often JSON).
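As a rough sketch of what such a call looks like, the Python below builds the request that starts a process group: a PUT to the /nifi-api/flow/process-groups/{id} endpoint with a JSON body setting the state to RUNNING. The base URL, the process group id, and the optional bearer token are hypothetical placeholders; a secured NiFi also needs proper TLS and authentication setup, so confirm the exact request against what your browser's developer tools capture for your version.

```python
import json
import urllib.request


def build_start_pg_request(base_url, pg_id, token=None):
    """Build (but do not send) a request that asks NiFi to start a process group.

    Assumptions: base_url (e.g. a hypothetical "http://localhost:8080") and
    pg_id are placeholders, and token is an optional bearer token for a
    secured instance.
    """
    url = f"{base_url.rstrip('/')}/nifi-api/flow/process-groups/{pg_id}"
    # "RUNNING" starts the components in the group; "STOPPED" stops them.
    body = json.dumps({"id": pg_id, "state": "RUNNING"}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


def start_process_group(base_url, pg_id, token=None, timeout=10):
    """Send the request and return NiFi's JSON response."""
    req = build_start_pg_request(base_url, pg_id, token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Inside a flow, an InvokeHTTP processor would be configured with the same method, URL, and JSON body instead of running a script.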
NiFi dataflows are typically designed to be always on.
You have not shared much about your use case or why you need to start PG2 only when PG1 has a specific outcome. There may be alternative designs that avoid this.
Hope this helps,
@MattWho Let's say PG1 has a GetFile processor and an output port, and PG2 has an input port and a PutFile processor. PG1 is started manually. If files go through GetFile, PG2 should start automatically; if GetFile picks up no files, PG2 should stay stopped. Is this something I can do?
Normally a flow like this would always be running. The processors fed by the input port in PG2 will not do any work until they receive a FlowFile from upstream in PG1. Processors are designed to yield when there is no work to do, to avoid excessive CPU usage.
Do you have concerns with how this works?
NiFi processor components are configured to execute based on a run schedule. There are two schedule driven strategies available (Cron Driven and Timer Driven).
The Cron Driven scheduling strategy uses a user-configured Quartz cron expression to set how often the processor executes. The Timer Driven scheduling strategy (the most commonly used) uses a user-configured run schedule (the default run schedule is 0 secs, which means run as often as the system will allow).
When a processor executes based on the configured scheduling strategy, it will do one of two things:
1. If the processor has one or more inbound connections, it checks whether any of them have queued FlowFiles. If none do, the processor yields. The yield keeps processors with a run schedule of 0 secs from constantly requesting CPU threads just to check empty inbound queues. Regardless of the run schedule, a yielded processor will not execute again until the yield has expired, which reduces the CPU usage of that processor.
2. Some processors have no inbound connections. These processors will not yield; they continuously execute on the configured run schedule. You would not have any such processors in your PG2, since its processors have upstream connections to components in PG1. So for "source" type processors like ListSFTP, ListFile, GenerateFlowFile, or any other processor that does not support an inbound/upstream connection, if the feed of data is not continuous, it is best to use the Cron Driven scheduling strategy or to set a Timer Driven run schedule other than the default 0 secs to reduce CPU usage.
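To make the effect of yielding concrete, here is a toy Python model of a Timer Driven processor with one inbound connection. It is not NiFi's actual scheduler: the function name is hypothetical, "as often as the system will allow" is approximated by a fixed 10 ms tick (min_interval_ms), the yield length is an assumed parameter, and the processor is assumed to handle one FlowFile per execution. It simply counts how many times the processor wakes up to find an empty queue.

```python
from collections import deque


def simulate(window_ms, run_schedule_ms, yield_ms, arrivals_ms, min_interval_ms=10):
    """Toy model (assumption, not NiFi's scheduler) of yielding on empty queues.

    Returns (empty_checks, processed): how many executions found the inbound
    queue empty, and how many FlowFiles were processed (one per execution).
    """
    # A run schedule of 0 means "as often as possible"; approximate that
    # with a small fixed tick so the simulation terminates.
    interval = max(run_schedule_ms, min_interval_ms)
    pending = deque(sorted(arrivals_ms))  # FlowFiles that have not arrived yet
    queued = 0
    next_allowed = 0  # a yielded processor may not run before this time
    empty_checks = processed = 0
    t = 0
    while t < window_ms:
        # Move FlowFiles that have arrived by time t into the inbound queue.
        while pending and pending[0] <= t:
            pending.popleft()
            queued += 1
        if t >= next_allowed:
            if queued:
                queued -= 1
                processed += 1
            else:
                # Empty queue: count the wasted check and yield.
                empty_checks += 1
                next_allowed = t + yield_ms
        t += interval
    return empty_checks, processed


print(simulate(10_000, 0, 1_000, []))  # (10, 0): one empty check per yield period
print(simulate(10_000, 0, 0, []))      # (1000, 0): no yield, a check every tick
```

In this model, a 0 secs run schedule with an assumed 1 second yield checks an idle queue 10 times in 10 seconds instead of 1000 times, which is the CPU saving described above.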
On the face of every processor is a Tasks/Time stat. It tells you how many tasks (threads) completed in the past 5 minutes and how much cumulative CPU time all of those completed tasks used. This lets you see the impact a given processor is having on your CPU.
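As a worked example of reading that stat, the Python below turns a Tasks/Time value into an average time per task and a rough share of one CPU core over the 5-minute window. It assumes the stat is displayed as "count / HH:MM:SS.mmm"; the function names and the sample value are illustrative only.

```python
from __future__ import annotations

from datetime import timedelta

WINDOW = timedelta(minutes=5)  # the Tasks/Time stat covers the past 5 minutes


def parse_tasks_time(stat: str) -> tuple[int, timedelta]:
    """Parse an assumed "count / HH:MM:SS.mmm" string into (tasks, total time)."""
    count_text, time_text = (part.strip() for part in stat.split("/"))
    hours, minutes, seconds = time_text.split(":")
    total = timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))
    return int(count_text.replace(",", "")), total


def summarize(stat: str) -> dict:
    """Average milliseconds per task and share of a single core over the window."""
    tasks, total = parse_tasks_time(stat)
    avg_ms = total.total_seconds() * 1000 / tasks if tasks else 0.0
    # Can exceed 1.0 if several concurrent tasks ran at the same time.
    core_share = total / WINDOW
    return {"tasks": tasks, "avg_ms_per_task": avg_ms, "core_share": core_share}


print(summarize("1,200 / 00:00:03.600"))
```

For the sample value, 1,200 tasks using 3.6 seconds in total works out to about 3 ms per task and roughly 1.2% of one core over the 5 minutes.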
Hope this helps explain CPU usage for you,