Created on 01-10-2017 07:39 PM - edited 08-19-2019 03:19 AM
Greetings,
I am not sure I understand why you would create multiple input and output ports for a process group (PG). What purpose would the additional ports serve?
My thinking is that if you want to "call" (or send/receive data to/from) the same PG from many different processors in NiFi, you would use a different port for each processor, to avoid mixing the data the PG receives from those processors. As an example (please see the image below), if the PG has a MergeContent processor, I would not want to merge flow files that come into the PG from different processors. Is that one (if any) of the reasons for having multiple ports in a PG?
Created 01-10-2017 08:00 PM
Process groups can be nested inside process groups, and with the granular access controls NiFi provides, it may not be desirable for every user who has access to the NiFi UI to be able to access all processors or the specific data those processors are using.
So in addition to your valid example above, you may want to create stovepipe dataflows based on different input ports, where only specific users are allowed to view and modify the stovepipe dataflow they are responsible for.
While you can of course have flowfiles from multiple upstream sources feed into a single input port and then use a routing-type processor to split them back out to different dataflows, it can be easier to just have multiple input ports and achieve the same effect with less configuration.
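To make the trade-off concrete, here is a toy model in Python. It is not NiFi code: a "flowfile" is just a dict, `source` is a hypothetical attribute name, and `route_on_attribute` and `merge_content` are illustrative stand-ins for a routing-type processor and a merge step. It only shows that both layouts can end up with the same per-source merge, with the single-port layout needing the extra routing step.

```python
from collections import defaultdict

# Toy model only. Nothing here is a NiFi API; all names are hypothetical.

def make_flowfile(source, content):
    """Build a fake flowfile tagged with the (assumed) 'source' attribute."""
    return {"attributes": {"source": source}, "content": content}

def route_on_attribute(flowfiles, attribute):
    """Split a mixed stream into one list per attribute value."""
    routes = defaultdict(list)
    for ff in flowfiles:
        routes[ff["attributes"].get(attribute, "unmatched")].append(ff)
    return dict(routes)

def merge_content(flowfiles):
    """Stand-in for a merge step: concatenate content in arrival order."""
    return "".join(ff["content"] for ff in flowfiles)

# Option A: a single input port receives everything, so the group must
# route the mixed stream back apart before merging.
single_port = [
    make_flowfile("credit", "c1;"),
    make_flowfile("debit", "d1;"),
    make_flowfile("credit", "c2;"),
]
routed = route_on_attribute(single_port, "source")
merged_a = {name: merge_content(ffs) for name, ffs in routed.items()}

# Option B: one input port per source, so the streams never mix and
# no routing step is needed inside the group.
ports = {
    "credit": [make_flowfile("credit", "c1;"), make_flowfile("credit", "c2;")],
    "debit": [make_flowfile("debit", "d1;")],
}
merged_b = {name: merge_content(ffs) for name, ffs in ports.items()}

print(merged_a == merged_b)
```

In Option A the routing only works if every flowfile already carries something to route on, which is the extra configuration mentioned above. Option B gets the separation from the wiring itself.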
Matt
Created 01-10-2017 07:56 PM
I pondered this when I started using NiFi and realized that it is a good feature; in fact, it is almost necessary to support a real-life data flow scenario.
Let us take a simple scenario. Let's say my group does risk evaluation in a bank and provides services to different LOBs within the bank (consider only the credit and debit transaction groups for this discussion). Let's assume that the format of the transactions received from these two groups is exactly the same. However, the way the data is received is different: while the Credit group places the data on a Windows share, the Debit group requires the data to be read from their server via FTP. Now the data ingestion process of the risk group, built on NiFi, will look something like this -
Now as you can see you would need to be able to support multiple input ports and output ports to support this flow.
Why can't we just place the entire flow? Technically you can, but won't that be messy and hard to manage, drastically reduce reusability, and make the overall flow less flexible and scalable?
Hope this helps!
Created 01-12-2017 05:18 PM
@Matt, a couple of follow-up questions on process groups with multiple input ports:
1) Within the process group, how do you distinguish between flowfiles that are coming from the various input ports? 2) In the data provenance screen, is there a way to tell which flowfiles came from which input port?