
What's the purpose of multiple input and output ports in a NiFi process group

Expert Contributor

Greetings,

I am not sure I understand why you would create multiple input and output ports for a process group (PG). What purpose would the additional ports serve?

I am thinking that if you want to "call" (or send/receive data to/from) the same PG from many different processors in NiFi, then you would use a different port for each processor, to avoid mixing the data the PG receives from the different processors. As an example (please see the image below), if the PG has a MergeContent processor, then I would not want to merge flow files that are coming into the PG from different processors. Is that one of the reasons for having multiple ports in a PG?

[Image: using-pg-multiple-times-in-a-flow.png]
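(I gather that even with a single input port, MergeContent could keep the streams apart if the flowfiles carried a distinguishing attribute. A rough sketch, assuming a hypothetical source.system attribute set upstream:

  MergeContent, relevant properties:
    Merge Strategy             : Bin-Packing Algorithm
    Correlation Attribute Name : source.system    (only flowfiles with the same value are binned together)

Still, that relies on an attribute convention rather than the ports themselves.)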


3 REPLIES


@Raj B

I pondered over this when I started using NiFi and realized that it is a good feature; in fact, it is almost necessary for supporting real-life data flow scenarios.

Let us take a simple scenario. Say my group does risk evaluation in a bank and provides services to different lines of business (LOBs) within the bank; consider only the credit and debit transaction groups for this discussion. Assume that the format of the transactions received from these two groups is exactly the same, but the way the data is received differs: the Credit group places its data on a Windows share, while the Debit group requires the data to be read from their server via FTP. The data ingestion process of the risk group, built on NiFi, will look something like this:

  1. A process group to read the data from the shared drive
  2. A process group to read the data via FTP
  3. Another process group to take the input from the two process groups above and apply further processing (split records into individual transactions; ensure all mandatory data elements are present and, if not, route to error), then feed two destinations:
    1. A process group that places the data on Kafka for the Storm topology to pick up and apply the model that evaluates risk
    2. Another process group that stores the data in HDFS for archival, to support audit and compliance requirements

As you can see, the third process group needs multiple input ports (one per ingestion group) and multiple output ports (one per destination) to support this flow.

Why can't we just build the entire flow in one place? Technically you can, but it would be messy and hard to manage, would drastically reduce reusability, and would make the overall flow less flexible and scalable.
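One common way to wire this up (a sketch rather than anything from the post above; the source.system attribute name and its values are assumptions) is to tag each feed before it leaves its ingestion process group, so the shared processing group can always tell the two streams apart:

  UpdateAttribute in the shared-drive PG, just before its output port:
    source.system : credit    (dynamic property; name = attribute to set, value = its value)

  UpdateAttribute in the FTP PG, just before its output port:
    source.system : debit

The processing group then exposes one input port per feed and one output port per destination, and the attribute travels with every flowfile, so it remains usable for routing and is visible in provenance.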

Hope this helps!

Super Mentor

@Raj B

Process groups can be nested inside process groups, and with the granular access controls NiFi provides, it may not be desirable for every user who has access to the NiFi UI to be able to access all processors or the specific data those processors are handling.

So in addition to your valid example above, you may want to create stovepipe dataflows based on different input ports, where only specific users are granted view and modify access to the stovepipe dataflow they are responsible for.

While you can of course have flowfiles from multiple upstream sources feed into a single input port and then use a routing-type processor to split them back out to different dataflows, it can be easier just to have multiple input ports to achieve the same effect with less configuration.
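A minimal sketch of that routing alternative, assuming a hypothetical source.system attribute has been set on the flowfiles before they reach the single input port:

  RouteOnAttribute, just inside the process group:
    Routing Strategy : Route to Property name
    credit           : ${source.system:equals('credit')}
    debit            : ${source.system:equals('debit')}

Each dynamic property becomes its own relationship, so the "credit" and "debit" connections can feed their separate sub-flows; multiple input ports give you the same separation without the extra processor or the attribute convention.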

Matt

Expert Contributor

@Matt, a couple of follow-up questions on process groups with multiple input ports:

1) Within the process group, how do you distinguish between flowfiles coming from the various input ports?
2) In the data provenance screen, is there a way to tell which flowfiles came from which input ports?