Support Questions

0tto · ‎04-23-2025

Hello everyone,

I'm seeking technical advice about designing data flows in Apache NiFi. I extract data from my source (Postgres), and my question is whether I should use separate process groups for Extract, Transform, and Load, or a single process group that covers the end-to-end ETL for one table.

MattWho · ‎04-24-2025

@0tto

Using child process groups is a matter of personal preference.
Child process groups allow you to create a more manageable NiFi canvas by placing unique dataflows in different process groups. When it comes to one continuous dataflow, you may choose to put portions of it into child process groups.

For example, you might do this if portions of the dataflow are reusable. You can right-click on a process group and download a flow definition, or you can choose to version control a process group to NiFi-Registry. These become snippets of your overall end-to-end dataflow. So let's say your "transform" sub-dataflow is reusable with just a few modifications: others could easily reuse it by importing it from NiFi-Registry or deploying a shared flow definition.
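The same flow definition download can also be scripted. The following is only a minimal sketch, assuming the NiFi REST API exposes `/nifi-api/process-groups/{id}/download` on your version and that a bearer token is accepted for authentication. The host name, process group id, and helper names here are hypothetical, so check them against the REST API documentation for your NiFi release.

```python
import json
import urllib.request
from typing import Optional


def flow_definition_url(base_url: str, process_group_id: str) -> str:
    """Build the URL for downloading a process group's flow definition.

    Assumption: the endpoint path is /nifi-api/process-groups/{id}/download.
    """
    return f"{base_url.rstrip('/')}/nifi-api/process-groups/{process_group_id}/download"


def download_flow_definition(
    base_url: str, process_group_id: str, token: Optional[str] = None
) -> dict:
    """Fetch the flow definition JSON for one process group.

    The bearer token is optional; secured NiFi instances will need some
    form of authentication, which may differ from what is assumed here.
    """
    request = urllib.request.Request(flow_definition_url(base_url, process_group_id))
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call might look like `download_flow_definition("https://nifi.example.com:8443", "<process-group-id>", token)`, with the result saved to a file that others can import onto their own canvas.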

Typically, users of NiFi will create a process group per unique end-to-end dataflow, or will create a unique process group per team to separate dataflows and control access per process group so team 1 can't mess with team 2's process group.

Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt

Shrink · ‎04-24-2025

@hott
In my case, having different process groups helps me set different triggers and manage queues more efficiently (by storing flow file temporally to save RAM or backpressure). This is required in my case to complete one set of processes fully before the next set of processes starts.

MattWho · ‎04-25-2025

@Shrink What do you mean by "by storing flow file temporally to save RAM or backpressure"?

FlowFiles held in NiFi connections will consume NiFi heap memory (unless a queue has grown very large, resulting in some of those queued FlowFiles being swapped to disk). But this behavior is no different whether you use process groups or not.

Process groups do allow you to configure:

"Process Group FlowFile Concurrency"
"Process Group Outbound Policy"

I assume you are using the above to control the FlowFiles going in and out of your process groups as those FlowFiles move from one process group to the next? Using these allows you to ensure processing in one PG completes before the outbound FlowFiles are released to the next downstream process group. This also allows you to leave all your processors in a running state for a more efficient, performant dataflow.
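If you set these two options through the REST API rather than the UI, the update body could be built roughly like this. This is a sketch under assumptions: the field names `flowfileConcurrency` and `flowfileOutboundPolicy` and the enum values listed are my understanding of the process group entity and should be verified against the REST API documentation for your NiFi version. The helper name `build_pg_update` is hypothetical.

```python
from typing import Any, Dict

# Assumed enum values for the two process group settings; verify for your version.
FLOWFILE_CONCURRENCY = {"UNBOUNDED", "SINGLE_FLOWFILE_PER_NODE", "SINGLE_BATCH_PER_NODE"}
OUTBOUND_POLICY = {"STREAM_WHEN_AVAILABLE", "BATCH_OUTPUT"}


def build_pg_update(
    process_group_id: str,
    revision_version: int,
    concurrency: str = "SINGLE_BATCH_PER_NODE",
    outbound_policy: str = "BATCH_OUTPUT",
) -> Dict[str, Any]:
    """Build a request body for updating a process group's concurrency settings.

    The defaults gate the group so one batch is processed at a time and output
    is held until the whole batch has finished.
    """
    if concurrency not in FLOWFILE_CONCURRENCY:
        raise ValueError(f"unknown FlowFile concurrency: {concurrency}")
    if outbound_policy not in OUTBOUND_POLICY:
        raise ValueError(f"unknown outbound policy: {outbound_policy}")
    return {
        # NiFi uses the revision for optimistic locking; pass the current version.
        "revision": {"version": revision_version},
        "component": {
            "id": process_group_id,
            "flowfileConcurrency": concurrency,
            "flowfileOutboundPolicy": outbound_policy,
        },
    }
```

The resulting dictionary would be sent as JSON in a PUT to the process group's endpoint. With batch concurrency plus batch output on each of the Extract, Transform, and Load groups, each stage finishes its batch before the next stage receives anything.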

Thank you,
Matt

0tto · ‎04-28-2025

Thank you for this insightful and practical advice.

Cloudera Community

Best Practice to Design Data Flows