Created 03-09-2024 08:11 PM
Hello,
My team is about to deploy a clustered scaling NiFi, along with NiFi Registry and Zookeeper. We are deploying on Openshift Kubernetes.
We are determining which directories to mount our persistent storage to on both NiFi and the Registry.
1. Since we are using Registry, do we need to back anything up on the NiFi pod?
2. What are all of the locations where we would want to mount persistent storage for both NiFi and Registry?
3. What are all of the locations we would want to establish a backup procedure for on NiFi and Registry if we are using local filesystem persistence?
4. Do we still need to back up any locations if we use the Git and S3 persistence providers?
5. What does the restoration process look like for both:
A. local persistence providers only
B. Git + S3 persistence
6. Do we need to worry about Zookeeper at all for backups?
Thanks ahead of time!
Created 03-11-2024 05:10 AM
@TreantProtector Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our NiFi experts @mburgess @MattWho who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres,
Created on 03-11-2024 07:41 AM - edited 03-11-2024 07:42 AM
@TreantProtector
There is a lot being asked in this one post.
1. NiFi-Registry stores version-controlled NiFi process groups. A user must manually start version control on a process group and manually commit each new version to NiFi-Registry. It does not store the flow.xml.gz or flow.json.gz files, which contain all the flow information NiFi loads on startup, so it is not a substitute for protecting those files on NiFi. All nodes in a NiFi cluster use the same flow.xml.gz/flow.json.gz, so it is not necessary to preserve the files from every node for recovery.
2a (NiFi)
2b. NiFi-Registry
3. Covered above. Refer to your nifi.properties file for your configured local storage paths.
4. Yes, covered above.
5a. Not sure I follow the question. On restoration, NiFi or NiFi-Registry will read from the configured persistence provider (whether local, Git, or S3). Preserving the NiFi and NiFi-Registry conf directory configuration files would make restoration easier. While the NiFi content_repository(s) and flowfile_repository are tightly coupled to one another on the same node and tie back to the flow.xml.gz/flow.json.gz content (the same on all nodes), which node they get restored to does not matter, because specific node information is not present in any of them.
NOTE: Each content_repository is directly tied to its content repository property name in the nifi.properties file:
nifi.content.repository.directory.default=/dir1/node1
nifi.content.repository.directory.repo2=/dir2/node1
Upon restoration, the content_repository contents persisted for /dir1/node1 must still be configured under "default" and not under a different property name. This is because the FlowFile metadata in the corresponding flowfile_repository does not contain directory details. It simply says you can find the content for FlowFile xyz in nifi.content.repository.directory.default at sub-directory (num), content claim, byte offset, and number of bytes. So if you put the dir2 contents in the default content_repository, NiFi will not be able to find your content.
6. Zookeeper stores the cluster state used by a good number of NiFi processors. Refer to the individual processor documentation: every processor's documentation has a "State management" section that tells you whether the component stores state and whether that state is local or cluster. State is stored per component. Cluster state stored in Zookeeper is not node-specific, since every node's instance of a component that uses cluster state shares the same state information. Failing to protect against loss of state typically leads to data duplication, but it all depends on how a given processor uses that state.
Example: ListSFTP 1.25.0.
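The NOTE under 5a about content repository property names can be sketched as a small restore check. This is only an illustration, not a NiFi tool: the helper names `content_repos` and `restore_plan` and the `/restore/...` paths are made up for the example. It reads the original and the restored nifi.properties text and pairs each backed-up directory with the directory now configured under the same property name, failing if a name from the original is no longer configured.

```python
PREFIX = "nifi.content.repository.directory."


def content_repos(properties_text):
    """Map each content repository name (e.g. 'default') to its directory."""
    repos = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith(PREFIX):
            repos[key[len(PREFIX):]] = value.strip()
    return repos


def restore_plan(original_text, restored_text):
    """Pair each backed-up directory with the target configured under the same name."""
    original = content_repos(original_text)
    restored = content_repos(restored_text)
    missing = sorted(set(original) - set(restored))
    if missing:
        raise ValueError("no target configured for repositories: " + ", ".join(missing))
    return {name: (original[name], restored[name]) for name in original}


# Property lines from the NOTE above.
original = """
nifi.content.repository.directory.default=/dir1/node1
nifi.content.repository.directory.repo2=/dir2/node1
"""
# Hypothetical paths on the fresh pod (assumption, not from the thread).
restored = """
nifi.content.repository.directory.default=/restore/content_a
nifi.content.repository.directory.repo2=/restore/content_b
"""

plan = restore_plan(original, restored)
for name, (src, dst) in sorted(plan.items()):
    print(f"{name}: copy backup of {src} into {dst}")
```

The directory paths are free to change between the old and new pods; what the plan keeps fixed is the property name each backup is restored under.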
If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.
Thank you,
Matt
Created 03-11-2024 07:34 PM
Thank you so much @MattWho for your detailed response. If we are mostly concerned with backing up and restoring the process groups/Registry data, what is the bare minimum we would need to back up on the NiFi (not Registry) pod to restore operations with fresh containers?
I think you mentioned we would definitely want to back up flow.json.gz for this scenario, but I wanted to make sure.
Created 03-28-2024 07:43 AM
@TreantProtector
Everything the user adds to the canvas, including controller services and reporting tasks, is auto-saved in the flow.json.gz. Each time a change is made, the current flow.json.gz is archived and a new flow.json.gz is generated. The flow.json.gz contains all components (processors, connections, controller services, reporting tasks, funnels, process groups, ports, parameters, etc.) and their configurations. Any configuration property that is "sensitive" (such as a password) is encrypted in the flow.json.gz file. So in order to load that flow.json.gz in another NiFi, you would need to know the nifi.sensitive.props.algorithm and nifi.sensitive.props.key used by the NiFi it came from.
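As a rough sketch of a minimal backup along those lines, the snippet below copies flow.json.gz together with nifi.properties into a timestamped folder and reports whether the two sensitive-props settings are present. The function names, the backup layout, and every path passed in are assumptions for illustration. Since nifi.properties holds nifi.sensitive.props.key, the backup itself should be treated as a secret.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path

SENSITIVE_PROPS = ("nifi.sensitive.props.key", "nifi.sensitive.props.algorithm")


def read_properties(text):
    """Parse key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def backup_flow(flow_path, properties_path, backup_root):
    """Copy the flow and nifi.properties; return (backup dir, unset sensitive props)."""
    props = read_properties(Path(properties_path).read_text(encoding="utf-8"))
    unset = [name for name in SENSITIVE_PROPS if not props.get(name)]
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = Path(backup_root) / stamp
    target.mkdir(parents=True, exist_ok=False)
    shutil.copy2(flow_path, target / Path(flow_path).name)
    shutil.copy2(properties_path, target / Path(properties_path).name)
    return target, unset
```

A call might look like `backup_flow("/opt/nifi/conf/flow.json.gz", "/opt/nifi/conf/nifi.properties", "/backups/nifi")`, where all three paths are hypothetical. A non-empty second return value means the backup may not carry what is needed to decrypt the sensitive values.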
Encrypted Passwords in Flows
If you don't have that info, the flow.json.gz can still be loaded on another NiFi after manually editing the file to remove all the "enc{...}" values. Once the flow.json.gz loads, an authorized user would need to re-enter the passwords via the NiFi UI in every component that needs them.
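The manual edit described above could be scripted roughly as follows. This is a sketch under assumptions, not an official procedure: the function names are made up, the regular expression assumes each encrypted value appears as a single `enc{...}` token with no `}` inside it, and removing the token leaves an empty string in its place. Work on a copy and keep the original file.

```python
import gzip
import re

# Assumed shape of an encrypted value: enc{...} with no closing brace inside.
ENC_PATTERN = re.compile(r"enc\{[^}]*\}")


def strip_encrypted_values(flow_text):
    """Return (text with every enc{...} token removed, number of tokens removed)."""
    return ENC_PATTERN.subn("", flow_text)


def strip_flow_file(src_path, dst_path):
    """Read a gzipped flow, remove enc{...} tokens, write a new gzipped copy."""
    with gzip.open(src_path, "rt", encoding="utf-8") as src:
        cleaned, count = strip_encrypted_values(src.read())
    with gzip.open(dst_path, "wt", encoding="utf-8") as dst:
        dst.write(cleaned)
    return count
```

The returned count gives a quick idea of how many sensitive properties will need to be re-entered in the UI after the flow loads.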
Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.
Thank you,
Matt
Created 03-19-2024 11:13 AM
@TreantProtector Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
Regards,
Diana Torres,