01-12-2023
01:30 PM
@SachinMehndirat

There is NO replication of data from the four NiFi repositories across the nodes in a NiFi cluster. Each NiFi node in the cluster is only aware of, and only executes against, the FlowFiles on that specific node. As such, NiFi nodes cannot share a common set of repositories. Each node must have its own repositories, and it is important to protect those repositories from data loss (flowfile_repository and content_repository being the most important).

- flowfile_repository - contains metadata/attributes about FlowFiles actively processing through your NiFi dataflow(s). This includes metadata on the location of the content of queued FlowFiles.
- content_repository - contains content claims, each of which can hold the content for one to many FlowFiles that are actively being processed or temporarily archived post termination at the end of the dataflow(s).
- provenance_repository - contains historical lineage information about FlowFiles currently or previously processed through your NiFi dataflows.
- database_repository - contains flow configuration history, which is a record of changes made via the NiFi UI (adding, modifying, deleting, stopping, starting, etc.). It also contains info about users currently authenticated in to the NiFi node.

Processors that record cluster-wide state use ZooKeeper to store and retrieve the state needed by all nodes. Processors that use local state write that state to NiFi's locally configured state directory. So in addition to protecting the repositories mentioned above from data loss, you'll also want to make sure the local state directory (unique to each node in the NiFi cluster) is protected. The embedded documentation in NiFi for each component has a "State management:" section that tells you whether that component uses local and/or cluster state.
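If it helps, here is a minimal sketch (not an official NiFi tool) of how you could list the directories to protect on each node by reading that node's nifi.properties. The property keys are the standard ones from a default nifi.properties; the sample values and the helper names `parse_properties` and `repository_dirs` are only illustrative. Note that the local state directory itself is set in the file that `nifi.state.management.configuration.file` points to (state-management.xml), so this sketch only reports the path to that file.

```python
# Standard nifi.properties keys for the four repositories. Content and
# provenance repositories may define several directories, each with its
# own suffix ("default" is the out-of-the-box one), so match on prefix.
REPO_KEY_PREFIXES = (
    "nifi.flowfile.repository.directory",
    "nifi.content.repository.directory.",
    "nifi.provenance.repository.directory.",
    "nifi.database.directory",
)

# Points at state-management.xml, where the local state directory is set.
STATE_CONFIG_KEY = "nifi.state.management.configuration.file"


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def repository_dirs(props):
    """Return only the repository directory settings that have a value."""
    return {
        key: value
        for key, value in props.items()
        if key.startswith(REPO_KEY_PREFIXES) and value
    }


# Illustrative excerpt only; read the real file on each node instead,
# e.g. open("/path/to/nifi/conf/nifi.properties").read()
SAMPLE = """
nifi.flowfile.repository.directory=./flowfile_repository
nifi.content.repository.directory.default=./content_repository
nifi.content.repository.archive.enabled=true
nifi.provenance.repository.directory.default=./provenance_repository
nifi.database.directory=./database_repository
nifi.state.management.configuration.file=./conf/state-management.xml
"""

if __name__ == "__main__":
    props = parse_properties(SAMPLE)
    for key, path in repository_dirs(props).items():
        print(f"{key} -> {path}")
    print(f"state config -> {props.get(STATE_CONFIG_KEY)}")
```

You would run something like this on every node, since each node has its own nifi.properties and its own set of directories.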
You may find some of the info in the following articles useful:

https://community.cloudera.com/t5/Community-Articles/HDF-CFM-NIFI-Best-practices-for-setting-up-a-high/ta-p/244999
https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Archiving-works/ta-p/249418
https://blogs.apache.org/nifi/entry/load-balancing-across-the-cluster

If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click Accept as Solution below each response that helped.

Thank you,
Matt