NiFi repositories directory configuration

Explorer

It is recommended to configure the NiFi repository (content, flowfile, provenance) directories to point to mounted folders in a Linux environment. Suppose I have a NiFi cluster with multiple nodes: should all nodes point to the same mount, or should each node point to a different mount? Take, for example, a 3-node cluster with either a shared mount for all nodes or a separate mount per node.

shared mount

/data/nifi/node1

/data/nifi/node2

/data/nifi/node3

separate mounts

/data1/nifi

/data2/nifi

/data3/nifi

Which is the better way?

Thanks,

Mark

1 ACCEPTED SOLUTION

Master Mentor

@Mark Lin

Separate NiFi instances (even those that are part of the same cluster) CANNOT share repositories. Each NiFi instance must have its own unique set of repositories, since each instance works on its own unique set of FlowFiles.
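
As a rough illustration of that rule, here is a minimal Python sketch that flags any repository directory claimed by more than one node. It assumes all paths live on storage that every node can see (for example a shared mount). The property keys are the standard nifi.properties names; the node names, the paths, and the helper `find_shared_repositories` are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_shared_repositories(node_props):
    """Return {path: [nodes]} for every repository path used by more than one node."""
    users = defaultdict(set)
    for node, props in node_props.items():
        for path in props.values():
            # Normalise trailing slashes so "/a/b" and "/a/b/" compare equal.
            users[str(PurePosixPath(path))].add(node)
    return {path: sorted(nodes) for path, nodes in users.items() if len(nodes) > 1}


# Hypothetical layout: node2 was pointed at node1's content repository by mistake.
cluster = {
    "node1": {
        "nifi.flowfile.repository.directory": "/data/nifi/node1/flowfile_repository",
        "nifi.content.repository.directory.default": "/data/nifi/node1/content_repository",
    },
    "node2": {
        "nifi.flowfile.repository.directory": "/data/nifi/node2/flowfile_repository",
        "nifi.content.repository.directory.default": "/data/nifi/node1/content_repository/",
    },
}
conflicts = find_shared_repositories(cluster)
```

An empty result means no two nodes point at the same directory. The check compares path strings only, so it cannot detect two different paths that resolve to the same location through symlinks or bind mounts.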

Pointing NiFi repositories to mounted folders is an option, but local disks will perform better. For a high-performance system, separate RAID disks (RAID 1 for data integrity) for each of the Content, FlowFile, and Provenance repos are recommended.
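
To make the "one disk per repository" idea concrete, the sketch below renders the three repository entries for nifi.properties. The property keys are the standard ones; the `/disk1`, `/disk2`, `/disk3` mount points and the helper name are assumptions chosen only for illustration.

```python
def render_repository_properties(flowfile_dir, content_dir, provenance_dir):
    """Return the three repository lines for nifi.properties as one string."""
    lines = [
        f"nifi.flowfile.repository.directory={flowfile_dir}",
        f"nifi.content.repository.directory.default={content_dir}",
        f"nifi.provenance.repository.directory.default={provenance_dir}",
    ]
    return "\n".join(lines)


# Hypothetical mount points, one per physical disk (or RAID 1 pair).
snippet = render_repository_properties(
    "/disk1/flowfile_repository",
    "/disk2/content_repository",
    "/disk3/provenance_repository",
)
print(snippet)
```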

Using mounted folders will affect performance, but offers an easier method of recovery if a node is lost forever. The repositories are not tied in any way to a specific NiFi instance/host. You can stand up a new instance of NiFi and, as long as you provide it with the FlowFile repo, Content repo, and cluster flow.xml.gz, it will be able to start up and continue processing from the same point where the old dead node left off.
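
Before starting a replacement node, it can help to confirm that those three pieces are actually in place. The following is only a sketch of such a pre-flight check: the helper `missing_recovery_items` is hypothetical, and treating an empty repository directory as "missing" is my own assumption, not something NiFi enforces.

```python
from pathlib import Path


def missing_recovery_items(flowfile_repo, content_repo, flow_file):
    """Return labels of the recovery inputs that are absent or empty."""
    missing = []
    repos = (
        ("FlowFile repository", flowfile_repo),
        ("Content repository", content_repo),
    )
    for label, repo in repos:
        repo = Path(repo)
        # Assumption: a repository directory with no entries has nothing to recover.
        if not repo.is_dir() or not any(repo.iterdir()):
            missing.append(label)
    if not Path(flow_file).is_file():
        missing.append("flow.xml.gz")
    return missing
```

Call it with the paths the new node's nifi.properties will use; an empty list means all three inputs were found.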

The specific naming of your mounts is whatever makes logical sense to you. As long as no two cluster nodes are trying to write to the same mount, you will be good to go. Keep in mind that there can be considerable I/O with these repositories (depending on FlowFile volume and number of processors), so if all these mounted folders come from the same underlying disk, you are likely to have performance issues as well. Separate disks are always the recommended path.
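
One rough way to spot repositories that end up on the same filesystem is to compare the device ID that `os.stat` reports for each directory. This is a sketch under stated assumptions: the helper names are hypothetical, the directories must already exist, and a shared `st_dev` only shows that two paths are on the same mounted filesystem. Two partitions of one physical disk report different IDs, so a clean result does not prove the disks are physically separate.

```python
import os
from collections import defaultdict


def group_by_device(paths):
    """Group existing directories by the device ID (st_dev) of their filesystem."""
    groups = defaultdict(list)
    for path in paths:
        groups[os.stat(path).st_dev].append(path)
    return dict(groups)


def repos_sharing_a_device(paths):
    """Return sorted groups of paths that live on the same filesystem."""
    return [sorted(group) for group in group_by_device(paths).values() if len(group) > 1]
```

Passing the three repository directories of one node and getting back an empty list suggests each repository sits on its own filesystem.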

Thank you,

Matt

2 REPLIES

Explorer

Hi Matt,

Thank you so much for the excellent, detailed explanation.

Mark