Created 03-19-2018 06:03 PM
It is recommended to configure the NiFi repository (content, flowfile, provenance) directories to point to mounted folders in a Linux environment. Suppose I have a NiFi cluster with multiple nodes: should all nodes point to the same mount, or should each node point to a different mount? For example, for a 3-node cluster:
shared mount
/data/nifi/node1
/data/nifi/node2
/data/nifi/node3
separate mounts
/data1/nifi
/data2/nifi
/data3/nifi
Which is the better way?
Thanks,
Mark
Created 03-19-2018 08:29 PM
Separate NiFi instances (even those that are part of the same cluster) CANNOT share repositories. Each NiFi instance must have its own unique set of repositories, since each instance works on its own unique set of FlowFiles.
-
Pointing NiFi repositories to mounted folders is an option, but local disks will perform better. For a high-performance system, separate RAID disks (RAID 1 for data integrity) for each of the Content, FlowFile, and Provenance repos are recommended.
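As a rough sketch of what that looks like in nifi.properties, each repository can be pointed at a directory on its own disk. The property names are the standard repository settings; the mount points (/flowfile_repo, /content_repo, /provenance_repo) are hypothetical and only for illustration:

```properties
# Hypothetical mount points, one per dedicated disk; adjust to your own layout.
nifi.flowfile.repository.directory=/flowfile_repo/flowfile_repository
nifi.content.repository.directory.default=/content_repo/content_repository
nifi.provenance.repository.directory.default=/provenance_repo/provenance_repository
```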
-
Using mounted folders will affect performance, but offers an easier method of recovery if a node is lost forever. The repositories are not tied in any way to a specific NiFi instance/host. You can stand up a new instance of NiFi and, as long as you provide it with the FlowFile repo, Content repo, and cluster flow.xml.gz, it will be able to start up and continue processing from the same point where the old dead node left off.
-
The specific naming of your mounts is whatever makes logical sense to you. As long as none of the cluster nodes are trying to write to the same mount, you will be good to go. Keep in mind that there can be considerable I/O with these repositories (depending on FlowFile volume and number of processors), so if all these mounted folders come from the same underlying disk, you are likely to have performance issues as well. Separate disks are always the recommended path.
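Using the "separate mounts" layout from the question, the nifi.properties on the second node might look something like the fragment below. This is only a sketch: the subdirectory names under /data2/nifi are assumptions, and the other nodes would use /data1/nifi and /data3/nifi in the same way so that no two nodes write to the same location.

```properties
# Node 2 only. /data2/nifi comes from the question; the subdirectory names are assumed.
nifi.flowfile.repository.directory=/data2/nifi/flowfile_repository
nifi.content.repository.directory.default=/data2/nifi/content_repository
nifi.provenance.repository.directory.default=/data2/nifi/provenance_repository
```

Note that this keeps all three of a node's repositories on one mount, so they still compete for the same I/O. Splitting them across separate disks remains the better-performing option.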
-
Thank you,
Matt
Created 03-20-2018 12:53 PM
Hi Matt,
Thank you so much for the excellent, detailed explanation.
Mark