NiFi - Content Repository configuration

Which content repository configuration would be better?

A single 8 TB disk array configured as RAID 1 with one mount point,

or four 2 TB disks configured as RAID 1 with four mount points?

Does the mount point layout make a difference in how NiFi uses the content repository?

Re: NiFi - Content Repository configuration

Hi Shishir,

The second option will offer the best performance. I'd suggest having a look at the following documentation on best practices (still a draft at the moment): https://github.com/JPercivall/nifi/blob/NIFI-1028/nifi-docs/src/main/asciidoc/nifi-in-depth.adoc

Re: NiFi - Content Repository configuration

Thanks @Pierre Villard. So NiFi can make better use of multiple disks if they are configured as multiple mount points?

Re: NiFi - Content Repository configuration

Multiple Physical Storage Points

For the Provenance and Content repos, there is the option to stripe the information across multiple physical partitions. An admin would do this if they wanted to federate reads and writes across multiple disks. The repo (Content or Provenance) is still one logical store but writes will be striped across multiple volumes/partitions automatically by the system. The directories are specified in the nifi.properties file.

Re: NiFi - Content Repository configuration

@Pierre Villard

I think I was not clear in my original question. I am not sure whether the two configurations are equivalent.

Option 1: eight 1 TB disks configured as a RAID 1 array with one mount point

Option 2: eight 1 TB disks configured as RAID 1 with four mount points

Option 1 gives better utilization of disk space and ensures that all my disks are used equally.

Option 2 may cause one mount point to fill up while the others still have space remaining.

Re: NiFi - Content Repository configuration

@jpercivall Joe will know for sure.

My understanding is that using multiple disks will offer better I/O performance. The system will handle distributing the content across the disks and will see the set of disks as a single volume.

Re: NiFi - Content Repository configuration

You are correct, Pierre: option 2 is better because of the I/O benefits it offers. Note that to configure the content repo to use multiple partitions, you must list them explicitly in the nifi.properties file. You can find the details in the "Content Repository" subsection of the "System Properties" section of the Admin Guide[1]. Look for the "nifi.content.repository.directory" property.

[1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#system_properties
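To make the property format concrete, here is a minimal sketch, assuming the Admin Guide convention of adding a unique suffix to the "nifi.content.repository.directory." prefix for each location. The mount paths and the suffixes content1 to content4 are placeholders for illustration, and the small Python helper is only a hypothetical way to check which directories a properties file declares; it is not part of NiFi.

```python
# Hypothetical nifi.properties excerpt: one content repository directory per mount point.
# The /mnt/... paths and the content1..content4 suffixes are assumptions for illustration.
SAMPLE_PROPERTIES = """
nifi.flowfile.repository.directory=/mnt/flowfile/flowfile_repository
nifi.content.repository.directory.content1=/mnt/content1/content_repository
nifi.content.repository.directory.content2=/mnt/content2/content_repository
nifi.content.repository.directory.content3=/mnt/content3/content_repository
nifi.content.repository.directory.content4=/mnt/content4/content_repository
"""

PREFIX = "nifi.content.repository.directory."


def content_repo_dirs(properties_text):
    """Return {suffix: path} for every content repository directory entry."""
    dirs = {}
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith(PREFIX):
            dirs[key[len(PREFIX):]] = value.strip()
    return dirs


dirs = content_repo_dirs(SAMPLE_PROPERTIES)
for name, path in sorted(dirs.items()):
    print(name, "->", path)
```

Each suffix must be unique, and each path should sit on a different mount point if the goal is to spread I/O across disks.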

Re: NiFi - Content Repository configuration

The key to good disk performance is to separate the repositories across multiple spindles, so that the content_repository, flowfile_repository and provenance_repository are on physically separate disks. This matters because a running flow writes to all three. Your best bet in your configuration is to reserve two RAID 1 mounts for the content_repository (NiFi will then balance content evenly over these drives), and to have one RAID 1 mount for the flowfile_repository and one for the provenance_repository. This may not give you the most efficient use of space, but space is cheap, and it will give you much better performance than trying to push everything onto a single mount.

Re: NiFi - Content Repository configuration

@Simon Elliston Ball Thanks Simon. I already have separate mounts for flowfile, provenance, OS and logs. I am just trying to understand whether it is better to have one mount point spanning all disks for the content repository, or multiple mount points across the physical disks.

Re: NiFi - Content Repository configuration

According to a simple test I ran, NiFi does a round robin, writing one content claim to each directory in turn. So, as long as your data sizes arrive in random order, content should be fairly well distributed. If every fourth input is larger than the others, you may see one volume used more than the rest.

My test was simple: four directories (actually on the same volume) for the content repo, and GenerateFlowFile producing 30 MB flow files. It was clear from how each directory grew that it was doing round robin.

It may be smarter than that and also consider the free disk space on each volume, but I was unable to test that.
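The round-robin behavior described above can be modeled with a short simulation. This is only a sketch of the observed behavior, assuming one claim per directory in strict rotation with no awareness of free space; it is not NiFi's actual implementation, and the function name and sizes are made up for illustration.

```python
from itertools import cycle


def simulate_round_robin(claim_sizes_mb, num_dirs):
    """Assign each claim to the next directory in rotation; return MB used per directory."""
    usage = [0] * num_dirs
    # cycle() yields 0, 1, ..., num_dirs - 1, 0, 1, ... and zip stops when claims run out.
    for size, dir_index in zip(claim_sizes_mb, cycle(range(num_dirs))):
        usage[dir_index] += size
    return usage


# Case 1: 400 claims of 30 MB each, as in the GenerateFlowFile test.
uniform = simulate_round_robin([30] * 400, 4)

# Case 2: every fourth claim is 300 MB instead of 30 MB.
skewed_sizes = [300 if i % 4 == 3 else 30 for i in range(400)]
skewed = simulate_round_robin(skewed_sizes, 4)

print("uniform:", uniform)
print("skewed: ", skewed)
```

With uniform sizes every directory ends up with the same usage. When the large claims line up with the rotation period, they all land in the same directory, which is the uneven-fill case mentioned above.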