Can we use Apache NiFi as storage, i.e. persist data to NiFi local storage and retrieve it whenever we want?

Explorer

I would like to use NiFi to retrieve files from an external SFTP server, store them on local disk with RAID 10, and retrieve them again whenever I need them. Is that possible? In other words, I want to replace SAN- or Isilon-type storage and use NiFi as both a processing engine and a storage engine.

3 REPLIES


@Sri Bet

NiFi was designed to move data from one place to another, not to store it. NiFi stores data in the content repository temporarily for processing, but it has routines that delete flow files automatically. Data is deleted either right after the flow ends or after an archive retention period, which is 12 hours by default. This article explains the archiving process: https://community.hortonworks.com/articles/82308/understanding-how-nifis-content-repository-archivi....

As you can see, NiFi is designed to delete data that is no longer in use. The idea behind this is that NiFi has already moved it to a storage location.
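To see which retention settings apply on a given install, you can look at the archive entries in nifi.properties. Below is a minimal Python sketch that parses such entries. The property names are the ones I recall from the NiFi admin guide, the values are only illustrative (the 12 hours matches the default mentioned above), and `SAMPLE_PROPERTIES`, `parse_properties` and `archive_settings` are hypothetical names made up for this example, so check them against your own nifi.properties.

```python
from typing import Dict

# Illustrative excerpt only (assumption): verify names and values against
# the nifi.properties file of your own NiFi version.
SAMPLE_PROPERTIES = """\
# content repository archive settings
nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
"""


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


def archive_settings(props: Dict[str, str]) -> Dict[str, str]:
    """Keep only the content repository archive settings, with the prefix removed."""
    prefix = "nifi.content.repository.archive."
    return {k[len(prefix):]: v for k, v in props.items() if k.startswith(prefix)}


if __name__ == "__main__":
    for name, value in archive_settings(parse_properties(SAMPLE_PROPERTIES)).items():
        print(f"{name} = {value}")
```

Whatever the values are, they only control how long already-processed content lingers in the archive; they do not turn the content repository into permanent storage.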

You should use a storage solution, not NiFi, for storing data. For instance, why not use your FTP server for this?

Contributor

Maybe you are looking for an SSoT (Single Source of Truth). Kafka may be the best option for achieving this. The link below may help you:
https://www.confluent.io/blog/messaging-single-source-truth/

Super Guru

It is absolutely possible to do this. However, some things need to be considered:

  1. With a multi-node NiFi cluster, the local storage must be a single location, usually on the primary node. This data will not be local to the rest of the cluster nodes.
  2. The location should be separate from the OS partition and the required NiFi repository partitions. This avoids corrupting those partitions if the local storage consumes all available space.
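
As a rough illustration of the second point, here is a small Python sketch of a free-space check that could be run outside NiFi (for example from cron) against the dedicated partition. The function names, the 10% threshold and the `/data/nifi-landing` path are all assumptions made up for this example, not something NiFi provides.

```python
import shutil


def enough_space(total_bytes: int, free_bytes: int, min_free_fraction: float) -> bool:
    """Return True if the free share of the partition is at or above the threshold."""
    if total_bytes <= 0:
        return False
    return free_bytes / total_bytes >= min_free_fraction


def has_headroom(path: str, min_free_fraction: float = 0.10) -> bool:
    """Check the partition that holds `path` using shutil.disk_usage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return enough_space(usage.total, usage.free, min_free_fraction)


if __name__ == "__main__":
    landing_dir = "/data/nifi-landing"  # hypothetical mount point of the separate partition
    try:
        ok = has_headroom(landing_dir)
    except FileNotFoundError:
        print(f"{landing_dir} does not exist on this machine")
    else:
        print("OK" if ok else "WARNING: landing partition is running low on space")
```

Because the landing directory sits on its own partition, a full disk here only stops new files from being written and does not touch the OS or the NiFi repositories.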

 

In past projects I have used the primary node, with a separate partition, to store files local to the NiFi primary node. These files are then used outside of NiFi for other purposes. In some projects these files are picked up by separate NiFi flows and then redistributed into the cluster for processing across all nodes. The primary use case here was Team 1 auditing received files directly to disk. Some time later, Team 2 accesses the files for processing. In this example Team 1 and Team 2 are completely separate, with security-group-based access to NiFi (they cannot see each other's flows).