How to update the NiFi cache on NiFi service restart?

Master Collaborator

Hello All,

I have a requirement to cache some information in NiFi that will be used to validate every flow file (more than 1000 per second).

I am thinking of storing the information in a Cassandra table and loading it into the NiFi cache every 24 hours or so.

(The NiFi cache would be updated using the PutDistributedMapCache processor.)

The problem is how to retain or reload the cached values when NiFi restarts for any reason.

I would also like to understand where the cached entries are actually stored in a multi-node NiFi cluster.

Thanks in advance.

Thanks,

Mahendra

1 ACCEPTED SOLUTION

Master Guru

If you're using a DistributedMapCacheServer, you can set a "Persistence Directory" and it will store the cache to disk, so it will be available on restart.
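
To illustrate the idea of a persistence directory, here is a minimal Python sketch of a key/value cache that snapshots itself to disk and reloads on startup. It is not NiFi's implementation; the class name `PersistentMapCache`, the `cache.json` file name, and the JSON format are all assumptions made for the example.

```python
import json
import os
import tempfile


class PersistentMapCache:
    """Toy key/value cache that snapshots itself to a directory on every put.

    Hypothetical illustration only; not how NiFi stores its cache.
    """

    def __init__(self, persistence_dir):
        self._path = os.path.join(persistence_dir, "cache.json")
        self._data = {}
        # On startup, reload whatever an earlier run left on disk.
        if os.path.exists(self._path):
            with open(self._path, "r", encoding="utf-8") as fh:
                self._data = json.load(fh)

    def put(self, key, value):
        self._data[key] = value
        self._flush()

    def get(self, key):
        return self._data.get(key)

    def _flush(self):
        # Write to a temp file first, then rename it into place, so a crash
        # mid-write does not leave a half-written snapshot behind.
        tmp_path = self._path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as fh:
            json.dump(self._data, fh)
        os.replace(tmp_path, self._path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as persistence_dir:
        first_run = PersistentMapCache(persistence_dir)
        first_run.put("acme-US-invoice", "valid")

        # Simulate a restart: a fresh object pointed at the same directory.
        after_restart = PersistentMapCache(persistence_dir)
        print(after_restart.get("acme-US-invoice"))  # prints "valid"
```

The point of the sketch is only that a cache backed by a directory survives a process restart, while a purely in-memory one starts empty.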

DistributedMapCacheServers are started on each node in a cluster, but your DistributedMapCacheClientService provides the hostname and port to only one of them. They do not coordinate to keep the same data, so you will want all your clients (for put and get) to point to the same instance of the server.
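
The consequence of servers that do not coordinate can be shown with a small in-process Python model. The names `MapCacheServer`, `MapCacheClient`, and `build_cluster` are hypothetical stand-ins, and there is no networking here; the sketch only mimics the behavior described above, where each node holds its own private map.

```python
class MapCacheServer:
    """Stand-in for one cache server instance; each holds its own private map."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class MapCacheClient:
    """Stand-in for a client service configured with a single server hostname."""

    def __init__(self, cluster, hostname):
        self._server = cluster[hostname]

    def put(self, key, value):
        self._server.put(key, value)

    def get(self, key):
        return self._server.get(key)


def build_cluster(hostnames):
    # One independent server per node, with no replication between them.
    return {name: MapCacheServer() for name in hostnames}


if __name__ == "__main__":
    cluster = build_cluster(["node1", "node2"])

    writer = MapCacheClient(cluster, "node1")
    writer.put("acme-US-invoice", "valid")

    # A reader pointed at a different node misses the entry.
    print(MapCacheClient(cluster, "node2").get("acme-US-invoice"))  # None

    # A reader pointed at the same node as the writer finds it.
    print(MapCacheClient(cluster, "node1").get("acme-US-invoice"))  # valid
```

This is why every put and get client should be configured with the same hostname and port.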

Since you really just want the information from Cassandra, please feel free to file a New Feature Jira to add a CassandraMapCacheServer; you could then fetch the data directly from Cassandra.

2 REPLIES

Master Collaborator

Thanks @Matt Burgess for the clear information.

Would you suggest any other approach?

I am open to any kind of storage or caching, but the end goal is to be able to use some master data for every flow file validation without impacting performance.

(The master data is a combination of company-country-datatype; this is what I want to use for every flow file validation.)

Thanks for your response 🙂

Mahendra
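
The lookup described in this thread (a company-country-datatype combination checked for every flow file, refreshed roughly every 24 hours) could be sketched in Python as below. This is an illustration of the data structure only, not a NiFi processor or a recommendation from the thread. `MasterDataValidator` and the `loader` callable are hypothetical; in practice `loader` would be whatever query returns the master rows from Cassandra.

```python
import time


class MasterDataValidator:
    """Holds master-data keys in memory and reloads them when they go stale.

    `loader` is a hypothetical callable returning an iterable of
    (company, country, datatype) tuples.
    """

    def __init__(self, loader, refresh_seconds=24 * 60 * 60, clock=time.monotonic):
        self._loader = loader
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._keys = frozenset()
        self._loaded_at = None  # None means "never loaded", e.g. after a restart

    def _refresh_if_stale(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._refresh_seconds:
            self._keys = frozenset(self._loader())
            self._loaded_at = now

    def is_valid(self, company, country, datatype):
        self._refresh_if_stale()
        # Set membership is a constant-time check on average, which matters
        # when it runs for every flow file.
        return (company, country, datatype) in self._keys


if __name__ == "__main__":
    def load_master_rows():
        # Placeholder data standing in for a Cassandra query result.
        return [("acme", "US", "invoice"), ("acme", "DE", "order")]

    validator = MasterDataValidator(load_master_rows)
    print(validator.is_valid("acme", "US", "invoice"))
    print(validator.is_valid("acme", "FR", "invoice"))
```

Because the first call after construction always triggers a load, the same code path also covers the restart case: a freshly started process simply reloads from the source before validating.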