Created 10-30-2018 11:27 AM
Hello All,
I have a requirement to cache some information in NiFi that will be used to validate every flow file (more than 1000 per second).
I am thinking of storing the information in a Cassandra table and loading it into the NiFi cache every 24 hours or so.
(The NiFi cache would be updated with the PutDistributedMapCache processor.)
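The plan above (keep the reference data in Cassandra and refresh the cache roughly every 24 hours) can be modeled as a snapshot that reloads itself once it is older than a time-to-live. This is a minimal Python sketch, not NiFi code: `RefreshingCache` and its `loader` argument are hypothetical, the loader stands in for the Cassandra query, and the clock is injectable so the refresh can be exercised without waiting a day.

```python
import time
from typing import Callable, Dict, Optional


class RefreshingCache:
    """Holds a snapshot of reference data and reloads it once it is older than ttl_seconds."""

    def __init__(self, loader: Callable[[], Dict[str, str]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader  # hypothetical stand-in for a Cassandra query
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[str, str] = {}
        self._loaded_at: Optional[float] = None

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            # Build the new snapshot fully, then swap it in with one assignment,
            # so a lookup sees either the old or the new data, never a partial load
            self._data = dict(self._loader())
            self._loaded_at = now

    def get(self, key: str) -> Optional[str]:
        self._refresh_if_stale()
        return self._data.get(key)
```

Checking staleness on each lookup keeps the sketch free of background threads; a scheduled reload (which is what a PutDistributedMapCache flow on a timer amounts to) is the other common choice.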
The problem is how to retain or reload the cached values when NiFi restarts for any reason.
I would also like to understand where these cached entries are actually stored in a multi-node NiFi cluster.
Thanks in advance.
Thanks,
Mahendra
Created 10-30-2018 01:55 PM
If you're using a DistributedMapCacheServer, you can set a "Persistence Directory" and it will store the cache to disk, so it will be available on restart.
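To see why a persistence directory lets the cache survive a restart, here is a toy Python sketch of the idea: every write is appended to a log file in the directory, and a newly started instance replays that log to rebuild its map. The class name `PersistentMapCache` and the `cache.log` file name are hypothetical; this is not NiFi's implementation or on-disk format.

```python
import json
import os
from typing import Dict, Optional


class PersistentMapCache:
    """Toy key/value cache that survives restarts by replaying a write log
    kept in persistence_dir. Illustrative only: not NiFi's on-disk format."""

    def __init__(self, persistence_dir: str):
        os.makedirs(persistence_dir, exist_ok=True)
        self._log_path = os.path.join(persistence_dir, "cache.log")
        self._data: Dict[str, str] = {}
        # On startup, rebuild the in-memory map from the log written before the restart
        if os.path.exists(self._log_path):
            with open(self._log_path, "r", encoding="utf-8") as f:
                for line in f:
                    entry = json.loads(line)
                    self._data[entry["key"]] = entry["value"]

    def put(self, key: str, value: str) -> None:
        # Write to disk first, then update memory, so a completed put is already persisted
        with open(self._log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)
```

Constructing a second instance on the same directory plays the role of a restart: it starts with every entry the first instance wrote.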
DistributedMapCacheServers are started on each node in a cluster, but your DistributedMapCacheClientService is configured with the hostname and port of only one of them. The servers do not coordinate to keep the same data, so you will want all your clients (for both put and get) to point to the same server instance.
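The consequence of the servers not coordinating can be modeled in a few lines. This is a toy Python model, not NiFi's client/server protocol: the class names and the hostnames `node1`, `node2`, `node3` are hypothetical, and the network hop is replaced by a dictionary lookup. It shows that a put through a client pointed at one node's server is invisible to a client pointed at another node's server.

```python
from typing import Dict, Optional


class MapCacheServer:
    """One independent cache server per node; servers never sync with each other."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}


class MapCacheClient:
    """Client bound to exactly one server, like a client service
    configured with a single hostname and port."""

    def __init__(self, servers: Dict[str, MapCacheServer], hostname: str) -> None:
        self._server = servers[hostname]

    def put(self, key: str, value: str) -> None:
        self._server.entries[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._server.entries.get(key)


# A three-node cluster: each node starts its own server (hypothetical hostnames)
cluster = {h: MapCacheServer() for h in ("node1", "node2", "node3")}

writer = MapCacheClient(cluster, "node1")
same_server_reader = MapCacheClient(cluster, "node1")
other_server_reader = MapCacheClient(cluster, "node2")

writer.put("ACME|US|invoice", "valid")
print(same_server_reader.get("ACME|US|invoice"))   # valid
print(other_server_reader.get("ACME|US|invoice"))  # None
```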
Since you really just want the information from Cassandra, please feel free to file a New Feature Jira to add a CassandraMapCacheServer; then you could fetch the data directly from Cassandra.
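The CassandraMapCacheServer mentioned here is a proposed feature, not an existing NiFi component. The general pattern it implies, a read-through cache that goes to the backing store only on a miss, can be sketched in Python; `ReadThroughCache` and `fetch_from_store` are hypothetical names, and `fetch_from_store` stands in for a Cassandra query.

```python
from typing import Callable, Dict, Optional


class ReadThroughCache:
    """Serve hits from memory; on a miss, ask the backing store once and remember the answer."""

    def __init__(self, fetch_from_store: Callable[[str], Optional[str]]):
        self._fetch = fetch_from_store  # hypothetical stand-in for a Cassandra lookup
        self._data: Dict[str, Optional[str]] = {}

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            # Misses are cached too (as None), so repeated unknown keys
            # do not hit the store every time
            self._data[key] = self._fetch(key)
        return self._data[key]
```

Caching misses protects the store from repeated lookups of unknown keys, but a row added later stays invisible until that entry is cleared; combining this with a time-based refresh would address that.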
Created 10-30-2018 03:42 PM
Thanks @Matt Burgess for the clear information.
Would you suggest any other approach?
I am open to any kind of storage or caching, but the end goal is to be able to use some master data to validate every flow file without impacting performance.
(The master data is a company-country-datatype combination, which I want to use to validate every flow file.)
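Because the master data is a fixed company-country-datatype combination, one way to keep per-record validation cheap is to hold it as a set of composite keys, so each check is a single hash lookup instead of a query. A minimal Python sketch, assuming the master rows have already been fetched; the sample rows and the names `build_master_index` and `is_valid` are hypothetical, and how this maps onto a NiFi processor is not shown.

```python
from typing import FrozenSet, Iterable, Tuple

# (company, country, data type): the composite key described above
MasterKey = Tuple[str, str, str]


def build_master_index(rows: Iterable[MasterKey]) -> FrozenSet[MasterKey]:
    """Build the lookup structure once per refresh (e.g. after reading the master table)."""
    return frozenset(rows)


def is_valid(index: FrozenSet[MasterKey], company: str, country: str, data_type: str) -> bool:
    """Hash-set membership: average O(1) per record, regardless of master data size."""
    return (company, country, data_type) in index


master = build_master_index([
    ("ACME", "US", "invoice"),
    ("ACME", "DE", "order"),
])
print(is_valid(master, "ACME", "US", "invoice"))  # True
print(is_valid(master, "ACME", "US", "order"))    # False
```

An immutable `frozenset` also makes refreshing simple: build a new index from fresh data and replace the reference, rather than mutating a set that validations are reading.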
Thanks for your response 🙂
Mahendra