How to update the NiFi cache on NiFi service restart?
Labels: Apache NiFi
Created ‎10-30-2018 11:27 AM
Hello All,
I have a requirement to cache some information in NiFi that will be used to validate every flow file (more than 1000 per second).
I am thinking of storing the information in a Cassandra table and loading it into the NiFi cache every 24 hours or so
(updating the NiFi cache with the PutDistributedMapCache processor).
The problem is how to retain or reload the cached values when NiFi restarts for any reason.
I would also like to understand where the cached entries are actually stored in a multi-node NiFi cluster.
Thanks in advance.
Thanks,
Mahendra
Created ‎10-30-2018 01:55 PM
If you're using a DistributedMapCacheServer, you can set a "Persistence Directory" and it will store the cache to disk, so it will be available on restart.
DistributedMapCacheServers are started on each node in a cluster, but your DistributedMapCacheClientService provides the hostname and port to only one of them. They do not coordinate to keep the same data, so you will want all your clients (for put and get) to point to the same instance of the server.
Since you really just want the information from Cassandra, please feel free to file a New Feature Jira to add a CassandraMapCacheServer; then you could fetch the data directly from Cassandra.
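To make the reload-on-restart and 24-hour refresh idea concrete, here is a minimal sketch in plain Python, independent of NiFi. It is not NiFi's API: `RefreshingKeyCache` and `load_master_data` are hypothetical names, and the loader is only a stand-in for whatever query reads the master data from the Cassandra table. The cache starts empty (as it would after a restart without a persistence directory), loads itself on first use, and reloads once the entries are older than the configured age.

```python
import time
from typing import Callable, Iterable, Optional, Set, Tuple

# Composite key from the question: (company, country, datatype)
Key = Tuple[str, str, str]


class RefreshingKeyCache:
    """In-memory set of valid keys, reloaded from a source when stale."""

    def __init__(
        self,
        loader: Callable[[], Iterable[Key]],
        ttl_seconds: float = 24 * 60 * 60,  # refresh roughly every 24 hours
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._keys: Set[Key] = set()
        self._loaded_at: Optional[float] = None  # None means "never loaded"

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        # A fresh process has never loaded, so the first lookup triggers a load.
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._keys = set(self._loader())
            self._loaded_at = now

    def is_valid(self, company: str, country: str, datatype: str) -> bool:
        self._refresh_if_stale()
        return (company, country, datatype) in self._keys


def load_master_data() -> list:
    # Hypothetical stand-in for a query against the Cassandra table.
    return [("acme", "US", "invoice"), ("acme", "DE", "order")]


cache = RefreshingKeyCache(load_master_data)
print(cache.is_valid("acme", "US", "invoice"))  # True
print(cache.is_valid("acme", "FR", "invoice"))  # False
```

Each lookup is a set membership test, so the per-flow-file cost stays small; the source is only queried on the first call and after the age limit passes. The sketch is not thread-safe, so concurrent callers would need a lock around the refresh.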
Created ‎10-30-2018 03:42 PM
Thanks @Matt Burgess for the clear information.
Would you suggest any other approach?
I am open to any kind of storage or caching; the end goal is to use some master data to validate every flow file without impacting performance.
(The master data is a combination of company-country-datatype, which I want to use to validate every flow file.)
Thanks for your response 🙂
Mahendra
