NiFi - Questions concerning DistributedMapCache

Master Collaborator

Hi, I'm currently working with the DistributedMapCache.

I found this article, which gave me a good idea of how it all works: https://community.hortonworks.com/articles/71837/working-with-a-nifi-distributedmapcache.html

Now I'm trying to use the DistributedMapCache (DMC) in my NiFi flow without running the script. I found older questions here, but maybe 1.7 or 1.8 offers new possibilities? Referring to the article above, how can I do the following in NiFi:

1. Remove a specific key (like the "remove" command)?

2. Get a list of the existing keys (like the "keys" command), ideally including their values?

Further questions:

3. How can I clear the entire content of the DMC at once? I tried disabling the Controller Service for the DMC server, but after re-enabling it the data was still there.

4. In practice, if one wants to use different tables in the DMC, would they all live in one DMC? Or would I configure a DMC server and a DMC client service for each table?

5. At the moment I work on a local installation. Does all of this work without difficulty on a NiFi cluster (which I hope is coming soon)?

6. Technically, the hostname and port of the DMC server have to be fixed in the DMC client service configuration. Which one should I choose in a cluster?

If anyone has further information on this subject, I would be glad to hear it. Thanks all!

1 ACCEPTED SOLUTION

Master Guru

1-3: The processors that use a DMC client use the DMC in a very specific manner, so they create, read, update, and delete cache entries only as their own operations require. There is currently no generic processor that lets you call arbitrary cache API methods; that's what the scripting components are for.

4: We don't have the concept of tables in DMC, only key/value pairs. A table can probably be implemented by namespacing the key, though I'm not sure whether the processors you're using support custom keys.
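To illustrate the key-namespacing idea from point 4, here is a minimal sketch in plain Python. It does not talk to NiFi at all; the function names and the "::" separator are assumptions made up for this example, not a NiFi convention. It only shows how a "table" name can be folded into the cache key and recovered again.

```python
from typing import Iterable, List, Tuple

# Assumed separator between the "table" name and the real key.
# Any string that cannot appear in a table name would do.
SEPARATOR = "::"


def make_key(table: str, key: str) -> str:
    """Build a namespaced cache key such as 'customers::42'."""
    if SEPARATOR in table:
        raise ValueError("table name must not contain the separator")
    return f"{table}{SEPARATOR}{key}"


def split_key(cache_key: str) -> Tuple[str, str]:
    """Split a namespaced cache key back into (table, key)."""
    table, sep, key = cache_key.partition(SEPARATOR)
    if not sep:
        raise ValueError("cache key is not namespaced")
    return table, key


def keys_in_table(all_keys: Iterable[str], table: str) -> List[str]:
    """Return the un-prefixed keys that belong to one 'table'."""
    prefix = table + SEPARATOR
    return [k[len(prefix):] for k in all_keys if k.startswith(prefix)]
```

In a flow, the same effect would come from building the cache key with the table name in front, provided the processor in question lets you set the key yourself.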

5: DMC operation in a cluster is very similar to how it works on a local installation, except that a DMC server is created on each node in the cluster. However, a DMC client still has to choose a single host:port to connect to, and the individual DMC servers are not coordinated across the cluster: if you update one, the others don't get that update. They are fully separate at the moment.
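The isolation described in point 5 can be pictured with a toy model in plain Python. This is only a sketch: `NodeCache` and `PinnedClient` are hypothetical stand-ins invented for illustration, not NiFi classes, and nothing here uses the real cache protocol. Each node holds its own map, and a client is pinned to exactly one of them.

```python
class NodeCache:
    """Stand-in for the DMC server on one node: a private in-memory map."""

    def __init__(self, name):
        self.name = name
        self._entries = {}

    def put(self, key, value):
        self._entries[key] = value

    def get(self, key):
        return self._entries.get(key)


class PinnedClient:
    """Stand-in for a DMC client configured with a single host:port."""

    def __init__(self, server):
        self._server = server

    def put(self, key, value):
        self._server.put(key, value)

    def get(self, key):
        return self._server.get(key)


# One independent cache per node; nothing replicates between them.
nodes = {"node1": NodeCache("node1"), "node2": NodeCache("node2")}

client_a = PinnedClient(nodes["node1"])
client_b = PinnedClient(nodes["node2"])

client_a.put("session", "abc")
seen_by_a = client_a.get("session")  # the entry exists on node1
seen_by_b = client_b.get("session")  # node2 never received it, so this is None
```

In this model, every client that should see the same entries has to be pointed at the same node.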

6: AFAIK there is no best practice for choosing a DMC server to connect to, other than picking one on a node that tends to be available most often. You basically get individual, isolated instances to choose from. We have other DMC server implementations that possibly support High Availability and/or Data Durability, such as HBase- or Redis-backed solutions. However, neither of these is included with an Apache NiFi distribution; you'd have to bring your own.

3 REPLIES

Master Collaborator

@Matt Burgess
May I ask whether you have some answers for me (especially concerning questions 1 and 2)? Thanks.


Master Collaborator

Hi @Matt Burgess, thanks for your quick and detailed answer!

1-3: I see, the script was not just there to illustrate the DMC; it is NECESSARY for working with it in NiFi. OK, I will use it.

4: So I have to prepend some information to the "Cache Entry Identifier" on PutDMC to identify entries belonging to different "tables". OK, this will work.

5-6: These points I will have to clarify with the "techies"...