
Broadcast a FlowFile from the primary node to all nodes in NiFi


Hi All,

I have a scenario with a 3-node HDF cluster. Every hour, on the primary node only, I make an API call with InvokeHTTP to get a validation token (valid for 1 hour only), since making the same API call from multiple nodes is not preferred.

I am putting this token in a DistributedMapCache, but I want to fetch the token from the cache on all nodes. Is there a way I can duplicate the FlowFile from the primary node to all nodes for the downstream processors?

Is there a way I can achieve this?

1 ACCEPTED SOLUTION

Super Mentor
@pavan srikar

-

The design you have in place looks to be the correct solution for the use case you describe.

Every node in your cluster runs the exact same flow.xml.gz.

-

You would typically configure your "PutDistributedMapCache" and "FetchDistributedMapCache" processors to use a "Distributed Cache Service" that every node has access to.

-

This allows you to run a single primary-node-only flow that retrieves the token on a one-hour cron and writes it to the distributed map cache, and then a second flow, run by every node, that pulls that stored token value from the cache and uses it for your downstream calls.
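As a rough illustration, that two-flow pattern can be sketched in plain Python, with a dict standing in for the shared map cache (all names here are illustrative stand-ins, not NiFi APIs):

```python
# Toy model of the two flows: a dict stands in for the shared
# Distributed Map Cache that every node can reach.
cache = {}

def primary_node_flow():
    # Runs on the primary node only, on an hourly cron: call the auth
    # API (stubbed here) and overwrite the cached token.
    token = "token-issued-this-hour"   # stand-in for the InvokeHTTP response
    cache["auth-token"] = token

def any_node_flow():
    # Runs on every node: fetch the current token from the shared cache
    # and use it for downstream calls.
    return cache.get("auth-token")

primary_node_flow()
assert any_node_flow() == "token-issued-this-hour"
```

The key point the toy captures is that the writer and the readers agree on one cache (and one cache key), rather than the FlowFile itself being copied between nodes.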

-

Using the "RedisDistributedMapCacheClientService" controller service, for example, allows you to set a TTL on the values you store in the cache. This lets the cached entry expire before the token itself becomes invalid. For example, if the token is good for 1 hour, you could set the TTL to 50 - 55 minutes.
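The TTL arithmetic can be sketched in plain Python (a toy, timestamp-based cache standing in for what Redis does server-side when a key is set with a TTL; all names here are illustrative):

```python
import time

TOKEN_LIFETIME_S = 3600      # the API token is valid for 1 hour
CACHE_TTL_S = 55 * 60        # expire the cache entry earlier (3300 s)

def put_with_ttl(cache, key, value, ttl_s, now=None):
    # Store the value together with an expiry timestamp.
    now = time.time() if now is None else now
    cache[key] = (value, now + ttl_s)

def get_if_fresh(cache, key, now=None):
    # Return the value only while it has not yet expired.
    now = time.time() if now is None else now
    value, expires_at = cache.get(key, (None, 0))
    return value if now < expires_at else None

cache = {}
put_with_ttl(cache, "auth-token", "abc123", CACHE_TTL_S, now=0)
assert get_if_fresh(cache, "auth-token", now=3000) == "abc123"  # still fresh
assert get_if_fresh(cache, "auth-token", now=3400) is None      # expired early
assert CACHE_TTL_S < TOKEN_LIFETIME_S
```

Keeping the TTL strictly below the token lifetime means a reader can never fetch a token that is about to go stale mid-call.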

-

Thank you,

Matt

-

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.


6 REPLIES


Super Mentor

@pavan srikar

I should add that there is no processor that will specifically clone a FlowFile to every node in the NiFi cluster.
-

But there are other options if you do not want to stand up an external map cache server.

-

Perhaps set up a disk mount that is shared across all nodes.

On the primary node only, you run a flow that retrieves a new token every ~55 minutes and writes it to this shared mounted directory, overwriting the previously written token each time.

Then, on all nodes, you could create a flow that reads this token on a schedule (without deleting it) to perform your all-node tasks.
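A minimal sketch of that shared-mount pattern in Python (the mount path and function names are hypothetical; the temp-file-plus-atomic-replace step keeps readers from ever seeing a half-written token):

```python
import os
import tempfile

SHARED_DIR = "/mnt/nifi-shared"   # hypothetical mount visible to all nodes
TOKEN_PATH = os.path.join(SHARED_DIR, "auth.token")

def write_token(token, path=TOKEN_PATH):
    # Primary node, every ~55 minutes: write to a temp file in the same
    # directory, then atomically replace the old token file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as f:
        f.write(token)
    os.replace(tmp, path)

def read_token(path=TOKEN_PATH):
    # Any node: read without deleting, so the file remains for the others.
    with open(path) as f:
        return f.read()
```

`os.replace` is atomic on POSIX filesystems when source and destination are on the same mount, which is why the temp file is created in the token's own directory.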

-

Just a second option for you.

-

Thank you,

Matt


@Matt Clarke When I write the token using "PutDistributedMapCache", the token is only accessible to the primary node. All the other nodes fail with a "value not found" error when trying to read it using FetchDistributedMapCache, probably because the CacheServer is on localhost. Does this make sense?

So, on the whole, I should host the server on something other than localhost so that all nodes can access it. Is that right?

Super Mentor

Yes, that makes sense. When you start the DistributedMapCacheServer, it starts a server on each NiFi node.
The DistributedMapCacheClient should be configured to point at one specific node, so that every node pulls cache entries from the same server.
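Concretely, that means every node's client service carries the same hostname. A sketch of the relevant client settings (the property names are from the standard DistributedMapCacheClientService; the hostname here is an example, and 4557 is the server's usual default port):

```
# DistributedMapCacheClientService (configured identically on all nodes)
Server Hostname: nifi-node1.example.com   # one chosen node, not localhost
Server Port: 4557                          # must match the cache server's port
```

With "localhost" as the hostname, each node talks only to its own cache server, which explains the "value not found" errors on the non-primary nodes.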

-

A little back history:

The DistributedMapCacheServer and DistributedMapCacheClient controller services date back to the original NiFi releases. Back in those days there was no zero-master clustering like we have now; there was a dedicated server that ran the NiFi Cluster Manager (NCM). At that time the DistributedMapCacheServer could only be set up on the NCM.

-

Once NiFi moved away from having an NCM, the functionality of these controller services was not changed, to avoid breaking the flows of users who moved to the latest versions. The DistributedMapCacheServer does not offer HA (if the node hosting the server goes down, the cache becomes unavailable). To provide HA here, new external HA cache options have been added.

-

thanks,

Matt
