Member since: 07-30-2019
Posts: 3406
Kudos Received: 1622
Solutions: 1008
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 98 | 12-17-2025 05:55 AM |
| | 159 | 12-15-2025 01:29 PM |
| | 104 | 12-15-2025 06:50 AM |
| | 238 | 12-05-2025 08:25 AM |
| | 398 | 12-03-2025 10:21 AM |
05-30-2018
12:59 PM
1 Kudo
@Mike Wong Did you try becoming the nifi user (sudo su - nifi) and then navigating to the 2008.csv file to view it?

Another option would be to enable debug logging on the GetFile processor to get more details on what is going on here. Doing so requires you to add a new line to the NiFi logback.xml file:

<logger name="org.apache.nifi.processors.standard.GetFile" level="DEBUG"/>

There is no need to restart NiFi when editing the logback.xml file (it is one of the only conf files in NiFi you can edit without requiring a restart). Wait 30 seconds after adding this line, then start tailing the nifi-app log:

# tail -F ../logs/nifi-app.log

Go to your canvas and start (or stop, then start) your GetFile processor. What output do you see in the nifi-app.log?

Thanks, Matt
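For reference, the whole debug-logging sequence as shell steps (a minimal sketch; the /opt/nifi install path is an assumption, and the sed invocation assumes GNU sed):

```bash
cd /opt/nifi

# Add the DEBUG logger for GetFile to conf/logback.xml, just above
# the closing </configuration> tag (back the file up first)
cp conf/logback.xml conf/logback.xml.bak
sed -i 's|</configuration>|  <logger name="org.apache.nifi.processors.standard.GetFile" level="DEBUG"/>\n</configuration>|' conf/logback.xml

# logback changes are picked up without a restart; wait ~30 seconds,
# then tail the app log while starting/stopping the processor in the UI
tail -F logs/nifi-app.log
```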
05-30-2018
12:47 PM
@Mike Wong *** Important Forum tip: Please try to avoid responding to an existing "Answer" by starting a new "Answer". Instead, use the "Add comment" option to respond to an existing "Answer". The forum offers no guaranteed order of answers, which can make following a conversation difficult.
05-30-2018
12:35 PM
2 Kudos
@Rahul Kumar Upgrading NiFi versions is not a very hard process.

1. Install the new version in parallel. For example, assuming 1.4 is installed at /opt/nifi/nifi-1.4.0/, install 1.6 at /opt/nifi/nifi-1.6.0/.

2. Use the configurations from the various NiFi configuration files found in the 1.4 conf directory to configure the same-named configuration files in the new 1.6 conf directory. *** It may be possible at times to simply copy the configuration files from one version to another, but be cautious here of new properties that may have been added to these configuration files between releases. NiFi 1.6 should be configured to point at the same four repositories (database, content, flowfile, and provenance) being used by the old NiFi 1.4 (if best practices were followed, these are located on separate disks outside the base NiFi install path). If this is a NiFi cluster, 1.6 should use the same external ZooKeeper that 1.4 used (if the embedded ZK was used, I recommend installing an external ZK for the new NiFi and migrating its content from the embedded ZK to the external ZK). The new NiFi should also point at the same local NiFi state directory.

3. Stop your NiFi 1.4 instance/cluster. Copy the flow.xml.gz, users.xml (if secured), and authorizations.xml (if secured) from NiFi 1.4 to NiFi 1.6.

4. Start the new NiFi 1.6 version. NiFi 1.6 will start, load the copied flow.xml.gz file, and read in FlowFiles from the same flowfile and content repositories NiFi 1.4 was using before it was shut down. Your new NiFi will pick up processing right where the previous version left off. (A command-line sketch of these steps follows below.)

*** Be mindful of any custom nars/jars you added to the previous NiFi version. Those will need to be moved to the new NiFi as well. I recommend placing any custom nars/jars in a separate custom lib directory rather than in NiFi's default lib directory.

Thank you, Matt

If you find an answer that addresses your question best, please take a moment to log in to the forum and click the "Accept" link below that provided answer.
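A minimal shell sketch of the parallel-install and file-copy steps above (all paths, the tarball location, and the nifi service user are assumptions; adjust for your environment):

```bash
# 1. Install the new version alongside the old one
cd /opt/nifi
tar -xzf /tmp/nifi-1.6.0-bin.tar.gz          # assumes the tarball was downloaded here
chown -R nifi:nifi /opt/nifi/nifi-1.6.0

# 2. Hand-merge conf files (diff old vs. new to spot newly added properties)
diff /opt/nifi/nifi-1.4.0/conf/nifi.properties /opt/nifi/nifi-1.6.0/conf/nifi.properties

# 3. Stop 1.4, then carry over the flow and user/authorization files
/opt/nifi/nifi-1.4.0/bin/nifi.sh stop
cp /opt/nifi/nifi-1.4.0/conf/flow.xml.gz        /opt/nifi/nifi-1.6.0/conf/
cp /opt/nifi/nifi-1.4.0/conf/users.xml          /opt/nifi/nifi-1.6.0/conf/   # secured NiFi only
cp /opt/nifi/nifi-1.4.0/conf/authorizations.xml /opt/nifi/nifi-1.6.0/conf/   # secured NiFi only

# 4. Start the new version
/opt/nifi/nifi-1.6.0/bin/nifi.sh start
```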
05-30-2018
12:19 PM
1 Kudo
@Siddharth Sa From your image it appears that you are auto-terminating the failure relationship on your PutSQL processor?

Assuming the misconfiguration in your UpdateAttribute processor resulted in failure of every FlowFile passed to PutSQL, those FlowFiles would all have been routed to the failure relationship of the PutSQL processor. It is rare that a user would auto-terminate a "failure" relationship, as it means data is being deleted. A more typical design is to route "failure" relationships to a dumb processor that is not enabled (like an UpdateAttribute processor or even a funnel). That would have allowed you to redirect the connection containing the failure relationship back to your fixed UpdateAttribute processor, resulting in all the failed data being reprocessed.

NiFi does, if archiving is enabled, archive FlowFile content based on configured thresholds. It is possible to perform a provenance search on the FlowFiles with a "DROP" event recorded by the PutSQL processor. The DROP event occurs for each FlowFile routed to failure and deleted by PutSQL. While not elegant, you may be able to select each failed FlowFile one by one, open its lineage, and replay the FlowFile at the "UpdateAttribute" point in the lineage history. You control the sequence of processing by the order in which you replay each FlowFile. There is no bulk replay capability.

Thank you, Matt
05-29-2018
04:33 PM
@Paul Hernandez Just to add to the above correct response... The backpressure threshold settings for both size and number of FlowFiles are soft limits. When a processor is eligible to execute/run, it will run that thread to completion. The ListHDFS processor, for example, will list all files newer than the last execution/run recorded in state. Even if "Back Pressure Object Threshold" is set to 10,000, that will not stop the ListHDFS processor from listing 1,000,000 FlowFiles in a single execution. Once those 1,000,000 FlowFiles are placed on the connection, backpressure starts being applied. The ListHDFS processor will not be eligible to execute/run again until the queue drops back below the threshold setting of 10,000.

"Back Pressure Data Size Threshold" works in a similar manner. Size in NiFi is always a measure of the size of the content associated with a FlowFile, not the actual size of the FlowFile itself.

Thanks, Matt
05-29-2018
02:42 PM
2 Kudos
@Raja K There are two Java processes associated with a running NiFi instance. The first JVM process is tied to the NiFi bootstrap; it is what you are interacting with when you issue the "start, stop, restart, dump, etc..." commands. During the startup phase, the bootstrap starts the other NiFi JVM process.

During startup of the main NiFi process, there is a lot that goes on. For example:

- NiFi must unpack all the nars in the NiFi lib directory to the NiFi work directory and load the relevant code into memory.
- NiFi must join the cluster. If the entire cluster has just been started/restarted, NiFi goes into an election phase (the cluster needs to elect a cluster coordinator, a primary node, and a cluster flow). The election will not complete until all nodes have joined (based on nifi.cluster.flow.election.max.candidates=) or until the election wait time expires (nifi.cluster.flow.election.max.wait.time=5 mins). If max.candidates is not set, you will always have a minimum 5-minute latency here.
- Nodes joining the cluster must then compare their local flow with the cluster flow to make sure they match exactly.
- Once a node successfully passes these steps, it must load FlowFiles back into the flow (these would be FlowFiles still queued in connections when NiFi was last stopped/restarted).
- Processor components are then scheduled to run based on the cluster flow processor state (start all processors that are in a running state; start primary-node-only processors on the elected primary node).
- Only then is the cluster as a whole in a state where users should be given access to the UI for interactive purposes. This does not mean that the flows were not already running before the UI was actually available to the user.

There are other things happening here, but those are the biggest pieces. Tailing the nifi-app.log and watching for "The UI is available at the following URLs:" is the easiest way to see when the UI will be available to users. Those URLs are based on the following properties from the nifi.properties file:

nifi.web.http.host= <-- unsecured NiFi
nifi.web.https.host= <-- secured NiFi

If they are left blank, NiFi will bind to all available interfaces that the JVM finds on the host machine (as your URLs show). There is no redirection going on here.

Hope this helps explain what is occurring during startup. The size of your dataflow(s), the election process, the number of queued FlowFiles being loaded back into JVM memory (queued connections), etc. will have some impact on startup time. Your output above only has the timestamp for when the UI became available (2018-05-29 06:58:05,832), so it is not clear how much latency you are really seeing between the actual start command and that timestamp. The first entry in the nifi-app.log should include "org.apache.nifi.NiFi Launching NiFi...".

Thank you, Matt
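To measure the actual startup latency from the log, you can compare those two log lines (a minimal sketch; the /opt/nifi/logs path is an assumption based on a default install):

```bash
# First timestamp: the bootstrap launching the main NiFi process
grep "Launching NiFi" /opt/nifi/logs/nifi-app.log | head -n 1

# Last timestamp: the UI becoming available to users
grep "The UI is available at the following URLs" /opt/nifi/logs/nifi-app.log | tail -n 1

# The difference between the two timestamps is your real startup latency
```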
05-29-2018
02:05 PM
@Mike Wong Not sure what issue you are exactly seeing here based on your screenshots. Your dataflow consists of two processors, and I see no queued FlowFiles. Is the GetFile processor retrieving your 2008.csv file? Based on your configuration, it should be retrieving it non-stop (Keep Source File = true). What user owns the running NiFi software on this local machine? (ps -ef | grep nifi) If you become that user, can you access the file you are trying to consume with NiFi? What do you see in the nifi-app.log when you "start" the GetFile processor? (Any WARN or ERROR logs?)

Thanks, Matt
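A quick way to run that ownership check (a minimal sketch; the file path and the nifi service user are assumptions, so use the directory configured in your GetFile processor):

```bash
# Find which user owns the running NiFi process
ps -ef | grep -i "[n]ifi"

# Check whether that user can actually read the source file
sudo -u nifi ls -l /path/to/2008.csv
sudo -u nifi head -n 5 /path/to/2008.csv
```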
05-29-2018
01:22 PM
@Henrik Olsen We completely understand how load-balanced redistribution of FlowFiles via an RPG is not the most elegant solution. There was consideration of separating input/output ports into two different components (local and remote). In addition to the complexity of this, we also have to consider backward compatibility: what impact would this have on NiFi users upgrading with flows developed in previous versions of NiFi?

Another option being looked at is adding the ability to enable load-balancing within the cluster directly on any connection. This would just be a new configuration option on existing connections, with the default behaving just as connections do now. By enabling the load-balancing option, FlowFiles would be load-balanced across all connected nodes behind the scenes automatically. There are still technical hurdles here and no timetable for this effort as of now.

Thank you, Matt
05-29-2018
01:00 PM
@Dilip Namdev A 409 conflict response from NiFi is typical when the service is not available to serve the request. In a NiFi cluster, your requests must be replicated to the other node(s). It may be possible that those replication requests failed. It may also be possible that you had a disconnected node.

You should inspect the nifi-user and nifi-app logs from around the time of these failed rest-api calls to see what state your NiFi was in, or whether there was an issue with a replication request.

Thank you, Matt
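One way to check the cluster and log state from the command line (a minimal sketch; the host, port, and log paths are assumptions, and the cluster endpoint will require credentials or certificates on a secured NiFi):

```bash
# Check whether all nodes report as CONNECTED
curl -k https://nifi-host:8443/nifi-api/controller/cluster

# Look for replication or disconnection errors around the failure time
grep -iE "replicat|disconnect|409" /opt/nifi/logs/nifi-app.log
grep -i "409" /opt/nifi/logs/nifi-user.log
```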
05-23-2018
09:00 PM
@Zack Riesland Your problem is definitely external to NiFi.

Perhaps a NAT issue between your outside network and the instance running in EC2? Maybe a Security Groups configuration issue? Maybe a Network Access Control List issue?

https://aws.amazon.com/premiumsupport/knowledge-center/instance-vpc-troubleshoot/

From your external machine, you should be able to use commands like telnet, openssl, or netcat to verify the ability to connect to the NiFi URL endpoint in EC2 (see the sketch below).

Sorry I won't be much help with EC2-specific issues. You could always start a new community thread asking for help specifically with accessing HTTP endpoints within an EC2 instance in a VPC.

Thanks, Matt
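For example (a minimal sketch; the hostname and ports are assumptions, so substitute your EC2 public DNS name and your NiFi web port):

```bash
# Basic TCP reachability check with netcat
nc -vz ec2-xx-xx-xx-xx.compute-1.amazonaws.com 8080

# Same check with telnet
telnet ec2-xx-xx-xx-xx.compute-1.amazonaws.com 8080

# For a secured (HTTPS) NiFi, verify the TLS handshake as well
openssl s_client -connect ec2-xx-xx-xx-xx.compute-1.amazonaws.com:9443
```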