About MattWho

MattWho · ‎10-24-2018

@Willian Gosse - During a NiFi restart, the flow is loaded and started before the NiFi UI is made available. During this period, the Remote Process Groups (RPG) on each node will fail to connect to the configured target NiFi URL to fetch the Site-To-Site (S2S) details. This is expected behavior. The RPGs will stop throwing this error in the logs once the configured target NiFi URL becomes available and the S2S details are successfully retrieved. - The choice of HTTP or RAW as the transport protocol controls how the actual FlowFiles are transferred. The recurring connection to retrieve the S2S details will always be over HTTP to the target NiFi URL configured in the RPG. When using the HTTP transport protocol, the NiFi FlowFiles will also be transferred via the same HTTP port that the target NiFi UI is exposed on. Setting the transport protocol to RAW causes the RPG to use a dedicated socket port for the FlowFile transfer. The socket port used is set on the target NiFi servers in the nifi.properties file (property: nifi.remote.input.socket.port=). The advantage of using RAW is that the amount of traffic going to the HTTP port used to access the UI is reduced considerably. The advantage of using HTTP is that you have one less port you must open through any firewalls to the NiFi nodes. - Thank you, Matt - If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
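
To make the firewall trade-off concrete, here is a minimal Python sketch (not part of NiFi) that reads nifi.properties-style text and reports which ports a client must reach on a target node for each transport protocol. The helper names and the sample port values are assumptions for illustration; the property names are the standard nifi.properties keys.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def s2s_ports(props, transport):
    """Return the set of ports a client must reach on the target node.

    The S2S details are always fetched from the web (UI) port. RAW adds the
    dedicated socket port that carries the FlowFile transfer itself.
    """
    web_port = props.get("nifi.web.https.port") or props.get("nifi.web.http.port")
    if not web_port:
        raise ValueError("no web port configured")
    ports = {int(web_port)}
    transport = transport.upper()
    if transport == "RAW":
        raw_port = props.get("nifi.remote.input.socket.port")
        if not raw_port:
            raise ValueError("nifi.remote.input.socket.port is not set")
        ports.add(int(raw_port))
    elif transport != "HTTP":
        raise ValueError("transport must be HTTP or RAW")
    return ports


# Hypothetical sample values, not taken from the original thread.
SAMPLE = """\
nifi.web.http.port=8080
nifi.web.https.port=
nifi.remote.input.socket.port=10000
"""

if __name__ == "__main__":
    props = parse_properties(SAMPLE)
    for transport in ("HTTP", "RAW"):
        print(transport, sorted(s2s_ports(props, transport)))
```

With these sample values, HTTP needs only the web port, while RAW needs the web port plus the socket port.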

MattWho · ‎10-19-2018

@Andy Gisbo Yes, that guide is an accurate example of using OpenID with Google.

MattWho · ‎10-19-2018

@Emma Ixiato - The list based processors rely on file timestamps to determine if a file should be listed or not. This means that the list based processors may not list files in the target location if: - 1. New files added to the source location do not have their timestamp updated. (The last timestamp recorded in NiFi from the previous listing is then newer than the timestamp of the file that was added.) 2. Multiple files are being written to the source location at the same time and the list based processor did not list all of them in one execution. The second execution would miss the other files because of the timestamp recorded from the first list execution. - Not really sure what NiFi version you are running, but here are a few Jiras aimed at making list based processors work much better: 1. https://jira.apache.org/jira/browse/NIFI-3332 <-- (Addressed as of Apache NiFi 1.4.0) 2. https://jira.apache.org/jira/browse/NIFI-4069 <-- (Addressed as of Apache NiFi 1.4.0) 3. https://jira.apache.org/jira/browse/NIFI-5157 <-- (Addressed as of Apache NiFi 1.8.0, which is being released soon) - Thank you, Matt - If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
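
The two failure cases above can be reproduced with a deliberately simplified Python model of timestamp-based listing. This is an illustrative sketch only, not the actual processor logic; the function name, file names, and timestamps are made up.

```python
def list_new_files(files, last_listed_ts):
    """Return (names to list, new recorded timestamp).

    files maps file name -> last-modified timestamp (epoch seconds).
    In this simplified model only files strictly newer than the recorded
    timestamp are listed.
    """
    new = sorted(name for name, ts in files.items() if ts > last_listed_ts)
    newest = max((files[name] for name in new), default=last_listed_ts)
    return new, newest


if __name__ == "__main__":
    source = {"a.csv": 100}
    listed, state = list_new_files(source, 0)
    print(listed, state)

    # Case 1: a file is moved in but keeps its old timestamp.
    source["old.csv"] = 90
    # Case 2: a file written at the same moment as a.csv shows up late.
    source["b.csv"] = 100
    listed, state = list_new_files(source, state)
    print(listed, state)  # neither old.csv nor b.csv is picked up
```

In this model both late arrivals are skipped because neither has a timestamp newer than the one recorded by the first listing.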

MattWho · ‎10-19-2018

@Stephen Greszczyszyn The word "process" can mean many things. What kind of processing are you trying to do? - The content of your syslog data is just standard ASCII, correct? If so, then it can be read by many processors. So the question is: what are you trying to do with it? - I am assuming your syslog ingest may consist of many log lines per FlowFile. If that is the case, you may want to "process" these FlowFiles as records. Maybe start by looking at the various "Record" based processors. The GrokReader is probably what you want to configure the record based processors to use in order to parse your syslog content. - Thanks, Matt
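
To show what "processing as records" means for multi-line syslog content, here is a small Python sketch that turns each log line into a record using a named-group regular expression. It is a stand-in for the idea behind a Grok-style reader, not the GrokReader itself; the pattern assumes traditional BSD-style syslog lines, and the sample lines are invented.

```python
import re

# Assumed line shape: "Mon DD HH:MM:SS host program[pid]: message"
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<program>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?:\s"
    r"(?P<message>.*)$"
)


def parse_syslog(content):
    """Return one dict per line that matches the pattern; skip the rest."""
    records = []
    for line in content.splitlines():
        match = SYSLOG_LINE.match(line)
        if match:
            records.append(match.groupdict())
    return records


if __name__ == "__main__":
    sample = (
        "Oct 19 10:15:32 host1 sshd[1234]: Accepted password for user\n"
        "Oct 19 10:15:33 host1 kernel: eth0 link up\n"
    )
    for record in parse_syslog(sample):
        print(record)
```

Once each line is a record with named fields, routing or filtering on host, program, or message content becomes straightforward.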

MattWho · ‎10-18-2018

@Pepelu Rico In the upcoming Apache NiFi 1.8 release, you may find that the following new capability will solve your use case issue here: https://jira.apache.org/jira/browse/NIFI-5406 Thanks, Matt

MattWho · ‎10-18-2018

@Pepelu Rico Typically, files being transferred to an SFTP server are written using a dot "." filename and then renamed to remove the leading dot "." once the transfer has completed. - The ListSFTP processor by default has a property named "Ignore Dotted Files" which should be set to "true" so that files with names that start with a dot are ignored and not listed. - While the above is typical, it is possible that files being written to your SFTP server are not using the standard dot/rename transfer method. Is there some other unique naming/renaming happening to indicate the transfer is complete? If so, perhaps you could set up a "File Filter Regex" to avoid listing the files still being transferred. - As long as the timestamp on the file being written is updated as it is being written to, the ListSFTP processor will list the same file again. The ListSFTP processor creates a FlowFile attribute named "file.size". You could compare this attribute with the FlowFile content "fileSize" attribute after the FetchSFTP processor. If they do not match, you could discard this FlowFile and wait for the next listing of the same file to arrive where these values match. This option is not ideal because it means fetching the content multiple times until the complete file is fetched. - Aside from the above, there really aren't any other options here. - Thank you, Matt - If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
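
Both ideas above (filtering out in-progress names and comparing sizes) can be sketched in a few lines of Python. This is an illustration of the logic only, not NiFi configuration: the ".part" suffix is a hypothetical naming convention, the function names are made up, and the regular expression would need to be adapted to whatever naming your transfer tool actually uses.

```python
import re

# Hypothetical convention: uploads in progress end in ".part".
# Names starting with a dot are rejected too, mirroring "Ignore Dotted Files".
COMPLETE_FILE = re.compile(r"^(?!\.)(?!.*\.part$).+$")


def should_list(filename):
    """True if the name does not look like a file still being transferred."""
    return COMPLETE_FILE.match(filename) is not None


def is_fully_fetched(attributes):
    """Compare the size seen at listing time with the size actually fetched.

    attributes is a dict of FlowFile attribute strings; "file.size" comes
    from the listing and "fileSize" is the size of the fetched content.
    """
    return int(attributes["file.size"]) == int(attributes["fileSize"])


if __name__ == "__main__":
    for name in ("data.csv", ".data.csv", "data.csv.part"):
        print(name, should_list(name))
    print(is_fully_fetched({"file.size": "2048", "fileSize": "2048"}))
    print(is_fully_fetched({"file.size": "1024", "fileSize": "2048"}))
```

A size mismatch means the file changed between listing and fetch, so that copy should be discarded and the next listing awaited.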

MattWho · ‎10-17-2018

@pavan srikar Yes, that makes sense. When you start the DistributedMapCacheServer, it starts a server on each NiFi node. The DistributedMapCacheClient should be configured to point at one specific node, so that every node pulls cache entries from the same server. - A little back history: The DistributedMapCacheServer and DistributedMapCacheClient controller services date back to the original NiFi release versions. Back in those days there was no zero master clustering, which we have now. There was a dedicated server that ran a NiFi Cluster Manager (NCM). At that time the DistributedMapCacheServer could only be set up on the NCM. - Once NiFi moved away from having an NCM, the functionality of these controller services was not changed, to avoid breaking the flows of users who moved to the latest versions. The DistributedMapCacheServer does not offer HA (if the node hosting the server goes down, the cache becomes unavailable). To provide HA here, new external HA cache options have been added. - Thanks, Matt

MattWho · ‎10-17-2018

@pavan srikar I should add that there is no processor that will specifically clone a FlowFile to every node in the NiFi cluster. - But there are other options if you do not want to stand up an external map cache server. - Perhaps set up a disk mount that is shared across all nodes. On the primary node only, you run a flow that retrieves a new token every ~55 minutes and writes it to this shared mounted directory, set to overwrite the previously written token each time. Then on all nodes you could create a flow that consumes this token on a schedule, without deleting it, to perform your all-node tasks. - Just a second option for you. - Thank you, Matt
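
The shared-mount idea can be sketched outside of NiFi as a writer that overwrites a token file and readers that read it without deleting it. This Python sketch only illustrates the pattern; the file name, function names, and the write-then-rename step are assumptions, and rename behavior on a network file system should be verified for your environment.

```python
import os
import tempfile


def write_token(shared_dir, token, name="token.txt"):
    """Primary-node side: overwrite the previously written token.

    The token is written to a temporary file in the same directory and then
    renamed over the target, so readers do not see a half-written file.
    """
    fd, tmp_path = tempfile.mkstemp(dir=shared_dir, prefix=".token-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(token)
        os.replace(tmp_path, os.path.join(shared_dir, name))
    except BaseException:
        # Clean up the temporary file if the write or rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_token(shared_dir, name="token.txt"):
    """All-node side: read the current token and leave the file in place."""
    with open(os.path.join(shared_dir, name)) as handle:
        return handle.read().strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as shared:
        write_token(shared, "token-1")
        print(read_token(shared))
        write_token(shared, "token-2")  # e.g. the next refresh ~55 minutes later
        print(read_token(shared))
```

Because readers never delete the file, every node can read the same token as often as its schedule requires.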

Last Visited	‎02-03-2026 06:11 PM

Member Since	‎07-30-2019 10:41 AM
Posts	3,434
Kudos received	1628

Cloudera Community

Re: Setting TTL per key when writing to redis

Re: Best Practice for configuring registry flows

Re: Nifi 2.7.2 Start Problem

Re: Error importing NiFi workflow template from ve...

Re: nifi 2.6 registry security scan results

Re: Why doesn't my remote groups reconnecting to i...

Re: Apache Nifi - username password for Rest Api

Re: FetchSFTP failed to fetch all file in the list...

Re: Can NiFi route raw packets - like UDP

Re: NiFi - check if the file in sftp is complete

Re: Broadcast a Flowfile from primary node to all ...
