Member since
07-30-2019
3406
Posts
1622
Kudos Received
1008
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 118 | 12-17-2025 05:55 AM |
| | 179 | 12-15-2025 01:29 PM |
| | 119 | 12-15-2025 06:50 AM |
| | 244 | 12-05-2025 08:25 AM |
| | 405 | 12-03-2025 10:21 AM |
10-29-2018
06:36 PM
@Bobby Harsono

Some processors are designed to use memory outside of the JVM. Processors like ExecuteProcess or ExecuteStreamCommand are good examples: they invoke a process or script external to NiFi, and those externally executed commands have a memory footprint of their own.

Listen-type processors like ListenTCP or ListenUDP are another example. These have memory footprints both inside and outside the NiFi JVM heap space, since they can be configured with a socket buffer that is allocated outside of heap space.

Thanks, Matt
10-29-2018
06:26 PM
@naveen

I would recommend getting several thread dumps from NiFi while it is in this state to see what is causing your threads to stall. This can be done using the <path to NiFi>/bin/nifi.sh script as follows:

./nifi.sh dump <name of dump file>

Some other things to try:

1. Under heavy volume the default NiFi provenance implementation (org.apache.nifi.provenance.PersistentProvenanceRepository) may not be able to keep up. If NiFi is waiting on provenance, all flows will appear to be stalled. Make sure you are instead using the newer org.apache.nifi.provenance.WriteAheadProvenanceRepository implementation, which was redesigned to be much more performant.
2. Make sure you do not have constant garbage collection occurring. Even minor/young GC is a stop-the-world event. It is possible that after some time of running and ingesting data, GC gets into a non-stop cycle of trying to free heap memory space.
3. Check whether you have changed the default Max Timer Driven Thread Count setting under "Controller Settings" in the Global menu (upper-right corner of the UI). The default is only 10.
4. Avoid configuring any of your processors to use the Event Driven scheduling strategy.

Thank you, Matt
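For reference, the provenance implementation mentioned in item 1 is switched in nifi.properties (a minimal sketch; verify the property against your NiFi version's configuration guide, and note a restart is required):

```properties
# nifi.properties -- use the faster write-ahead provenance implementation
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
```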
10-24-2018
02:57 PM
1 Kudo
@Willian Gosse

During a NiFi restart, the flow is loaded and started before the NiFi UI is made available. During this period the Remote Process Groups (RPGs) on each node will fail to connect to the configured target NiFi URL to fetch the Site-To-Site (S2S) details. This is expected behavior. The RPGs will stop logging this error once the configured target NiFi URL becomes available and the S2S details are successfully retrieved.

The choice of HTTP or RAW as the transport protocol controls how the actual FlowFiles are transferred. The recurring connection to retrieve the S2S details will always be over HTTP to the target NiFi URL configured in the RPG. When using the HTTP transport protocol, the FlowFiles will also be transferred over the same HTTP port the target NiFi UI is exposed on. Setting the transport protocol to RAW causes the RPG to use a dedicated socket port for the FlowFile transfer. The socket port used is set by the target NiFi servers in the nifi.properties file (property: nifi.remote.input.socket.port=). The advantage of RAW is that the amount of traffic going to the HTTP port used to access the UI is reduced considerably. The advantage of HTTP is that you have one less port to open through any firewalls to the NiFi nodes.

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
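As a sketch of the two ports involved (port values below are examples only, not recommendations), the relevant nifi.properties entries on each node in the target cluster would look like:

```properties
# nifi.properties on each target node (example values)
# HTTP transport: FlowFiles travel over the same port the UI is served on.
nifi.web.http.port=8080
# RAW transport: FlowFiles travel over this dedicated site-to-site socket port.
nifi.remote.input.socket.port=10000
```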
10-19-2018
06:11 PM
@Andy Gisbo Yes, that guide is an accurate example of using OpenID with Google.
10-19-2018
06:05 PM
@Emma Ixiato

The list-based processors rely on file timestamps to determine whether a file should be listed. This means they may not list files in the target location if:

1. New files added to the source location do not have their timestamps updated (so the last timestamp recorded in NiFi from a previous listing is newer than the file that was added).
2. Multiple files are being written to the source location at the same time and the list-based processor did not list all of them in one execution. A second execution would miss the other files because of the timestamp recorded by the first execution.

Not sure which NiFi version you are running, but here are a few Jiras aimed at making the list-based processors work much better:

1. https://jira.apache.org/jira/browse/NIFI-3332 <-- (addressed as of Apache NiFi 1.4.0)
2. https://jira.apache.org/jira/browse/NIFI-4069 <-- (addressed as of Apache NiFi 1.4.0)
3. https://jira.apache.org/jira/browse/NIFI-5157 <-- (addressed as of Apache NiFi 1.8.0, being released soon)

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
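The concurrent-write race in item 2 can be illustrated with a minimal sketch (this is illustrative Python, not NiFi's actual implementation): a lister that records the newest timestamp it has seen, and only lists strictly newer files, will permanently miss a file that appears later with the same timestamp.

```python
# Minimal sketch of timestamp-based listing and why it can miss files.

def list_new_files(files, last_seen_ts):
    """files: dict mapping filename -> modification timestamp.
    Returns (names listed this run, updated last-seen timestamp)."""
    listed = sorted(name for name, ts in files.items() if ts > last_seen_ts)
    newest = max(files.values(), default=last_seen_ts)
    return listed, max(newest, last_seen_ts)

# First run: two files are being written concurrently, but only one is visible.
files = {"a.log": 100}
listed, last = list_new_files(files, 0)      # lists a.log, records ts 100

# b.log finishes a moment later with the SAME timestamp as a.log.
files["b.log"] = 100
listed2, last = list_new_files(files, last)  # b.log is missed: 100 is not > 100
```

This is exactly the window NIFI-3332 and related Jiras close by tracking the set of files already listed at the recorded timestamp, rather than the timestamp alone.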
10-19-2018
05:07 PM
@Stephen Greszczyszyn

The word "process" can mean many things. What kind of processing are you trying to do?

The content of your syslog data is just standard ASCII, correct? If so, it can be read by many processors, so the question is what you are trying to do with it.

I am assuming your syslog ingest may consist of many log lines per FlowFile. If that is the case, you may want to "process" these FlowFiles as records. Maybe start by looking at the various record-based processors. The GrokReader is probably what you want to configure the record-based processors to use in order to parse your syslog content.

Thanks, Matt
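To show what Grok-style parsing pulls out of a syslog line, here is an illustrative Python sketch with a simplified regex loosely mirroring the fields a syslog Grok pattern extracts (timestamp, host, program, PID, message). The sample line and regex are assumptions for demonstration, not NiFi or Grok internals.

```python
import re

# A typical BSD-style syslog line (sample data for illustration).
line = "Oct 19 05:07:01 myhost sshd[4242]: Accepted publickey for matt"

# Simplified stand-in for a syslog Grok pattern's field extraction.
pattern = re.compile(
    r"(?P<timestamp>\w{3}\s+\d+\s[\d:]+)\s"   # e.g. "Oct 19 05:07:01"
    r"(?P<host>\S+)\s"                         # e.g. "myhost"
    r"(?P<program>[\w./-]+)"                   # e.g. "sshd"
    r"(?:\[(?P<pid>\d+)\])?:\s"                # optional "[4242]"
    r"(?P<message>.*)"                         # the free-text message
)
m = pattern.match(line)
fields = m.groupdict() if m else None
```

A record reader doing this per line is what lets downstream record-based processors treat each syslog entry as a structured record instead of opaque text.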
10-18-2018
03:20 PM
@Pepelu Rico In the upcoming Apache NiFi 1.8 release, you may find the following new capability solves your use case: https://jira.apache.org/jira/browse/NIFI-5406

Thanks, Matt
10-18-2018
03:17 PM
@Pepelu Rico

Typically, files being transferred to an SFTP server are written using a dotted "." filename and then renamed to remove the leading dot once the transfer has completed. The ListSFTP processor by default has a property named "Ignore Dotted Files", which should be set to "true" so that files whose names start with a dot are ignored and not listed.

While the above is typical, it is possible that the files being written to your SFTP server are not using the standard dot/rename transfer method. Is there some other unique naming/renaming happening to indicate a transfer is complete? If so, perhaps you could set up a "File Filter Regex" to avoid listing files still being transferred.

As long as the timestamp on a file being written is updated while it is being written to, the ListSFTP processor will list the same file again. The ListSFTP processor creates a FlowFile attribute named "file.size". You could compare this attribute with the FlowFile's "fileSize" attribute after the FetchSFTP processor. If they do not match, you could discard the FlowFile and wait for the next listing of the same file, where these values will match. This option is not ideal because it means fetching the content multiple times until the complete file is fetched.

Aside from the above, there really aren't any other options here.

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
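The size comparison described above could, as a sketch, be expressed as a RouteOnAttribute rule using NiFi Expression Language (the attribute names follow this answer; verify them against your own flow before relying on this):

```
${file.size:equals(${fileSize})}
```

FlowFiles matching this rule (listed size equals fetched content size) would route to a "complete" relationship; non-matching FlowFiles could be discarded to await the next listing.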
10-17-2018
02:29 PM
@pavan srikar Yes, that makes sense. When you start the DistributedMapCacheServer, it starts a server on each NiFi node. The DistributedMapCacheClient should be configured to point at one specific node, so that every node pulls cache entries from the same server.

A little history: the DistributedMapCacheServer and DistributedMapCacheClient controller services date back to the original NiFi releases. Back then there was no zero-master clustering as we have now; a dedicated server ran the NiFi Cluster Manager (NCM), and at that time the DistributedMapCacheServer could only be set up on the NCM.

Once NiFi moved away from having an NCM, the functionality of these controller services was left unchanged to avoid breaking the flows of users who moved to the latest versions. The DistributedMapCacheServer does not offer HA (if the node hosting the server goes down, the cache becomes unavailable). To provide HA, new external HA cache options have been added.

Thanks, Matt