11-07-2018
05:06 PM
@narasimha chembolu - The ListAzureBlobStorage processor is designed to produce a FlowFile for each blob listed from the target storage. For each FlowFile produced, a set of attributes (including "azure.blobname") is written to the FlowFile.

The FetchAzureBlobStorage processor is triggered to execute by each incoming FlowFile it receives. By default it uses the value that ListAzureBlobStorage assigned to the FlowFile attribute "azure.blobname" to determine which blob to retrieve.

In your case you are only looking to fetch one very specific blob, so you configured the "Blob" property in the FetchAzureBlobStorage processor to always get that specific blob. This means that every incoming FlowFile triggers a fetch of that same blob's content, which is then inserted into the content of every listed FlowFile.

So your flow is working as designed, but not as you intended. You have two options:

1. Reconfigure your FetchAzureBlobStorage processor to use "${azure.blobname}" in the Blob property. Then add a RouteOnAttribute processor between the ListAzureBlobStorage and FetchAzureBlobStorage processors to filter on the specific blob name you are looking for, so that only that listed FlowFile makes it to the FetchAzureBlobStorage processor (see the sketch below this post).

2. Don't use the ListAzureBlobStorage processor at all. Instead, use a GenerateFlowFile processor to generate a single 0-byte FlowFile on the primary node and use it to trigger the FetchAzureBlobStorage processor to fetch the specific blob you want.

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
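For option 1, the RouteOnAttribute configuration might look like the following sketch. The blob name "reports/daily.csv" is purely a hypothetical placeholder for whatever blob you actually want:

```
# RouteOnAttribute processor (sketch; blob name is hypothetical)
Routing Strategy : Route to Property name

# Dynamic property -> creates a "target-blob" relationship
target-blob : ${azure.blobname:equals('reports/daily.csv')}
```

Connect the "target-blob" relationship to FetchAzureBlobStorage, and auto-terminate (or otherwise handle) the "unmatched" relationship.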
10-24-2018
02:57 PM
2 Kudos
@Willian Gosse - During a NiFi restart, the flow is loaded and started before the NiFi UI is made available. During this period of time, the Remote Process Groups (RPG) on each node will fail to connect to the configured target NiFi URL to fetch the Site-To-Site (S2S) details. This is expected behavior. The RPGs will stop throwing this error in the logs once the configured target NiFi URL becomes available and the S2S details are successfully retrieved.

The choice of HTTP or RAW as the transport protocol controls how the actual FlowFiles are transferred. The recurring connection to retrieve the S2S details will always be over HTTP to the target NiFi URL configured in the RPG. When using the HTTP transport protocol, the FlowFiles will also be transferred via the same HTTP port the target NiFi UI is exposed on. Setting the transport protocol to RAW causes the RPG to use a dedicated socket port for the FlowFile transfer. The socket port used is set on the target NiFi servers in the nifi.properties file (property: nifi.remote.input.socket.port=); a sketch follows below this post. The advantage of using RAW is that the amount of traffic going to the HTTP port used to access the UI is reduced considerably. The advantage of using HTTP is that you have one less port to open through any firewalls to the NiFi nodes.

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
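As a minimal sketch of enabling RAW transport on the target NiFi nodes, the port number 10443 below is just a hypothetical choice:

```
# nifi.properties on each target NiFi node (sketch; port value is hypothetical)
# RAW S2S transfers use this dedicated socket instead of the UI's HTTP port
nifi.remote.input.socket.port=10443
```

The RPG on the source side still retrieves the S2S details over HTTP from the target NiFi URL; only the FlowFile transfer itself uses this socket.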
10-24-2018
02:43 PM
I'm starting to trial NiFi as a data ingestion engine. I would like to input the following datatypes:

1) collectd (UDP): I don't think that NiFi has a collectd parser, so I will need to direct these raw UDP streams to a locally running Telegraf and Logstash instance for parsing.

2) Syslog (UDP): I would like to experiment with routing raw syslog UDP packets (unprocessed) to destinations, as well as filtering/parsing the syslog data using the NiFi syslog modules.

3) Netflow (UDP): I would like to take a heavy raw Netflow stream and test performance when forwarding only a subset of Netflow data based on a list of protocol types that I'm interested in (mapped against one of the Netflow data key values).

What I haven't been able to understand from the documentation is how I can redirect raw UDP packet flows (listening and then forwarding to two destinations) without having to process the particular data in the UDP packets.
10-12-2018
12:01 PM
@David Sargrad - NiFi is designed to prevent data loss. This means that NiFi needs to do something with FlowFiles when the processing of a FlowFile encounters a failure somewhere within a dataflow.

When it comes to ingest-type processors like GetHTTP, a FlowFile is only generated upon success. As such, there is no FlowFile created during failure that would need to be handled or routed to some failure relationship.

Upon the next scheduled run, the GetHTTP processor will simply try to execute just like it did on the previous run. If successful, a FlowFile will be produced and routed to the outbound success relationship connection.

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
10-12-2018
01:03 PM
Thank you. I like your answer very much. I do think the referenced example was not focused on a zip of zips (just a simple zip of a directory tree), yet I think your answer is proper. The "path" attribute does the job. I'll try this, and thanks.
10-15-2018
01:37 PM
@Matt Clarke Hi Matt. One question I'd like to get your perspective on: assuming that I manage independent flows within a single NiFi cluster, is it your experience that I can use the NiFi Registry to properly version and manage the independent flows that are processed within that cluster? The inability to manage each independent flow as a versioned flow could potentially drive me to using multiple NiFi clusters (and to assign each cluster one or several flows). I'm concerned about the overall complexity of the processing that is assigned to a single cluster.
10-01-2018
05:52 PM
@yazeed salem Your NiFi Expression Language statement looks good. I even tested based on your example and it routed my FlowFiles correctly. Make sure that each of the FlowFiles being processed has the required FlowFile attributes set on it. You can stop your RouteOnAttribute processor and allow a few FlowFiles to queue in the connection feeding it. Then right-click on the connection and select "List queue". You can then click on the "details" icon to the far left of any FlowFile to verify that it does have the correct attributes set on it. A generic sketch of this kind of routing rule follows below this post.

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
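Since the original Expression Language statement isn't shown here, this is only a generic sketch of a RouteOnAttribute dynamic property; the attribute name "filename" and the value "abc" are hypothetical:

```
# RouteOnAttribute dynamic property (sketch; attribute name and value are hypothetical)
matched-files : ${filename:startsWith('abc')}
```

If the attribute the statement references is missing from a FlowFile, the expression evaluates to false and that FlowFile routes to "unmatched", which is why verifying the attributes via "List queue" is the first thing to check.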
09-18-2018
12:51 PM
1 Kudo
@Wojtek I believe you are misunderstanding how the UpdateAttribute processor functions.

Each new property you add expects a property name (this becomes the name of the attribute being created or updated) and a value. The value can be a string or a NiFi Expression Language (EL) statement.

In your screenshot above you have created EL statements. For example:

Property = bytes
Value = ${bytes}

What the above will actually do is the following. The EL statement "${bytes}" tells NiFi to try to locate a NiFi attribute named "bytes" and return its assigned value (at no time does the UpdateAttribute processor read the content of the FlowFile). That returned value will then be used to change the existing value assigned to the FlowFile attribute "bytes", or to create a new FlowFile attribute named "bytes" and assign the value to it.

NiFi searches for attributes in the following hierarchy:
1. NiFi checks all attributes already assigned to the FlowFile being processed.
2. NiFi checks all in-scope process group variables.
3. NiFi checks the NiFi variable registry file.
4. NiFi checks the NiFi JVM properties.
5. NiFi checks the NiFi user's system environment variables.

Since the attribute "bytes" does not exist in any of these places, you are ending up with no value or empty string values being set on all these new properties.

Since you are trying to extract values from the content and assign those to FlowFile attributes, you will want to use a different processor, perhaps ExtractText instead (a sketch follows below this post).

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
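As a sketch of the ExtractText approach: assuming the FlowFile content contains text like "bytes=1234" (a hypothetical format, since the actual content isn't shown), a dynamic property with a regular expression would pull the value out of the content and into an attribute:

```
# ExtractText dynamic property (sketch; the content format "bytes=1234" is hypothetical)
bytes : bytes=(\d+)
```

ExtractText evaluates the regex against the FlowFile content and writes the first capture group to the attribute "bytes" (the full match and each capture group are also written as "bytes.0", "bytes.1", and so on).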
09-12-2018
12:50 PM
4 Kudos
# Max Timer Driven Thread Count and Max Event Driven Thread Count:

Out of the box, NiFi sets the Max Timer Driven Thread Count relatively low to support operating on the simplest of hardware. This default setting can limit the performance of very large, high-volume dataflows that must perform a lot of concurrent processing. General guidance for setting this value is 2 - 4 times the number of cores available to the hardware on which the NiFi service is running. In a NiFi cluster where each server has different hardware (not recommended), this should be set to the highest value possible based on the server with the fewest cores (a worked example follows at the end of this article).

NOTE: Remember that all configurations you apply within the NiFi UI are applied to every node in a NiFi cluster. None of the settings apply as a total for the cluster itself.

NOTE: The cluster UI can be used to see how the total active threads are being used per node. Closely monitoring system CPU usage over time on each of your cluster nodes will help you identify regular or routine spikes in usage. This information will help you decide whether you can increase the "Maximum Timer Driven Thread Count" setting even higher. Arbitrarily setting this value higher can lead to threads spending excessive time in CPU wait without really doing any work. This can show up as long task times reported in processors.

*** The Event Driven scheduling strategy is considered experimental, and we therefore do not recommend using it at all. Users should only configure their NiFi processors to use one of the Timer Driven scheduling strategies (Timer Driven or CRON Driven).

# Assigning Concurrent Tasks to processor components:

Concurrent task settings on processors should always start at the default of 1 and only be incremented slowly as needed. Assigning too many concurrent tasks to every processor can have an effect on other dataflows/processors.

Because of how the above works, it may appear to a user that they get better performance out of a processor by simply setting a high number of concurrent tasks. What they are really doing is stacking up more requests in that large queue, so the processor gets more opportunities to grab one of the available threads from the resource pool. What often happens is that users with a processor running only 1 concurrent task are affected (simply because of the size of the request queue), so those users increase their concurrent tasks as well. Before you know it, the request queue is so large that no one benefits from assigning additional concurrent tasks.

In addition, you may have processors that inherently have long-running tasks. Assigning these processors lots of concurrent tasks can mean a substantial chunk of that thread pool is in use for an extended amount of time. This then limits the number of available threads from the pool for working through the remaining tasks in the queue.
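As a worked example of the 2 - 4 x cores guidance (the node count and core counts here are purely hypothetical):

```
# Hypothetical 3-node cluster; the smallest node has 8 cores
#   Guidance range: 2 x 8 = 16  up to  4 x 8 = 32
Maximum Timer Driven Thread Count : 16   # conservative start; raise gradually while monitoring CPU
# The setting applies per node, so the cluster-wide pool here is 3 x 16 = 48 threads.
```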