About MattWho

MattWho · ‎05-03-2019

@Ramesh Reddy

When you build a dataflow on your NiFi UI's canvas and then save it as a template via "Create Template", all selected components (including all current configuration, minus any passwords) are saved into a template. The template is just an XML snippet of the current configuration of those components, and it is kept in NiFi's flow.xml.gz file. From the "Templates" UI found under NiFi's global menu (upper right corner of the NiFi UI), you can select any template and download that XML snippet to disk. Those snippets can then be uploaded to another NiFi for use there.

The actual components on the canvas are in no way linked directly to any created template. So when you make changes to the components on the canvas, those changes are not pushed to any template you already created. You can create a new template anytime you want after making changes.

I believe what you really want to be using is the NiFi-Registry service. This is a separate install from NiFi. NiFi can be configured to use the NiFi-Registry service to "Version Control" dataflows you build within the NiFi UI's canvas. Once a NiFi-Registry has been configured for your NiFi to use, you will have the option to right click on a NiFi Process Group (PG) and select "Start version control" from the "Version" entry in the context menu that is displayed. Once a PG is placed under version control, any changes made within the PG will cause the PG to display an icon in its upper left corner informing you that local changes have been made. Right clicking on the PG again will allow you to commit those local changes as a new "version" of that dataflow in the NiFi-Registry.

Other NiFi installs can be configured to use that same NiFi-Registry, which allows those other NiFi instances to load a version controlled PG onto their canvas as well. A version controlled PG whose local flow matches the current version in the NiFi-Registry will display a corresponding icon in the upper left corner of the PG. If a new version of a version controlled PG is pushed to the NiFi-Registry, any other NiFi instances using that version controlled flow will be notified that a newer version is available in the NiFi-Registry. Users can then upgrade their PG to the newer version.

Thank you, Matt

MattWho · ‎05-03-2019

@c i

I am not clear from your description what your flow is doing.
- What does the original FlowFile content look like before these ReplaceText processors?
- How are the ReplaceText processors configured?

The description you have of how the MergeContent processor works is not accurate. The MergeContent processor will merge a bin when any ONE of the following has occurred:
1. A given bin meets the configured minimum values for "number of entries" and "group size" (Bin-Packing Algorithm Merge Strategy).
2. A given bin contains all fragments of a fragmented FlowFile (Defragment Merge Strategy).
3. A bin has reached the configured "Max Bin Age". Bin age starts when the first FlowFile is allocated to a bin. The Max Bin Age property exists to keep bins from staying around forever when the conditions outlined in 1 and 2 above are never met.
4. The processor runs out of bins. If FlowFiles have already been allocated to each available bin and a new FlowFile does not meet the criteria to be added to one of these existing bins, the oldest bin will be merged in order to free a bin for this new FlowFile. (This typically occurs when using the Correlation Attribute Name property and the source FlowFiles have more unique attribute values than bins allocated.)

The MergeContent processor will not always wait the configured Max Bin Age before merging. Also keep in mind that with a NiFi cluster, each NiFi node is running the MergeContent processor independently of the other nodes and can only merge FlowFiles on the same node. So while an inbound connection queue to the MergeContent processor may show 2 queued FlowFiles, each of those FlowFiles may exist on a different node and thus would not be merged together into one FlowFile.

Consider the "Max Bin Age" as the maximum latency you are willing to allow in your flow: a bin is merged at that age even if it has not met the minimum criteria in the allotted time. So even if you set this to 30 minutes, a bin that meets the minimum criteria in 2 minutes will get merged at 2 minutes.

More details about your overall dataflow are probably needed to offer alternative suggestions here.

Thank you, Matt
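The four merge conditions listed above can be sketched as a small decision function. This is only an illustration of the rules as described in the post, not NiFi's actual implementation; the `Bin` class, the `merge_reason` function, and all parameter names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bin:
    """Hypothetical stand-in for a MergeContent bin."""
    created_at: float                                # when the first FlowFile was allocated
    sizes: List[int] = field(default_factory=list)   # byte size of each binned FlowFile
    expected_fragments: Optional[int] = None         # set only for the Defragment strategy


def merge_reason(bin_, now, min_entries, min_group_size, max_bin_age,
                 new_flowfile_needs_bin=False, is_oldest=False):
    """Return why the bin would merge now, or None if it keeps waiting."""
    if bin_.expected_fragments is not None:
        # Condition 2: Defragment merges once every fragment has arrived.
        if len(bin_.sizes) == bin_.expected_fragments:
            return "all fragments present"
    elif len(bin_.sizes) >= min_entries and sum(bin_.sizes) >= min_group_size:
        # Condition 1: Bin-Packing merges once BOTH minimums are met.
        return "minimums met"
    if now - bin_.created_at >= max_bin_age:
        # Condition 3: the age limit caps how long a bin may wait.
        return "max bin age reached"
    if new_flowfile_needs_bin and is_oldest:
        # Condition 4: no free bin is left, so the oldest bin is forced out.
        return "oldest bin forced out"
    return None


# A bin that meets its minimums after 2 minutes merges then,
# even though Max Bin Age is 30 minutes (1800 seconds).
early = merge_reason(Bin(created_at=0, sizes=[10, 10]), now=120,
                     min_entries=2, min_group_size=0, max_bin_age=1800)
# A bin that never meets its minimums still merges once it is 30 minutes old.
aged = merge_reason(Bin(created_at=0, sizes=[10]), now=1800,
                    min_entries=2, min_group_size=0, max_bin_age=1800)
```

In a cluster, each node would evaluate rules like these only against the FlowFiles it holds locally.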

MattWho · ‎05-02-2019

@Nick Stantzos You can ignore the unexpected coloring rendered by the NiFi Expression Language editor window. The Java regex replace should work using either "# DOG CAT BIRD" or "\# DOG CAT BIRD". Thank you, Matt
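As a quick sanity check of the point above, the sketch below uses Python's `re` module as a stand-in for Java regex (an assumption: both treat `#` as a literal character outside of comment/verbose mode). The sample text is made up for illustration.

```python
import re

# Hypothetical input; only the "# DOG CAT BIRD" part comes from the thread.
text = "# DOG CAT BIRD and more"

# Unescaped '#' is an ordinary literal in the default regex mode.
plain = re.sub("# DOG CAT BIRD", "X", text)

# Escaping it is harmless: '\#' still matches a literal '#'.
escaped = re.sub(r"\# DOG CAT BIRD", "X", text)
```

Both calls replace the same span, so the escaped and unescaped patterns are interchangeable here.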

MattWho · ‎04-11-2019

@Abhinav Joshi

I suspect that you still have multiple versions of the UpdateAttribute processor loaded in your new NiFi 1.9.0 install. My guess here is that the updateAttribute-1.4.nar was manually added to your NiFi 1.8 and NiFi 1.9 as an additional nar.

Check your nifi.properties file in NiFi 1.9.0 for every occurrence of "nifi.nar.library.directory". You may find one or more. These are the locations from which your NiFi loads its nars. Then look in these directories for "nifi-update-attribute-nar" to see if you find multiple versions in use. I am guessing that in your NiFi 1.9.0 you will find a 1.9.0 version of this nar and a 1.4.0 version.

So what happens is this: the flow.xml.gz file you are using from NiFi 1.8.0 contains multiple versions of the UpdateAttribute processor (1.4 and 1.8). The 1.4 versions load just fine because the 1.4.0 nar still exists; however, there is no 1.8.0 version of the nar. Normally NiFi would auto select the available option, but since two options are now available (1.4 and 1.9), NiFi cannot auto select, and ghost implementations of the 1.8 versions are created, requiring the user to manually replace them with the desired version.

So you have two options:
1. Remove the 1.4 nar so that only the 1.9 version of the nar exists.
2. Manually edit the flow.xml.gz file, replacing the version number on the 1.8 UpdateAttribute processors with the new 1.9 version number.

If you did not find multiple nars, then perhaps your new 1.9 install is using the same old work directory as the original 1.8. NiFi unpacks the nars into the work directory on startup. In this scenario, simply delete everything in the work directory before starting NiFi so that it is rebuilt from the nars/jars found in the above mentioned library directories.

Thanks, Matt
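The check described above (find every "nifi.nar.library.directory" entry, then look for a nar present in more than one version) could be scripted roughly as follows. This is a minimal sketch: the function names, the sample property values, and the assumed `<name>-<version>.nar` file naming are illustrative assumptions, not part of NiFi.

```python
import re
from collections import defaultdict


def nar_directories(properties_text):
    """Collect the value of every nifi.nar.library.directory* property."""
    dirs = []
    for line in properties_text.splitlines():
        line = line.strip()
        if line.startswith("nifi.nar.library.directory") and "=" in line:
            dirs.append(line.split("=", 1)[1].strip())
    return dirs


def duplicate_nars(filenames):
    """Map nar name -> sorted versions, for nars found in more than one version."""
    versions = defaultdict(set)
    for name in filenames:
        # Assumes files are named "<nar-name>-<version>.nar".
        match = re.match(r"^(.+?)-(\d[\w.\-]*)\.nar$", name)
        if match:
            versions[match.group(1)].add(match.group(2))
    return {nar: sorted(v) for nar, v in versions.items() if len(v) > 1}


# Hypothetical nifi.properties excerpt and directory listing.
props = (
    "nifi.nar.library.directory=./lib\n"
    "nifi.nar.library.directory.custom=/opt/custom-nars\n"
    "nifi.web.http.port=8080\n"
)
found_dirs = nar_directories(props)
dupes = duplicate_nars([
    "nifi-update-attribute-nar-1.9.0.nar",
    "nifi-update-attribute-nar-1.4.0.nar",
    "nifi-standard-nar-1.9.0.nar",
])
```

In practice the filename list would come from listing each directory returned by `nar_directories` (for example with `os.listdir`).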

MattWho · ‎04-10-2019

@Julian Iglesias

You can accomplish the above using a function chain in your NiFi Expression Language (EL) statement. Rather than using the getDelimitedField() EL function:

${log_source:getDelimitedField(3,'\')}

you can do this by using the substringAfter() function twice to strip away what is before "dir2", and the substringBefore() function to strip away everything after "dir2":

${log_source:substringAfter('\\'):substringAfter('\\'):substringBefore('\\')}

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
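To see why the chained expression above yields the third backslash-delimited field, here is a small Python emulation of the two EL functions. The helper names and the sample `log_source` value are hypothetical, and the helpers assume the EL behavior of returning the whole subject when the delimiter is not found.

```python
def substring_after(subject, delimiter):
    """Text after the first occurrence of delimiter (whole subject if absent)."""
    index = subject.find(delimiter)
    return subject if index == -1 else subject[index + len(delimiter):]


def substring_before(subject, delimiter):
    """Text before the first occurrence of delimiter (whole subject if absent)."""
    index = subject.find(delimiter)
    return subject if index == -1 else subject[:index]


# Hypothetical attribute value; the original question's path is not shown here.
log_source = r"C:\dir1\dir2\file.log"

step1 = substring_after(log_source, "\\")   # drops the first field
step2 = substring_after(step1, "\\")        # drops the second field
result = substring_before(step2, "\\")      # keeps only what is left of the next '\'
```

Each substringAfter call removes one leading field, so two calls followed by substringBefore isolate field 3.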

MattWho · ‎04-10-2019

@Samar Aarkotti

*** Community Forum Tip: Try to avoid starting a new answer in response to an existing answer. Instead, use comments to respond to existing answers. There is no guaranteed order to different answers, which can make it hard to follow a discussion.

It is always best to leave your processor at the default value for concurrent tasks unless there is a specific need to increase it. Here is an article on this topic:
https://community.hortonworks.com/articles/221808/understanding-nifi-max-thread-pools-and-processor.html
and another on "Run Duration":
https://community.hortonworks.com/articles/221807/understanding-nifi-processors-run-duration-functio.html

MattWho · ‎04-10-2019

@Samar Aarkotti

The exception you are seeing can be expected because of the concurrent execution you have going on per node. With 2 concurrent tasks, the processor can potentially execute its code twice in parallel, resulting in one thread updating what is in state before the other thread does.

In the event that the UpdateAttribute processor is unable to get the state at the beginning of the onTrigger, the FlowFile will be pushed back to the originating relationship and the processor will yield. If the processor is able to get the state at the beginning of the onTrigger but unable to set the state after adding attributes to the FlowFile, the FlowFile will be transferred to "set state fail". This is normally due to the state not being the most up to date version (another thread has replaced the state with another version). In most use cases this relationship should loop back to the processor, since the only affected attributes will be overwritten.

When using state in the UpdateAttribute processor, I would suggest that you configure the processor with only 1 concurrent task. Keep in mind that the processor settings are per node, so each node in your cluster will still be executing this processor.

If throughput is not meeting your needs, make sure you have properly load-balanced the source FlowFiles across all nodes in your cluster. If you have and throughput is still an issue, try adjusting the "Run Duration" in very small increments while still leaving concurrent tasks at 1.

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
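The "set state fail" behavior described above resembles an optimistic, version-checked update. The sketch below simulates two concurrent tasks with a deterministic interleaving; the `VersionedState` class and the `counter` key are hypothetical illustrations, not NiFi's state API.

```python
class VersionedState:
    """Hypothetical state store that rejects writes based on a stale version."""

    def __init__(self):
        self.version = 0
        self.values = {}

    def get(self):
        # Return the current version together with a copy of the values.
        return self.version, dict(self.values)

    def replace(self, expected_version, new_values):
        # Succeeds only if nobody else has written since expected_version was read.
        if expected_version != self.version:
            return False
        self.values = dict(new_values)
        self.version += 1
        return True


state = VersionedState()

# Two concurrent tasks both read the state before either one writes.
version_a, values_a = state.get()
version_b, values_b = state.get()
values_a["counter"] = values_a.get("counter", 0) + 1
values_b["counter"] = values_b.get("counter", 0) + 1

first_write = state.replace(version_a, values_a)    # task A wins
second_write = state.replace(version_b, values_b)   # task B is stale: "set state fail"

# Looping the failed FlowFile back means task B re-reads and tries again.
version_b, values_b = state.get()
values_b["counter"] = values_b.get("counter", 0) + 1
retry_write = state.replace(version_b, values_b)
```

With a single concurrent task the stale-write case cannot arise on a node, which is why that setting is suggested above.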

MattWho · ‎04-10-2019

@Kevin Lahey

1. Each NiFi node in a cluster runs its own copy of the flow.xml and processes its own set of FlowFiles. Nodes are unaware of what FlowFiles exist on other nodes in the cluster.

2. In much older versions of NiFi (Apache 0.x versions), NiFi did not have any high availability (HA) at the control level within a cluster. There existed a dedicated NiFi instance known as the NiFi Cluster Manager (NCM). This was the only instance in the NiFi cluster that could be accessed. All the nodes connected to this NCM. If the NCM went down, the entire NiFi cluster was not reachable. As of Apache NiFi 1.x+, the NCM no longer exists and the cluster relies on ZooKeeper to elect cluster nodes to handle the roles of Cluster Coordinator and Primary Node. If the currently elected node(s) for these roles go down, a new node is elected to these roles. In this way, HA at the control level is provided.

When you create any component (processor, controller service, reporting task, etc.), those components are replicated to all nodes in the cluster. So yes, the DistributedMapCacheServer controller service would be running on all nodes. If you then configured the DistributedMapCacheClient to use "localhost", each node would be reading from and writing to a different cache server. The DistributedMapCacheClient should be configured to point at a specific node rather than localhost. As you can see, you have no HA in this type of setup, since you depend on the one node hosting the cache server you are using to always be up. Instead, you should be using one of the external cache options, like HBase, in order to have HA.

3. As explained above, there is no such thing as an NCM as of Apache NiFi 1.x+.

4. Every component you add to the NiFi canvas runs within a single JVM on each NiFi node. So you cannot configure multiple components that bind to the same configured port anywhere. The first component will bind to the port, and when the other components are started they will throw an exception about the port already being in use. You can have as many clients (DistributedMapCacheClient) as you like, since they act as clients and do not bind to a port. Only the server binds to the port so that it can listen for client requests.

Hope this helps
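The port conflict in point 4 is ordinary socket behavior and can be reproduced outside NiFi. The sketch below is a generic illustration using Python sockets (NiFi itself is Java); `try_bind` is a hypothetical helper, and the port is picked by the operating system rather than being any NiFi default.

```python
import socket


def try_bind(port):
    """Try to listen on 127.0.0.1:port; return the socket, or None if the port is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen()
        return sock
    except OSError:
        # Typically "Address already in use" when another listener owns the port.
        sock.close()
        return None


# First "server" binds to a free port chosen by the OS (port 0 means "any free port").
first = try_bind(0)
port = first.getsockname()[1]

# A second "server" configured with the same port cannot bind while the first is up.
second = try_bind(port)
second_bound = second is not None

# Clients only connect; they do not bind to the server port, so many can coexist.
client = socket.create_connection(("127.0.0.1", port))
client_connected = client.getpeername()[1] == port

client.close()
first.close()
```

This mirrors why only one cache server per port can start in a node's JVM while any number of cache clients can point at it.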

MattWho · ‎04-09-2019

@Abhinav Joshi

*** Community Forum Tip: Try to avoid starting a new answer in response to an existing answer. Instead, use comments to respond to existing answers. There is no guaranteed order to different answers, which can make it hard to follow a discussion.

I would suggest searching the nifi/work directory for multiple versions of the update-attribute nar bundle. You may have multiple nars of different versions installed. The flow.xml.gz file does contain the specific processor version for each component. When starting NiFi 1.9 using the flow.xml.gz from another NiFi version, the component versions will automatically be updated to the new version only if a single option exists. If you have an updateAttribute-1.8.<custom> and an updateAttribute-1.9.0 version available and the flow.xml.gz has an updateAttribute-1.8.0, then it will not auto-update, because there are two options and NiFi does not know which should be used.

My guess here is that your NiFi 1.8.0 contained both the standard 1.8.0 version of the UpdateAttribute processor and a custom version of the UpdateAttribute processor, and your flow contained UpdateAttribute components of each. Then you upgraded to NiFi 1.9.0, which replaced the stock UpdateAttribute with 1.9.0, and the custom version of the UpdateAttribute processor was also carried over to your NiFi 1.9.0 install.

Thanks, Matt
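The auto-update rule described above (update only when exactly one candidate version exists) can be expressed as a tiny function. This is a sketch of the rule as stated in the post, not NiFi's code; `resolve_version` and the `1.8.0-custom` label are hypothetical.

```python
def resolve_version(saved_version, available_versions):
    """Pick the bundle version to load for a component saved in flow.xml.gz.

    Returns None when NiFi could not decide, i.e. the ghost-component case.
    """
    if saved_version in available_versions:
        # The exact version the flow was saved with is still installed.
        return saved_version
    if len(available_versions) == 1:
        # Only one candidate exists, so the component is auto-updated to it.
        return available_versions[0]
    # Zero or several candidates: no automatic choice is possible.
    return None


clean_upgrade = resolve_version("1.8.0", ["1.9.0"])
ambiguous = resolve_version("1.8.0", ["1.8.0-custom", "1.9.0"])
custom_kept = resolve_version("1.8.0-custom", ["1.8.0-custom", "1.9.0"])
```

The ambiguous case corresponds to the stock 1.8.0 components in the scenario above, while the custom-version components still load because their exact version is present.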

MattWho · ‎04-09-2019

@Kevin Lahey

I completely agree with @Shu. It sounds like you have the ListS3 processor executing on all 4 nodes in a NiFi cluster. This results in each NiFi node listing the same filename, which means that each node then tries to look up that filename in the distributed cache used by the DetectDuplicate processor. This results in a bit of a race condition between your nodes, where one or more nodes fail to find the entry in the cache before one of the nodes adds this new filename to that cache.

Your flow should be running the ListS3 processor with its success relationship feeding a FetchS3 processor. The connection between those two processors should be configured to load balance the listed files across all nodes in the cluster.

Thanks, Matt

Last Visited	‎07-11-2026 01:36 AM

Member Since	‎07-30-2019 10:41 AM
Posts	3,472
Kudos received	1638

Cloudera Community

Re: ListenNetFlow processor does not decode Cisco ...

Re: Can we detect who did a particular operation i...

Re: How to invoke a url in nifi which is protected...

Re: Retry impacts scheduler

Re: 503 error while copying/versioning big process...

Re: How to save modified NIFI template

Re: What are some strategies to merge the content ...

Re: Regex # Special Character Escape

Re: Update Attribute processor Not working after u...

Re: getDelimitedField when delimiter is a \

Re: UpdateAttribute Warning.

Re: UpdateAttribute Warning.

Re: Why is DetectDuplicate not filtering duplicate...

Re: Update Attribute processor Not working after u...

Re: Why is DetectDuplicate not filtering duplicate...