Member since: 07-30-2019
Posts: 3436
Kudos Received: 1632
Solutions: 1012

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 155 | 01-27-2026 12:46 PM |
|  | 570 | 01-13-2026 11:14 AM |
|  | 1253 | 01-09-2026 06:58 AM |
|  | 1030 | 12-17-2025 05:55 AM |
|  | 502 | 12-17-2025 05:34 AM |
03-08-2021
12:48 PM
1 Kudo
@pacman In addition to what @ckumar already shared: NiFi purposely leaves components visible to all users on the canvas, but unless a user is authorized to view those components, they will display as "ghost" implementations. "Ghosted" components do not show any component names or classes; they only show stats. Unauthorized users are unable to view or modify the configuration, and they are also unable to list or view data in connections (they only see the number of FlowFiles queued on a connection).

The reason NiFi shows these ghosted components is to prevent multiple users from building their dataflows on top of one another. It is very common to have multiple teams building their own dataflows, but then also have monitoring teams that may be authorized as "operators" across all dataflows, or users that are members of multiple teams. Without the ghosted components, those users who can see more would potentially be left with components layered on top of one another, making management very difficult.

The stats are there so that even if a user cannot view or modify a component, they can see where FlowFile backlogs are happening. Since NiFi operates within a single JVM and every dataflow, no matter which user/team built it, is executed as the NiFi service user, everything shares the same system resources (the various repos, heap memory, disk I/O, CPU, etc.). These stats provide useful information that one team can use to communicate with another should resource utilization become an issue.

NiFi's authorization model allows very granular access decisions for every component. Authorizations are inherited from the parent process group unless more granular policies are set up on a child component (processor, controller service, input/output port, sub-process group, etc.).

Hope this helps, Matt
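As a side note, here is a minimal sketch of inspecting the effective "view the component" policy for a process group through NiFi's REST API. The base URL, process group id, and token are assumptions for illustration only:

```python
import requests

# Assumed values for illustration only; substitute your own NiFi URL,
# process group id, and authentication (e.g. a token from POST /access/token).
NIFI = "https://nifi.example.com:8443/nifi-api"
PG_ID = "00000000-0000-0000-0000-000000000000"
HEADERS = {"Authorization": "Bearer ..."}

# Ask for the "read" policy on the process group. If no policy is defined
# directly on it, the policy inherited from the parent process group applies.
resp = requests.get(f"{NIFI}/policies/read/process-groups/{PG_ID}",
                    headers=HEADERS, verify=False)
resp.raise_for_status()
policy = resp.json()["component"]

print("Resource:", policy["resource"])
print("Users:", [u["component"]["identity"] for u in policy.get("users", [])])
print("Groups:", [g["component"]["identity"] for g in policy.get("userGroups", [])])
```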
03-07-2021
12:39 AM
Thank you, the first solution was the best one for me
03-02-2021
09:33 PM
@wikulinme were you able to solve this?
03-01-2021
05:53 AM
@IAMSID I think you are asking two different questions here. In order for the community to help, it would be useful if you gave more detail around each of your issues. Your example is not clear to me. Not knowing anything about your source data, what characters are you not expecting? Providing the following always helps:
1. A dataflow template showing what you have done.
2. A sample input file.
3. The desired output file based on the above sample.

For query 2:
1. How is data being ingested into NiFi?
2. What is the configuration of the processor components used to ingest data (ConsumeKafka<version>, ConsumeKafkaRecord<version>, record writer, etc.)?
3. What other processors does the FlowFile pass through in this dataflow (flow template)?

Thanks, Matt
02-11-2021
07:57 AM
@adhishankarit When moving on to a new issue, I recommend always starting a new query for better visibility (for example, someone else in the community may have more experience with the new issue than me).

As far as your new query, your screenshots do not show any stats on the processor, so it is hard to get an idea of what we are talking about here in terms of performance. How many fragments are getting merged? How large is each of these fragments?

NiFi nodes are only aware of, and only have access to, the FlowFiles on that individual node. So if node "a" is "out" (I am not sure what that means), any FlowFiles still on node "a" that are part of the same fragment will not yet be transferred to node "b" or "c" to get binned for merge. The bin cannot be merged until all fragments are present on the same node. The fact that the bin eventually merges after 10 minutes tells me that all fragments do eventually make it onto the same node. I suggest the first thing to address here is the space issue on your nodes.

Also keep in mind that while you have noticed that node "a" has always been your elected primary node, there is no guarantee that will always be the case. A new Cluster Coordinator and Primary Node can be elected by ZooKeeper at any time. If you shut down or disconnect the currently elected primary node "a", you should see another node get elected as primary node. Adding node "a" back in will not force ZooKeeper to elect it as primary node again. So don't build your flow around a dependency on any specific node being the primary node all the time.

Matt
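As an aside, a minimal sketch of asking the cluster which node currently holds the Primary Node role, so you can confirm it is not tied to any one machine. The URL and token are assumptions for illustration:

```python
import requests

# Assumed values for illustration only.
NIFI = "https://nifi.example.com:8443/nifi-api"
HEADERS = {"Authorization": "Bearer ..."}

resp = requests.get(f"{NIFI}/controller/cluster", headers=HEADERS, verify=False)
resp.raise_for_status()

# Each node entry lists the roles it currently holds, e.g.
# "Primary Node" and/or "Cluster Coordinator".
for node in resp.json()["cluster"]["nodes"]:
    print(node["address"], node.get("roles", []))
```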
02-10-2021
11:05 PM
Thanks @MattWho. Got it!!
02-10-2021
09:55 AM
@has The ListFile processor does not accept an inbound connection. If you know the filename of the file being created and the path where that file is created, all you need is the FetchFile processor:

HandleHttpRequest --> <flow to execute jar> --> UpdateAttribute (set path and filename attributes) --> FetchFile --> HandleHttpResponse --> <rest of dataflow>

The HandleHttpResponse processor does not return content. It simply sends a response code back for the original request, which has not yet been responded to. The only things you have control over in the response are the status code sent and any custom headers included in the response.

Hope this helps, Matt
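For what it's worth, a minimal sketch of what the calling side might look like. The host, port, and path are assumptions; HandleHttpRequest listens on whatever "Listening Port" you configure:

```python
import requests

# Assumed endpoint for illustration; HandleHttpRequest listens on the port
# configured in its "Listening Port" property.
url = "http://nifi-host.example.com:9090/trigger-jar"

resp = requests.post(url, timeout=60)

# Per the note above, the caller only gets back a status code and any
# custom headers set in the flow, which is what it can act on.
print(resp.status_code)
print(dict(resp.headers))
```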
02-10-2021
09:03 AM
3 Kudos
@Jarinek The process really depends on what update you are trying to make.
1. You cannot remove a connection that has queued FlowFiles in it, but you can redirect a connection with queued data to a different target processor.
2. You cannot redirect a connection if the processor it is currently attached to still has a running thread. Stopping a processor does not kill threads; it simply tells the processor not to execute again at the configured run schedule. Existing threads will continue to run until they complete. Until all threads exit, the processor is still in a state of "stopping" even though the UI reflects a red square for "stopped".
3. You cannot modify a processor if it still has running threads (see the note about "stopping" processors above).
4. If you stop the component on the receiving side of a connection, any FlowFiles queued on that connection that are not tied to an active thread still running on the target processor will not be processed and will remain queued on the connection. You can manually empty a queue through a REST API call (which means data loss), but that is not necessary if you are not deleting the connection.

Attempts to perform configuration changes while components still have active threads or are in a running state will result in an exception being thrown and the change not happening. Attempts to remove connections that have queued FlowFiles will throw an exception and block removal.

Now if all you are trying to do is modify some configuration on a processor, all you need to do is stop the processor, check that it has no active threads, make the config change, and then start the processor again (a rough sketch of doing this through the REST API is included below).

I am not sure what you are asking with "update the flow ignoring any data in failure or error connection queues". NiFi does not ignore queued FlowFiles. It is also not wise to leave connections with queued FlowFiles just sitting around your dataflows. Those old queued FlowFiles will prevent removal of the content claims that contain their data. Since a content claim can contain the data from one to many FlowFiles, this can result in your content repository filling up. NiFi can only remove content claims that no FlowFiles point to anymore.

Here are some useful links:
https://nipyapi.readthedocs.io/en/latest/nipyapi-docs/nipyapi.html
https://github.com/Chaffelson/nipyapi
http://nifi.apache.org/docs/nifi-docs/rest-api/index.html
https://community.cloudera.com/t5/Community-Articles/Update-NiFi-Connection-Destination-via-REST-API/ta-p/244211
https://community.cloudera.com/t5/Community-Articles/Change-NiFi-Flow-Using-Rest-API-Part-1/ta-p/244631

Hope this helps, Matt
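As an addendum, a rough sketch of that stop / check active threads / update / start sequence against the REST API. The base URL, processor id, token, and the concurrent-tasks change are assumptions for illustration; the nipyapi library linked above wraps the same calls more conveniently:

```python
import time
import requests

# Assumed values for illustration only.
NIFI = "https://nifi.example.com:8443/nifi-api"
PROC_ID = "00000000-0000-0000-0000-000000000000"
HEADERS = {"Authorization": "Bearer ..."}

def get_processor():
    r = requests.get(f"{NIFI}/processors/{PROC_ID}", headers=HEADERS, verify=False)
    r.raise_for_status()
    return r.json()

def set_run_status(state, revision):
    # state is "RUNNING" or "STOPPED"
    r = requests.put(f"{NIFI}/processors/{PROC_ID}/run-status",
                     json={"revision": revision, "state": state},
                     headers=HEADERS, verify=False)
    r.raise_for_status()

# 1. Stop the processor.
set_run_status("STOPPED", get_processor()["revision"])

# 2. Wait until it has no active threads ("stopping" vs. truly stopped).
while get_processor()["status"]["aggregateSnapshot"]["activeThreadCount"] > 0:
    time.sleep(2)

# 3. Make the configuration change (an assumed concurrent-tasks tweak here).
entity = get_processor()
update = {
    "revision": entity["revision"],
    "component": {
        "id": PROC_ID,
        "config": {"concurrentlySchedulableTaskCount": 2},
    },
}
r = requests.put(f"{NIFI}/processors/{PROC_ID}", json=update,
                 headers=HEADERS, verify=False)
r.raise_for_status()

# 4. Start it again with the latest revision.
set_run_status("RUNNING", get_processor()["revision"])
```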
02-09-2021
05:52 AM
@medloh That is the correct solution here; the filename is always stored in a FlowFile attribute named "filename". Using the UpdateAttribute processor is the easiest way to manipulate that FlowFile attribute. You can use other attributes, static text, and even subjectless expression language functions like "now()" or "nextInt()" to create dynamic filenames for each FlowFile. https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html Hope this helps, Matt
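For example (just an illustration, not taken from the flow in question), an UpdateAttribute dynamic property named "filename" with a value such as ${filename}_${now():format('yyyyMMddHHmmss')} would append a timestamp to each incoming filename.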
02-09-2021
05:48 AM
@Umakanth The GetSFTP processor creates a verbose listing of all files from the target SFTP server that it will be getting, and then fetches all of those files. Unlike the ListSFTP processor, GetSFTP is an older, deprecated processor that does not store state. My guess here is that at times the listing is larger than at other times, or, as you mentioned, some occasional latency occurs, leaving enough time between creating that listing and actually consuming the files that the source system has moved a listed file before it is grabbed. In that case, moving to the newer ListSFTP and FetchSFTP processors will help in handling that scenario. ListSFTP will list all the files it sees, and FetchSFTP will fetch the content for those that have not yet been moved by the source system. FetchSFTP will still throw an exception for each file it cannot find and route those FlowFiles to the not.found relationship, which you can handle programmatically in your NiFi dataflow(s). Thanks, Matt