About MattWho

MattWho · ‎01-27-2025

@vystar Considering the breaking changes that are part of Apache NiFi 2.0/1, there is considerably more work in preparing for an upgrade to the that new major release. So I would recommend upgrading to the latest offering in the Apache NiFi 1.x branch. You'll want to review all the release notes from 1.13 to the latest release: https://cwiki.apache.org/confluence/display/NIFI/Release+Notes/#ReleaseNotes-Version1.12.0 You'll want to pay close attention to any mentions of components being moved to optional build profiles. This means that these nars and the components they contain are no longer include with the Apache NiFi download and if needed must be downloaded form Maven Central and manually added to NiFi. Deprecated components still exist in the Download, but will not exist in NiFi 2.x releases. Make sure to maintain a copy of your flow.xml.gz/flow.json.gz (newer releases). The newer Apache NiFi 1.x load a flow.json.gz instead of the older flow.xml.gz on startup. However, in the absence of a flow.json.gz and the presence of flow.xml.gz, NiFi 1.x will load from the flow.xml.gz and produce the new flow.json.gz. After upgrade, you'll still need to review your dataflows. There are some bad practices that are now blocked by Apache NiFi that may leave some components invalid until manual action is taken to resolve the bad configuration (such as using "primary node" execution on any processor that has an inbound connection). As far as FlowFile distribution, do it early in your dataflows as possible. Utilize list/fetch still processors instead of get style. (Example: use ListSFTP and FetchSFTP in place of GetSFTP. This allows you to load-balance the 0 byte listed files before the content is fetched for the files). Other options like Remote Process Groups can be used (they come with some overhead, but do some target NiFi Cluster load based distribution when dealing with large volumes of FlowFiles. Not so great for low volumes.). Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt

MattWho · ‎01-23-2025

@spartakus You will want to review the release notes for all release between 1.23. and 1.28.1 https://cwiki.apache.org/confluence/display/NIFI/Release+Notes/#ReleaseNotes-Version1.23.0 You'll notice a number of components deprecated. Deprecated components are not removed, but that are marked as deprecated and will log if used that they are deprecated. Deprecated components will eventually be removed form Apache NiFi (most removed in NiFi 2.0). You'll also make note of any items in the release notes that state: Moved <XYZ> to optional build profiles This means that these components were removed from Apache NiFi. These components can be added manually later by downloading the nars from the central maven repository and adding them to the NiFi lib directory if you need them still. https://mvnrepository.com/artifact/org.apache.nifi Make a back up of your flow.json.gz, nifi.properties, and optionally flow.xml.gz files before you upgrade. Otherwise, I see no issues with upgrading from 1.22 -- > 1.28 directly. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt

VidyaSargur · ‎01-23-2025

@itninja, Did the response help resolve your query? If it did, kindly mark the relevant reply as the solution, as it will aid others in locating the answer more easily in the future. However, if you still have concerns, could you please provide the information that @MattWho has requested?

MattWho · ‎01-21-2025

@jirungaray Cloudera Flow Management (Based on Apache NiFi) provides multiple methods for managing user authorization. This includes NiFi internally via the File-Access-Policy-Provider and externally via Apache Ranger. There is no built in mechanism for auto setting up authorization policies for users or groups with the exception of the Initial Admin and Initial NiFi Node authorizations. Many of the Authorization policies are directly related to the components added to the canvas. Those components are assigned unique IDs making it impossible to create policies before the components exist. File-Access-Policy-Porvider: This provider utilizes a file on disk (authorizations.xml) to persists authorization policies. This file is loaded when NiFi starts. This means it is possible to manually generate this file and have NiFi load it on startup. Also as you mentioned, you could script out the authorization creating through NiFi Rest-API calls. Ranger provider: This moves authorization responsibility over to Apache Ranger. Policies setup within Ranger are download by the NiFi nodes where they are locally enforced. No matter which authorizer you choose to use, authorizations are easiest to manage via groups. Typical users setup ldap groups for various NiFi roles (admins, team 1, team2, etc..) and makes specific users members of these groups. This simplifies authorization since you can authorizer these groups instead of the individual users. Simply adding or removing a user as member of one of these authorized groups gives or removes authorized access to the NiFi resource identifier (NiFi policy). The ldap-user-group-provider can be added to the NiFi authorizers.xml to auto manage syncing of user and group identities from your AD/LDAP further simplifying management over the file-user-group-provider method which requires the manual adding of user and group identifiers to the NiFi. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt

tono425 · ‎01-17-2025

@MattWho Thank you for explanation. Now I understand MergeRecord determines to which file each flowfile to be merged by schema information. I'll consider increasing "Minimum number of records" as you recommended. Thanks,

lexieliu · ‎01-16-2025

It seems that the next scheduled task has not been triggered

MattWho · ‎01-16-2025

@Eslam Welcome to the community. In order to get helpful answers, you'll need to provide more detail around your use case. NiFi provides many processors for connecting to various services on external hosts. You can find the list of default processors available with the Apache NiFi release here: NiFi 1.x release: https://nifi.apache.org/docs/nifi-docs/ NiFi 2.x release: https://nifi.apache.org/components/ At the very basic of level you have processors like: GetSFTP ListSFTP / FetchSFTP But there are also processor for connecting SMB, Splunk, rest-apis, SNMP, FTP, DBs, Kafka, hive, etc. on external servers. Look through the components list in the documentation for Get, List, Fetch, and Query type processors to see if any of them meet your use case needs. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt

askh88 · ‎01-15-2025

Thanks @MattWho . Checking with mergeContent option with our data volume, if it works then will go with this. Thanks for help/suggestion.

Chram · ‎01-10-2025

I changed my config to this: This seems to do the trick.

MattWho · ‎01-07-2025

@ShellyIsGolden 500k+ files is a lot to list and the lookup on subsequent runs to look for new files. A few questions first: How is your listSFTP processor scheduling configured? With the Initial listing, how long does it take to output he 500K+ FlowFiles from time processor is started? When files are added to the SFTP server, are they added using a dot rename method? Is the last modified timestamp being updated on the files as they are being written to the SFTP server? So the processor when executed for the initial time will list all files regardless of the configured "Entity Tracking Time Window" set value. Subsequent executions will only list files with and last modified timestamp within the configured "Entity Tracking Time Window" set value. So accurate last modified timestamps are important. With initial listing of a new processor (or copy of existing processor) there is no step to check list files against the cache entries to see if file has never been listed before or if a listed file has changed in size since last listed. This lookup and comparison does happen on subsequent runs and can use considerable heap. Do you see any OutOf Memory (OOM) exceptions in your NiFi app logs? Depending on how often the processor executes, consider reducing the configured "Entity Tracking Time Window" value so fewer files are listed in the subsequent executions that need to be looked up. Set it to what is needed with a small buffer between each processor execution. Considering that it sounds you have yoru processor scheduled to execute every 1 minute, maybe try setting this to 30 minutes instead to see what impact it has. When you see the issue, does the processor show an active thread in the upper right corner that never seems to go away? When the issue appears, rather then copy the processor, what happens if you simply stop the processor (make sure all active threads complete, shows no active threads number in upper right corner of processor) and then just restart it? In the latest version of Apache NiFi, a "Remote Poll Batch Size" property (defaults to 5000) was added to the listSFTP processor which may help here considering the tremendous amount files being listed in your case. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt

Online	Offline
Last Visited	‎12-04-2025 12:01 AM

Member Since	‎07-30-2019 10:41 AM
Last Visited	‎12-04-2025 12:01 AM
Posts	3,397
Kudos received	1615

Cloudera Community

Re: How to achieve inheritence within Parameter Co...

Re: Cannot access the NiFi Registry from NiFi and ...

Re: Error connecting to NiFi Registry from NiFi UI...

Re: using nifi as a kafka streaming- real-time str...

Re: using nifi as a kafka streaming- real-time str...

Re: Nifi cluster load balance doesn't work well

Re: Upgrade NiFi from 1.22.0 to 1.28.1

Re: How to disable anonymous on Apache Nifi 1.25

Re: Cloudera NiFi - Automatic policy creation

Re: MergeRecord generates multiple files

Re: NIFI ListSFTP 2.1.0 only run once

Re: Apache Nifi Connect to server

Re: Implement batched file processing in NiFi

Re: Flowfile stuck in Wait in EnforceOrder process...

Re: ListSFTP not listing newly added files