Member since: 07-30-2019
Posts: 3387
Kudos Received: 1617
Solutions: 999
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 135 | 11-05-2025 11:01 AM |
| | 380 | 10-20-2025 06:29 AM |
| | 520 | 10-10-2025 08:03 AM |
| | 359 | 10-08-2025 10:52 AM |
| | 396 | 10-08-2025 10:36 AM |
01-07-2025
05:57 AM
1 Kudo
@ravi_tadepally A secured NiFi always requires successful authentication and authorization. I assume you are fetching a token because you have configured your secured NiFi to use OIDC based user authentication. Keep in mind, however, that a secured NiFi will always support mutual TLS based authentication no matter what additional authentication methods have been configured.

For rest-api interactions it is often easier to generate a clientAuth certificate that is trusted by your NiFi's truststore and use that for authentication instead. With mutual TLS based authentication there is no need to fetch any token; you simply present the clientAuth certificate in every rest-api call.

You could even handle this task via a NiFi dataflow that uses the InvokeHTTP processor (configured with an SSL Context Service, which could simply reuse NiFi's own keystore and truststore) to make the rest-api call that fetches the Prometheus data, and then send it to the desired endpoint through that same dataflow.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
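As a rough sketch of what such a token-less call could look like with curl (the host name and certificate file names are assumptions; the `/nifi-api/flow/metrics/prometheus` path assumes your NiFi version exposes the Prometheus metrics endpoint there):

```shell
# Sketch only: assumes a clientAuth cert/key pair trusted by NiFi's truststore
# and a CA cert that signed NiFi's server certificate. All file names and the
# host are hypothetical placeholders.
NIFI_HOST="nifi.example.com:8443"
CLIENT_CERT="client.pem"       # clientAuth certificate (PEM)
CLIENT_KEY="client-key.pem"    # private key for that certificate
CA_CERT="nifi-ca.pem"          # CA trusted for NiFi's server cert

# Build the request; no token fetch is needed when using mutual TLS.
CMD="curl --cert ${CLIENT_CERT} --key ${CLIENT_KEY} --cacert ${CA_CERT} https://${NIFI_HOST}/nifi-api/flow/metrics/prometheus"
echo "$CMD"
```

The same certificate, key, and CA would map onto an InvokeHTTP processor's SSL Context Service if you build this as a dataflow instead.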
01-03-2025
07:37 AM
1 Kudo
@spiker What is currently written in your users.xml and authorizations.xml files in the NiFi conf directory? How many nodes are in your NiFi cluster? What Apache NiFi version are you using? Thank you, Matt
01-03-2025
07:16 AM
@spiker With the edits to the authorizers.xml and verified newly generated users.xml and authorizations.xml files, are you still seeing the untrusted proxy WARN in the logs? What do you see now in the logs? Thanks, Matt
01-03-2025
05:59 AM
@spiker Let's start by focusing on the following two shared log lines:

2025-01-02 10:29:05,167 INFO [NiFi Web Server-52] o.a.n.w.s.NiFiAuthenticationFilter Authentication Started 172.24.0.3 [<oncloudtemuser@sossourabh7687gmail.onmicrosoft.com><CN=172.24.0.3, OU=NIFI, O=NIFI, L=HYDRABAD, ST=TELANGANA, C=IN>] GET https://172.24.0.3:8443/nifi-api/flow/current-user
2025-01-02 10:29:05,173 WARN [NiFi Web Server-52] o.a.n.w.s.NiFiAuthenticationFilter Authentication Failed 172.24.0.3 GET https://172.24.0.3:8443/nifi-api/flow/current-user [Untrusted proxy CN=172.24.0.3, OU=NIFI, O=NIFI, L=HYDRABAD, ST=TELANGANA, C=IN]

In the first log line we see an authenticated user identity followed by the authenticated node identity of the node receiving the access request: <oncloudtemuser@sossourabh7687gmail.onmicrosoft.com><CN=172.24.0.3, OU=NIFI, O=NIFI, L=HYDRABAD, ST=TELANGANA, C=IN>

In a NiFi cluster, nodes proxy all requests on behalf of the authenticated user to the currently elected NiFi cluster coordinator. This means that all nodes in a NiFi cluster must be authorized to proxy user requests. Establishing the minimum required authorizations in a new NiFi setup is handled by the authorizers.xml. In your case, you are using the file-access-policy-provider: <accessPolicyProvider>
<identifier>file-access-policy-provider</identifier>
<class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
<property name="User Group Provider">composite-configurable-user-group-provider</property>
<property name="Authorizations File">./conf/authorizations.xml</property>
<property name="Initial Admin Identity">oncloudtemuser@sossourabh7687gmail.onmicrosoft.com</property>
<property name="Legacy Authorized Users File"></property>
<property name="Node Identity 1"></property>
</accessPolicyProvider>

This provider will generate the authorizations.xml file ONLY if it does not already exist. Once it exists, all additional authorizations and modifications are made from within the NiFi UI. If you edit the file-access-policy-provider, you'll need to delete the authorizations.xml on all nodes before restarting your NiFi.

So we see from the above that you have your initial admin user identity defined, but have not defined your node(s) identities via "Node Identity 1", "Node Identity 2", etc. Before you can define your node identity in the file-access-policy-provider for seeding the node's required authorizations, the identical (case sensitive) node identity must be returned by the "composite-configurable-user-group-provider". That means "CN=172.24.0.3, OU=NIFI, O=NIFI, L=HYDRABAD, ST=TELANGANA, C=IN" must be returned by either the "file-user-group-provider" or the "aad-user-group-provider". I believe the file-user-group-provider is where you expect your node identities to be derived from: <userGroupProvider>
<identifier>file-user-group-provider</identifier>
<class>org.apache.nifi.authorization.FileUserGroupProvider</class>
<property name="Users File">./conf/users.xml</property>
<property name="Legacy Authorized Users File"></property>
<property name="Initial User Identity 1">CN=172.24.0.3, OU=NIFI, O=NIFI, L=HYDRABAD, ST=TELANGANA, C=IN</property>
<property name="Initial User Identity 2"></property>
</userGroupProvider>

Just like the "file-access-policy-provider", the "file-user-group-provider" will ONLY generate the users.xml if it does not already exist, so you will also need to delete the users.xml on all your nodes before restarting after editing your authorizers.xml.

NOTE: Be mindful of case sensitivity in your user identities.

These modifications should get you past your UNTRUSTED PROXY issues when trying to access NiFi with your authenticated user.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
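Putting this together, the edited file-access-policy-provider would look roughly like the following (the node identity shown is the one taken from your shared logs; remember to delete the existing users.xml and authorizations.xml on all nodes before restarting so both files are regenerated):

```xml
<accessPolicyProvider>
    <identifier>file-access-policy-provider</identifier>
    <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
    <property name="User Group Provider">composite-configurable-user-group-provider</property>
    <property name="Authorizations File">./conf/authorizations.xml</property>
    <property name="Initial Admin Identity">oncloudtemuser@sossourabh7687gmail.onmicrosoft.com</property>
    <property name="Legacy Authorized Users File"></property>
    <!-- Node identity must exactly match (case sensitive) the identity
         returned by the file-user-group-provider -->
    <property name="Node Identity 1">CN=172.24.0.3, OU=NIFI, O=NIFI, L=HYDRABAD, ST=TELANGANA, C=IN</property>
</accessPolicyProvider>
```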
12-24-2024
07:02 AM
1 Kudo
@BK84 I would avoid, if possible, designing dataflows that rely on making rest-api calls to control the processing of data. Any issue that may prevent the success of the rest-api call would have a negative impact on your data processing.

Based on what you have shared, I'd suggest having your A->B->C dataflow directly connect to your D->E->F dataflow. Since your Python script (C) is responsible for creating the files in the directory which ListFile (D) checks, why not have your Python script output a FlowFile that contains a list of the filenames it created. A SplitContent processor could then split that into individual FlowFiles that can be passed directly to a FetchFile (no need for ListFile anymore) for writing to ElasticSearch.

Then let's consider the error handling and how to trigger the next run without the rest-api. SplitContent->FetchFile->PutElasticSearchJson should be placed in a child process group. That process group should be configured to use FlowFile Concurrency (Single FlowFile per Node) and an Outbound Policy (Batch Output). This means that only one FlowFile (the FlowFile produced by the Python script containing the list of all files to be fetched) will be allowed to enter this child process group at a time.

PutElasticSearchJson has relationships for handling retry, success, failure, and errors. These can be used to handle success as well as to report (possibly using a PutEmail processor) when processing has issues. On the success relationship path you could use ModifyBytes to zero out the content and then MergeContent to merge the split FlowFiles back into one FlowFile using the "Defragment" strategy. Add a max bin age to force failure of a bin after X amount of time if it does not have all its fragments. SplitContent creates all the FlowFile attributes needed to support the defragment strategy.
Assuming all files were "success" from the PutElasticSearchJson processor, the single defragmented FlowFile will be output, which you send to an output port in that child process group. Once all FlowFiles in the child process group are queued at output ports, they will be allowed to exit the child process group. Assuming a MergeContent "success", you can use this output FlowFile to trigger the next run (hopefully without using rest-api calls). Since you did not share how your data gets started in A and B, I can't really suggest anything there.

Bottom line: avoiding rest-api calls to control your dataflows leads to faster, more efficient processing of your data. Let your dataflow handle the error handling. But if you do choose to use rest-api calls, the best way to figure them out is to open the developer tools in your browser and then manually perform the needed interactions via the NiFi UI. Through the developer tools you can capture (copy as cURL) the call being made in response to your UI action.

You are likely using a username and password to access your NiFi, but this adds more work when doing so through NiFi dataflows (for one thing, your username and password would be in plain text in the dataflow needed to get your user token). So you will want to set up a keystore and truststore so that authentication is handled via a mutual TLS handshake, avoiding exposed passwords and the need to fetch a user token. The InvokeHTTP processor can be used to make the rest-api calls and can be configured with an SSLContextService.

This is a lot of high level information, but I hope it has set you on the path to success.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
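To make the manifest idea concrete, here is a minimal sketch of what the Python script (C) could emit. Everything here is hypothetical (the function name, the file names); the only real requirement is that the script's output becomes FlowFile content that SplitContent can split one-filename-per-fragment:

```python
# Sketch only: a stand-in for the Python script (step C). In addition to
# writing its output files, it emits a newline-delimited manifest of the
# filenames it created. In NiFi (e.g. via ExecuteStreamCommand) that
# manifest becomes the FlowFile content that SplitContent splits into one
# FlowFile per filename, each of which then feeds FetchFile directly.

def build_manifest(created_files):
    """Return a newline-delimited manifest of created file names."""
    return "\n".join(created_files)

if __name__ == "__main__":
    # Hypothetical filenames the script would have just written to disk.
    files = ["batch-001.json", "batch-002.json", "batch-003.json"]
    print(build_manifest(files))
```

With SplitContent configured to split on the newline byte, each resulting fragment carries one filename, and the fragment.* attributes it adds make the later Defragment merge possible.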
12-24-2024
06:00 AM
1 Kudo
@hegdemahendra As far as your issue goes, it would probably be useful to collect a series of thread dumps (spaced at least 5 minutes apart). Then look for any threads related to the stopping of components to see whether they are progressing or hung. Is it stuck on stopping a specific processor or processor class? Do any of the processors being stopped show active threads? Thank you, Matt
12-23-2024
07:35 AM
1 Kudo
@hegdemahendra While I don't have an environment matching yours to test with right now, I did want to point out that NiFi Variables are deprecated and have been removed in NiFi 2.0, and the NiFi 1.x branch will cease to be developed further. You need to move away from using NiFi Variables and start using NiFi Parameters instead. Parameters are more useful than Variables since they can be used in any property, whereas Variables are only usable in properties that support NiFi Expression Language (NEL). Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
12-23-2024
06:50 AM
1 Kudo
@BK84 I suggest starting a new community question for your new query. When you start your new question, please provide more detail on your ask. It is not clear what you mean by "trigger". What is the use case you are trying to implement? Thank you, Matt
12-23-2024
06:25 AM
1 Kudo
@Krish98 There are not enough details here to determine what is going on. Do all the FlowFiles being merged contain the required FlowFile attributes needed for this to be successful?

| Name | Description |
|---|---|
| fragment.identifier | Applicable only if the <Merge Strategy> property is set to Defragment. All FlowFiles with the same value for this attribute will be bundled together. |
| fragment.index | Applicable only if the <Merge Strategy> property is set to Defragment. This attribute indicates the order in which the fragments should be assembled. This attribute must be present on all FlowFiles when using the Defragment Merge Strategy and must be a unique (i.e., unique across all FlowFiles that have the same value for the "fragment.identifier" attribute) integer between 0 and the value of the fragment.count attribute. If two or more FlowFiles have the same value for the "fragment.identifier" attribute and the same value for the "fragment.index" attribute, the first FlowFile processed will be accepted and subsequent FlowFiles will not be accepted into the Bin. |
| fragment.count | Applicable only if the <Merge Strategy> property is set to Defragment. This attribute indicates how many FlowFiles should be expected in the given bundle. At least one FlowFile must have this attribute in the bundle. If multiple FlowFiles contain the "fragment.count" attribute in a given bundle, all must have the same value. |
| segment.original.filename | Applicable only if the <Merge Strategy> property is set to Defragment. This attribute must be present on all FlowFiles with the same value for the fragment.identifier attribute. All FlowFiles in the same bundle must have the same value for this attribute. The value of this attribute will be used for the filename of the completed merged FlowFile. |

Are the FlowFile attribute values correct? How are the individual FlowFile fragments being produced? I find it odd that you say one FlowFile is routed to "merged"; this implies that a merge was successful.
Are you sure the "merged" FlowFile only contains the content of one FlowFile from the set of fragments? Any chance you routed both the "original" and "failure" relationships to the same connection? Can you share the full MergeContent processor configuration (all tabs)?

When a FlowFile is routed to "failure", I would expect to see logging in the nifi-app.log related to the reason for the failure. What do you see in the nifi-app.log?

How many unique bundles are you trying to merge concurrently? I see you have set "Max Num Bins" to 200, so do you expect to have 200 unique fragment.identifier bundles to merge at one time? How many FlowFiles typically make up one bundle? I also see you have "Max Bin Age" set to 300 sec (5 mins). Are all FlowFiles with the same fragment.identifier reaching the MergeContent processor within 5 minutes of one another?

Keep in mind that MergeContent has the potential to consume a lot of NiFi heap memory depending on how it is configured. FlowFile attributes/metadata are held in heap memory, and FlowFiles allocated to MergeContent bins have their attributes held in heap memory. So depending on how many FlowFiles make up a typical bundle, the number of bundles being concurrently handled (200 max), and the number and size of the individual FlowFile attributes, binned FlowFiles can consume a lot of heap memory. Do you encounter any out of memory errors in your nifi-app.log (they might not be thrown by MergeContent)? You must be mindful of this when designing a dataflow that uses MergeContent processors.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
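To illustrate the attribute rules above, here is a minimal sketch of what a valid 3-fragment bundle looks like and a check that mirrors the constraints the Defragment strategy enforces (the attribute names are the real ones from the table; the values and the helper function are made up for illustration):

```python
# Sketch only: each dict stands in for one FlowFile's attributes.
# A valid Defragment bundle: same fragment.identifier, agreeing
# fragment.count, unique fragment.index values, and a complete set.
bundle = [
    {"fragment.identifier": "abc-123", "fragment.index": "1",
     "fragment.count": "3", "segment.original.filename": "input.csv"},
    {"fragment.identifier": "abc-123", "fragment.index": "2",
     "fragment.count": "3", "segment.original.filename": "input.csv"},
    {"fragment.identifier": "abc-123", "fragment.index": "3",
     "fragment.count": "3", "segment.original.filename": "input.csv"},
]

def valid_defragment_bundle(flowfiles):
    """Mirror the attribute constraints listed in the MergeContent docs."""
    ids = {f["fragment.identifier"] for f in flowfiles}
    counts = {f["fragment.count"] for f in flowfiles if "fragment.count" in f}
    indexes = [int(f["fragment.index"]) for f in flowfiles]
    names = {f["segment.original.filename"] for f in flowfiles}
    return (
        len(ids) == 1                             # one bundle identifier
        and len(counts) == 1                      # all counts agree
        and len(names) == 1                       # same original filename
        and len(set(indexes)) == len(flowfiles)   # indexes are unique
        and len(flowfiles) == int(next(iter(counts)))  # bundle is complete
    )
```

A bundle missing a fragment (e.g. only 2 of the 3 FlowFiles arriving within the Max Bin Age) fails the completeness check, which is exactly the situation where the bin is forced out and FlowFiles route to "failure".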
12-16-2024
05:08 AM
1 Kudo
@tono425 It is just a WARNING that can safely be ignored. I found some threads on this here: https://bugs.openjdk.org/browse/JDK-8310626 https://apachenifi.slack.com/archives/C0L9VCD47/p1730736181056549?thread_ts=1730294478.475309&cid=C0L9VCD47 https://www.reddit.com/r/java/comments/17cjajl/why_introduce_a_mandatory_enablenativeaccess/ I suppose you could try adding a new java.arg.<some string/num> entry to the NiFi bootstrap.conf set to "--enable-native-access=ALL-UNNAMED" to see if that stops these warnings on NiFi startup. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
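For example, the bootstrap.conf entry might look like this (the numeric suffix is arbitrary; pick any string or number not already used by another java.arg line in your bootstrap.conf):

```
# In conf/bootstrap.conf (suffix "20" chosen here only for illustration)
java.arg.20=--enable-native-access=ALL-UNNAMED
```

A NiFi restart is required for bootstrap.conf changes to take effect.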