Member since: 07-30-2019
Posts: 3421
Kudos Received: 1628
Solutions: 1010

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 166 | 01-13-2026 11:14 AM |
| | 281 | 01-09-2026 06:58 AM |
| | 553 | 12-17-2025 05:55 AM |
| | 614 | 12-15-2025 01:29 PM |
| | 577 | 12-15-2025 06:50 AM |
09-30-2024
12:21 PM
@Leo3103 Here are some additional challenges you may run into: NiFi flow definitions do not export with any encrypted values. So if your dataflow uses a component that has sensitive properties (passwords), those values will not exist in the flow.json you export. You could get the encrypted values (they will look like enc{....<long string>....}) from the NiFi flow.json.gz file and add them into the flow.json.raw produced by the toolkit.

In order for your MiNiFi to be able to load a flow.json.raw containing encrypted sensitive values, the same sensitive.props.key value and sensitive.props.algorithm used by your NiFi (nifi.properties) must be used in your MiNiFi bootstrap.conf.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
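As a rough sketch of that last point (the nifi.properties names below are the standard ones, but the bootstrap.conf property names vary by MiNiFi version, so treat those as assumptions and confirm against your own bootstrap.conf):

```properties
# nifi.properties on the NiFi that produced the enc{...} values
nifi.sensitive.props.key=myS3cretPropsKey
nifi.sensitive.props.algorithm=NIFI_PBKDF2_AES_GCM_256

# bootstrap.conf on the MiNiFi loading the flow.json.raw
# (illustrative names -- check your MiNiFi version's bootstrap.conf)
nifi.minifi.sensitive.props.key=myS3cretPropsKey
nifi.minifi.sensitive.props.algorithm=NIFI_PBKDF2_AES_GCM_256
```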
09-30-2024
12:09 PM
@Leo3103 Things to note: the process group you will be exporting a flow definition for can NOT contain local input or output ports at its top level. The process group will become the "root" process group in your MiNiFi deployment, and only "remote" input or output ports are supported at the root level. Within the process group you are exporting, you can have child process groups deeper down that contain local input and output ports, but the test process group "internalTest" you are exporting has a local output port at the top level.

If you take the entire flow.json.gz from the NiFi conf directory and rename it to flow.json.raw, you can start MiNiFi with no issues. Since you are exporting a flow definition of a process group, you'll need to utilize the MiNiFi toolkit to transform it into the proper format that can be loaded by MiNiFi. The MiNiFi toolkit can be downloaded from here: https://nifi.apache.org/download/ (select "MINIFI" and click the download link for the Toolkit).

Execute:

./minifi-toolkit/bin/config.sh transform-nifi <exported flow definition> flow.json.raw

Now edit the flow.json.raw file and set the following property at the start of the file (the value can not be 0):

"maxTimerDrivenThreadCount":5

Now you can start your MiNiFi and it will create the flow.json.gz as it starts.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
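In case it helps, here is a rough end-to-end sketch of those steps on the command line. The toolkit directory, exported file name, and MiNiFi install path are placeholders, and the sed line assumes the transform left the thread count at 0, so adjust for your environment:

```shell
# Transform the exported flow definition into the raw format MiNiFi can load
./minifi-toolkit-<version>/bin/config.sh transform-nifi MyProcessGroup.json flow.json.raw

# Make sure maxTimerDrivenThreadCount is not 0 (edit by hand or with sed)
sed -i 's/"maxTimerDrivenThreadCount":0/"maxTimerDrivenThreadCount":5/' flow.json.raw

# Place the raw flow where MiNiFi expects it and start MiNiFi;
# it will generate flow.json.gz on startup
cp flow.json.raw /path/to/minifi/conf/
/path/to/minifi/bin/minifi.sh start
```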
09-27-2024
12:17 PM
1 Kudo
@Techie123 Different processors are built by different Apache NiFi contributors to execute differently when scheduled. Some ways to overcome this are to possibly:
- Adjust the run duration so that the processor runs longer when scheduled, letting it process more FlowFiles or produce more FlowFiles from an internal queue in a single scheduled execution.
- Configure your cron to execute multiple times within a short period of time, maybe every second for 15 seconds within a specific hour and minute (see the example below).
- Use a combination of both of the above.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
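To illustrate the cron suggestion: NiFi's CRON-driven scheduling uses Quartz-style expressions with a leading seconds field, so an expression like the following (the times are just examples) would fire once a second for the first 15 seconds of 02:30 each day:

```
0-14 30 2 * * ?
```

That is seconds 0 through 14, minute 30, hour 2, every day of every month.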
09-27-2024
10:22 AM
1 Kudo
@Shalexey Your query does not contain a lot of detail around your use case, but I'll try to provide some pointers here.

NiFi processor components have one or more defined relationships. These relationships are where the NiFi FlowFile is routed when a processor completes its execution. When you assign a processor relationship to more than one outbound connection, NiFi will clone the FlowFile however many times the relationship usage is duplicated. So looking at the dataflow design shared, I see you have what appears to be the "success" relationship routed twice out of the UpdateAttribute processor (this means the original FlowFile is sent to one of these connections and a new cloned FlowFile is sent to the other connection). So you can't simply route both of these FlowFiles back to your QueryRecord processor, as each would be executed against independently.

If I am understanding your use case correctly, you ingest a CSV file that needs to be updated with an additional new column (primary key). The value that will go into that new column is fetched from another DB via the ExecuteSQLRecord processor. The problem is that the ExecuteSQLRecord processor would overwrite your CSV content. So what you need to build is a flow that gets the enhancement data (primary key) and adds it to the original CSV before the PutDatabaseRecord processor.

Others might have different solution suggestions, but here is one option that comes to mind:
- GetFile --> gets the original CSV file.
- UpdateAttribute --> sets a correlation ID (corrID = ${UUID()}) so that when the FlowFile is cloned later, both can be correlated to one another with this correlation ID that will be the same on both.
- ExecuteSQL --> query the max key DB.
- QueryRecord --> trim the output to just the needed max key.
- ExtractText --> extract the max key value from the content to a FlowFile attribute (maxKey).
- ModifyBytes --> set "Remove all Content" to true to clear the content from this FlowFile (does not affect FlowFile attributes).
- MergeContent --> min num entries = 2, Correlation Attribute Name = corrID, Attribute Strategy = Keep All Unique Attributes. (This will merge both FlowFiles, original and clone, with the same value in FlowFile attribute "corrID" into one FlowFile containing only the CSV content.)
- UpdateRecord --> used to insert the max key value from the maxKey FlowFile attribute into the original CSV content. (The record reader can infer the schema; however, the record writer will need a defined schema that includes the new "primary key" column. Then you will be able to add a dynamic property to insert the maxKey FlowFile attribute into the "primaryKey" CSV column; see the sketch below.)
- PutDatabaseRecord --> write the modified CSV to the destination DB.

Even if this does not match up directly, maybe you will be able to apply the NiFi dataflow design concept above to solve your specific detailed use case.

Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
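To make the UpdateRecord step a bit more concrete, here is a hedged sketch. The column names (col1, col2), the record name, and the primaryKey/maxKey names are all placeholders for whatever your real CSV and flow use:

```json
{
  "type": "record",
  "name": "enriched_csv",
  "fields": [
    { "name": "col1", "type": ["null", "string"] },
    { "name": "col2", "type": ["null", "string"] },
    { "name": "primaryKey", "type": ["null", "string"] }
  ]
}
```

Use a schema along these lines on the record writer, then on UpdateRecord set "Replacement Value Strategy" to "Literal Value" and add a dynamic property with the record path /primaryKey and the value ${maxKey}, so the attribute extracted earlier is written into the new column.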
09-27-2024
08:45 AM
1 Kudo
@sha257 The TLS properties need to be configured if your LDAP endpoint is secured, meaning it requires the LDAPS or START_TLS authentication strategies. When secured, you will always need the TLS truststore, but you may or may not need a TLS keystore (depends on your LDAP setup). For unsecured LDAP URL access, the TLS properties are not necessary. Even unsecured (meaning the connection is not encrypted), the Manager DN and Manager Password are still going to be required to connect to the LDAP server.

Based on the information shared, I cannot say what your LDAP setup does or does not require. You'll need to work with your LDAP administrators to understand the requirements for connecting to your LDAP.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
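For reference, these are the kinds of login-identity-providers.xml properties involved when the endpoint is secured; all of the values below are placeholders for illustration only:

```xml
<property name="Authentication Strategy">LDAPS</property>
<property name="Url">ldaps://ldap.example.com:636</property>
<property name="Manager DN">cn=nifi-svc,ou=services,dc=example,dc=com</property>
<property name="Manager Password">********</property>
<!-- Truststore properties used by the LDAPS/START_TLS strategies -->
<property name="TLS - Truststore">/opt/nifi/certs/truststore.p12</property>
<property name="TLS - Truststore Password">********</property>
<property name="TLS - Truststore Type">PKCS12</property>
```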
09-27-2024
05:47 AM
1 Kudo
@sha257 The provider as shared is missing required configurations: Manager DN, Manager Password, URL, and User Search Base. Perhaps you just blanked these out for this post.

Since this is an XML format file, make sure that you are properly escaping any XML special characters used in any of the property values:

| XML special character | Replacement escape value |
|---|---|
| " | &quot; |
| ' | &apos; |
| < | &lt; |
| > | &gt; |
| & | &amp; |

If any of these are used without being escaped, the XML will be invalid and not able to be loaded.

I also see that you have configured the Authentication Strategy as SIMPLE, which means you are using ldap and not ldaps; however, I see that you have configured the TLS keystore and truststore properties. That is not an issue, unless your LDAP URL really is secured, requiring either the LDAPS or START_TLS "Authentication Strategy" to be set.

For your User Search Filter, try changing that from "(cn={0})" to just "cn={0}".

The most common issue is the use of special characters within XML property values, like passwords, that have not been escaped properly.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
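As a quick illustration of the escaping point, a fragment of an ldap-provider configuration might look like this (all values are made up; note the & in the password written as &amp;):

```xml
<property name="Manager DN">cn=nifi-svc,ou=services,dc=example,dc=com</property>
<property name="Manager Password">p@ss&amp;word</property>
<property name="Url">ldap://ldap.example.com:389</property>
<property name="User Search Base">ou=users,dc=example,dc=com</property>
<property name="User Search Filter">cn={0}</property>
```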
09-26-2024
05:44 AM
@imvn I am finding it hard to follow your use case here. Your shared dataflow image shows:
1. ExecuteSQL (query data in the source database) --> (produces a FlowFile that contains 0 or more rows) -->
2. RouteOnAttribute (evaluates the "executesql.row.count" attribute to see if it is "0"? If so, routes the "0" row FlowFiles to the "lines" relationship? Or are you auto-terminating within RouteOnAttribute if the row count is "0", and the "lines" relationship is used only if "executesql.row.count" is not "0"?)
3. I see the "lines" relationship is routed twice: once to another ExecuteSQL (deletes the data from the local database, i.e., destination) and once directly to PutDatabaseRecord (since PutDatabaseRecord has two inbound connections that will have a FlowFile, it will execute against both, which I do not think is what you want to happen).

Just considering the above, I think option 3, which utilizes the "FlowFile concurrency" and "outbound policy" settings on a process group, would handle your timing needs, where your RouteOnAttribute goes in place of the ExtractText processor and you feed "lines" into the child process group.

The question is: what is the overall goal of this use case? Are you trying to maintain an up-to-date copy of the source database in the destination database, or are you trying to just copy rows added to the source DB to the destination DB? If so, there are better dataflow designs for that.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
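As a side note on the RouteOnAttribute step, assuming you are keying off the attribute ExecuteSQL writes, a dynamic property like this would send only non-empty results to the "lines" relationship:

```
lines = ${executesql.row.count:gt(0)}
```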
09-26-2024
05:22 AM
1 Kudo
@Vikas-Nifi Your dataflow is working as designed. You have your ListFile producing three FlowFiles (one for each file listed). Each of those FlowFiles then triggers the execution of your FetchFile, which you have configured to fetch the content of only one of those files.

If you only want to fetch "test_1.txt", you need to either configure the ListFile to only list the file "test_1.txt", or add a RouteOnAttribute processor between your ListFile and FetchFile so that you only route the listed FlowFile matching ${filename:equals('test_1.txt')} to the FetchFile and auto-terminate the other listed files. The first option of only listing the file you want to fetch the content for is the better option, unless there is more to your use case than you have shared.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
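For the first option, ListFile exposes a "File Filter" property that takes a Java regular expression, so something along these lines would list only that one file:

```
File Filter: test_1\.txt
```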
09-25-2024
01:46 PM
3 Kudos
@imvn A NiFi FlowFile consists of two parts:
- FlowFile content - The content of a FlowFile is written to a content claim within the NiFi content repository. A content claim is immutable.
- FlowFile attributes/metadata - Attributes/metadata are written to the FlowFile repository and persist in NiFi heap memory unless forced to swap due to connection thresholds. This metadata includes information about where to find the content, among other things.

How NiFi processors handle inbound and outbound FlowFiles is largely processor specific. For processors that write output to the content of a FlowFile, this may be handled two different ways depending on the processor. Some processors might have an "original" relationship, where the original FlowFile referencing the original inbound content claim gets routed, while a new FlowFile with the same attributes, pointing to the new content output in a different claim, is created and routed to some other relationship like "success". Other processors might not have an "original" relationship and instead decrement a claimant count on the original content claim and update the existing FlowFile metadata to point to the content created in the new content claim. The ExecuteSQL processor follows the latter process.

So you have a dataflow built like this, if I understand correctly: ExecuteSQL (writes content to the FlowFile) --> some processor/processors (extract bits from the content to use for the delete) --> ExecuteSQL (performs the delete, but the response is written as new content for the FlowFile) --> PutDatabaseRecord (has issues since the original needed FlowFile content is no longer associated with the FlowFile).

Option 1: Can you re-order your processors so you have ExecuteSQL --> PutDatabaseRecord --> extract content --> ExecuteSQL (delete)? This makes sure the original content is persisted long enough to complete the write to the target DB.

Option 2: ExecuteSQL --> ExtractContent --> route the "success" relationship twice (once to ExecuteSQL to perform the delete and a second time to PutDatabaseRecord to write to the DB). Similar to the below example: You'll notice that the "matched" relationship has been routed twice. When the same relationship is routed twice, NiFi clones the FlowFile (the original goes to one connection and the clone goes to the other). Both FlowFiles reference the same content claim (which, remember, is immutable). When ExecuteSQL (delete) then executes on one of them, it does not impact the content in the other one that is going to PutDatabaseRecord.

If I am not clear on your use case, let me know. I was a bit confused on the "delete data from destination" part. Destination = the PutDatabaseRecord configured DB destination? It is not clear why you would be deleting something there that has not yet been written.

So if there is a dependency that the ExecuteSQL (delete) completes before the PutDatabaseRecord executes, there is a third option that utilizes the "FlowFile concurrency" and "outbound policy" settings on a process group. The dataflow would look something like this: inside the process group configured with "FlowFile Concurrency = Single FlowFile Per Node" and "Outbound Policy = Batch Output", your dataflow only allows one FlowFile to enter the PG at a time. Within the PG, the FlowFile is cloned, with one FlowFile routing to the output port and the other to the ExecuteSQL (delete). The FlowFile queued to exit the PG will not be allowed to exit until the FlowFile being processed by the ExecuteSQL (delete) is auto-terminated or routed to some other output port.

This makes sure that the PutDatabaseRecord processor does not process the FlowFile with the original content claim until your ExecuteSQL (delete) has executed.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
09-25-2024
08:35 AM
1 Kudo
@Twelve @aLiang The crypto.randomUUID() issue when running NiFi over HTTP or on localhost has been resolved via https://issues.apache.org/jira/browse/NIFI-13680. The fix will be part of the next release after NiFi 2.0.0-M4. Thanks, Matt