Member since: 07-30-2019
Posts: 3390
Kudos Received: 1617
Solutions: 999
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 224 | 11-05-2025 11:01 AM |
| | 437 | 10-20-2025 06:29 AM |
| | 577 | 10-10-2025 08:03 AM |
| | 394 | 10-08-2025 10:52 AM |
| | 435 | 10-08-2025 10:36 AM |
10-21-2024
05:41 AM
@Tanya19 @MaxEcueda The PutIceberg and PutIcebergCDC processors currently only offer Hadoop or Hive Catalog Service provider options. The only mention of a Glue Catalog I could find in Apache NiFi JIRA is the following still-open issue: https://issues.apache.org/jira/browse/NIFI-11449

It might be a good idea to create an Apache NiFi JIRA with as much detail as you can provide around this improvement request for an additional AWS Glue Catalog provider.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
10-18-2024
07:51 AM
@Kiranq This error you shared:

2024-10-17 08:35:19,764 ERROR [Timer-Driven Process Thread-5] o.a.n.c.s.StandardControllerServiceNode StandardControllerServiceNode[service=CSVRecordLookupService[id=a8b84b00-b0ee-31c8-dbda-7e7e9795ba4b], name=CSVRecordLookupService, active=true] Encountering difficulty enabling. (Validation State is INVALID: ['CSV File' is invalid because CSV File is required, 'Lookup Key Column' is invalid because Lookup Key Column is required]). Will continue trying to enable.

indicates that NiFi is trying to enable a Controller Service loaded from the flow.json.gz during startup, but cannot because its configuration is invalid. It is complaining about the configuration of the "CSV File" and "Lookup Key Column" properties.

Have you tried starting your NiFi with the following setting in your nifi.properties file set to "false"?

nifi.flowcontroller.autoResumeState=false

This starts NiFi without starting any of the components on the canvas. Also, if your NiFi has reached the point where it is trying to enable components on the canvas, your NiFi is up and running.

As far as the screenshot error goes, have you verified ownership and permissions on that directory path? Permissions can become an issue if you started the NiFi service as different users at some point in time, resulting in some files created on startup having different ownership. A sketch of both checks follows below.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
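A minimal sketch of both suggestions, assuming a typical tarball install (the /opt/nifi path and the nifi service user are placeholders; substitute the directory from your screenshot and your actual service account):

```
# conf/nifi.properties -- bring NiFi up with all canvas components stopped
nifi.flowcontroller.autoResumeState=false
```

```bash
# Inspect ownership and permissions on the directory from the error
ls -ld /opt/nifi/state/local

# If earlier startups ran as a different user, restore ownership of the
# install tree to the NiFi service user before starting the service again
sudo chown -R nifi:nifi /opt/nifi
```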
10-14-2024
02:54 PM
1 Kudo
@vg27 Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
10-10-2024
09:54 AM
2 Kudos
@Krish98 Most NiFi heap memory issues are directly related to dataflow design. The Apache NiFi documentation for the individual components generally does a good job of reporting "System Resource Considerations", so the first step would be to review the documentation for the components you are using and see which list "MEMORY" as a system resource consideration. Example: SplitContent 1.27.0. Sharing your configuration of those components might then help with providing suggestions.

- Split and Merge processors, depending on how they are configured, can utilize a lot of heap.
- The Distributed Map Cache also resides in heap and can contribute to significant heap usage depending on its configuration and the size of what is being written to it.

Beyond components:

- NiFi loads the entire flow.json.gz (uncompressed) into heap memory. This includes any NiFi Templates (deprecated in Apache NiFi 1.x and removed in newer Apache NiFi 2.x versions). Templates should no longer be used. Any templates listed in the NiFi templates UI should be downloaded so they are stored outside of NiFi, and then deleted from NiFi to reduce heap usage.
- NiFi FlowFiles are what transition between components via connections in your dataflow(s). A FlowFile consists of two parts: FlowFile content, stored in content claims in the content_repository, and FlowFile metadata/attributes, held in heap memory and persisted to the flowfile_repository. So if you are creating a lot of FlowFile attributes on your FlowFiles, or creating very large FlowFile attributes (like extracting content to an attribute), that can result in high heap usage. A connection does have a default threshold at which a swap file is created to reduce heap usage. Swap files are created with 10,000 FlowFiles in each swap file. The first swap file would not be created until a connection on a specific node reached 20,000 FlowFiles, at which point 10,000 would be moved to a swap file and the 10,000 highest-priority FlowFiles would remain in heap. The default "back pressure object threshold" on a connection is 10,000, meaning that with defaults no connection would ever create a swap file. A sketch of the related settings follows below.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
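To make the tuning knobs concrete, here is a minimal sketch of where the heap ceiling and the swap threshold live, assuming a default install layout (the 4g heap values are illustrative examples, not a recommendation; size them for your host):

```
# conf/bootstrap.conf -- JVM heap bounds for NiFi (example sizes)
java.arg.2=-Xms4g
java.arg.3=-Xmx4g

# conf/nifi.properties -- per-connection FlowFile count at which
# swapping FlowFile attributes out of heap begins (default shown)
nifi.queue.swap.threshold=20000
```

Raising the heap only buys headroom; the dataflow design review above is what actually reduces usage.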
10-07-2024
07:33 AM
@Axmediko Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
10-04-2024
12:20 PM
1 Kudo
@Abibee04 Please start a new community question with the details of your query. I am not clear on what you mean by "recover the registry", so in that new community question please provide as much detail as you can: the steps you performed and what you are looking to accomplish. Thank you, Matt
10-04-2024
09:50 AM
2 Kudos
@varungupta Tell me a bit more about your ListFile processor configuration:

- Is the input directory a shared mount across all your nodes, or does each node have a unique set of files in the input directory?
- Is ListFile configured for execution on "All nodes" or "Primary node"?
- How many files are being listed when ListFile executes (is it fewer than 10,000)?
- How often is ListFile scheduled to run?
- Is ListFile traversing sub-directories for files?

I assume you are extracting the sequenceNo from the filename.

As far as EnforceOrder goes:

- How are you handling the various relationships (routing via connection to where)?
- Are you seeing FlowFiles routed to the "overtook" relationship?

Matt
10-03-2024
12:16 AM
1 Kudo
@MattWho: Sorry that I missed replying to your questions earlier. I have made sure to fix the authorizers.xml and now I am able to access the Registry UI. Thanks for all your detailed responses and suggestions.
10-01-2024
12:24 PM
@imvn Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
09-27-2024
10:22 AM
1 Kudo
@Shalexey Your query does not contain a lot of detail around your use case, but I'll try to provide some pointers here.

NiFi processor components have one or more defined relationships. These relationships are where the NiFi FlowFile is routed when a processor completes its execution. When you assign a processor relationship to more than one outbound connection, NiFi clones the FlowFile however many times the relationship usage is duplicated. Looking at the dataflow design you shared, I see what appears to be the "success" relationship routed twice out of the UpdateAttribute processor (this means the original FlowFile is sent to one of these connections and a new cloned FlowFile is sent to the other connection). So you can't simply route both of these FlowFiles back to your QueryRecord processor, as each would be processed independently.

If I am understanding your use case correctly, you ingest a CSV file that needs to be updated with an additional new column (primary key). The value that will go into that new column is fetched from another DB via the ExecuteSQLRecord processor. The problem is that the ExecuteSQLRecord processor would overwrite your CSV content. So what you need to build is a flow that gets the enhancement data (primary key) and adds it to the original CSV before the PutDatabaseRecord processor.

Others might have different solution suggestions, but here is one option that comes to mind:

- GetFile --> gets the original CSV file.
- UpdateAttribute --> sets a correlation ID (corrID = ${UUID()}) so that when the FlowFile is cloned later, both can be correlated to one another with this correlation ID, which will be the same on both.
- ExecuteSQL --> queries the max key DB.
- QueryRecord --> trims the output to just the needed max key.
- ExtractText --> extracts the max key value from the content to a FlowFile attribute (maxKey).
- ModifyBytes --> set "Remove All Content" to true to clear the content from this FlowFile (this does not affect FlowFile attributes).
- MergeContent --> min num entries = 2, Correlation Attribute Name = corrID, Attribute Strategy = Keep All Unique Attributes. (This will merge both FlowFiles, original and clone, having the same value in the FlowFile attribute "corrID" into one FlowFile containing only the CSV content.)
- UpdateRecord --> used to insert the max key value from the maxKey FlowFile attribute into the original CSV content. (The record reader can infer the schema; however, the record writer will need a defined schema that includes the new "primary key" column. Then you will be able to add a dynamic property to insert the maxKey FlowFile attribute into the "primaryKey" CSV column, as sketched below.)
- PutDatabaseRecord --> writes the modified CSV to the destination DB.

Even if this does not match up directly, maybe you will be able to apply the NiFi dataflow design concept above to your specific detailed use case.

Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
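To make the correlation and insertion steps concrete, here is a minimal sketch of the key property values (the names corrID, maxKey, and primaryKey are illustrative choices from the flow above, not required names):

```
# UpdateAttribute -- dynamic property creating the correlation attribute
corrID = ${UUID()}

# MergeContent -- pairs the original FlowFile with its clone
Minimum Number of Entries  = 2
Correlation Attribute Name = corrID
Attribute Strategy         = Keep All Unique Attributes

# UpdateRecord -- dynamic property writing the attribute into the new column
# (set Replacement Value Strategy to "Literal Value" so the NiFi Expression
# Language below is evaluated against FlowFile attributes)
/primaryKey = ${maxKey}
```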