Member since
07-30-2019
2901
Posts
1438
Kudos Received
844
Solutions
03-28-2024
10:10 AM
1 Kudo
@DeepakDonde The issue you are describing was caused by a change in the Apache NiFi InvokeHTTP processor that tries to URL encode the URL entered. https://issues.apache.org/jira/browse/NIFI-12513

The fix for this is in https://issues.apache.org/jira/browse/NIFI-12785, which will be part of the Apache NiFi 1.26 and Apache NiFi 2.0.0-M3 releases. Since the change that caused this issue was added in Apache NiFi 1.25 and Apache NiFi 2.0.0-M2, you could use an earlier version like Apache NiFi 1.24 or Apache NiFi 2.0.0-M1 to get around the issue until the two above-mentioned versions are released.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
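To see why encoding an already-encoded URL breaks a request, here is a small illustration (plain Python for demonstration, not NiFi's actual code path): percent-encoding a URL path a second time mangles any characters the user had already encoded.

```python
from urllib.parse import quote

# A URL path the user already percent-encoded themselves:
url_path = "/api/files/report%20Q1.csv"

# If the processor encodes the URL a second time, the '%' itself is
# encoded as '%25', so %20 becomes %2520 and the request target breaks.
double_encoded = quote(url_path)
print(double_encoded)  # /api/files/report%2520Q1.csv
```

This is why the fix matters for any flow whose URLs contain pre-encoded characters.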
03-28-2024
07:43 AM
@TreantProtector Everything the user adds to the canvas, including controller services and reporting tasks, is auto-saved in the flow.json.gz. Each time a change is made, the current flow.json.gz is archived and a new flow.json.gz is generated. Within the flow.json.gz are all components (processors, connections, controller services, reporting tasks, funnels, process groups, ports, parameters, etc.) and their configurations.

Any configuration property that is "sensitive" (passwords) is going to be encrypted in the flow.json.gz file. So in order to load that flow.json.gz in another NiFi, you would need to know the nifi.sensitive.props.algorithm and nifi.sensitive.props.key used by the original NiFi it came from. Encrypted Passwords in Flows

If you don't have that info, the flow.json.gz can still be loaded on another NiFi after manually editing the file to remove all the "enc{...}" values. Once the flow.json.gz loads, an authorized user would need to re-enter all passwords in all components where they are needed via the NiFi UI.

Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
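As a rough sketch of the manual edit described above (a hypothetical helper, not an official NiFi tool — always work on a copy of the file and keep a backup), stripping the "enc{...}" values could look like this:

```python
import gzip
import json
import re

# Hypothetical helper: blank out encrypted property values ("enc{...}") so a
# flow.json.gz can be loaded on a NiFi with a different sensitive props key.
# Passwords will then need to be re-entered via the UI, as described above.
def strip_encrypted_values(in_path: str, out_path: str) -> int:
    with gzip.open(in_path, "rt", encoding="utf-8") as f:
        text = f.read()
    # Encrypted values look like enc{<payload>}; replace them with empty strings.
    stripped, count = re.subn(r"enc\{[^}]*\}", "", text)
    json.loads(stripped)  # sanity check: the result must still be valid JSON
    with gzip.open(out_path, "wt", encoding="utf-8") as f:
        f.write(stripped)
    return count  # number of encrypted values removed
```

The JSON sanity check matters because a malformed flow.json.gz will prevent NiFi from starting.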
03-28-2024
07:27 AM
@C1082 The DBCPConnectionPool is a controller service that an end user would have added via the NiFi UI. The configuration of this controller service is done by the user, and one of the properties specifies the user-defined location of the database driver, which the user must provide and which is not included with NiFi. The dataflow components added to the NiFi canvas have no relationship to UI access issues.

The "javax.net.ssl.SSLException: Connection reset" exception when trying to access the UI is an issue with the TLS exchange between your client (browser) and NiFi. You'll need to look closer at the nifi-app.log and nifi-user.log for this exception and review the entire stack trace that goes with it.

Without knowing the specifics of your NiFi setup, I can't say whether your NiFi is enforcing a mutual TLS exchange or only a one-way TLS exchange. A securely configured NiFi, depending on configuration, will either "REQUIRE" the client to provide a trusted clientAuth certificate in the TLS exchange or "WANT" a trusted clientAuth certificate in the response. A connection reset may happen if the TLS exchange was not successful, which could be a trust chain issue, a network issue, or a missing clientAuth certificate when the NiFi configuration required it in the TLS response.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
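For illustration only (this is generic Python, not NiFi's implementation, and the commented file names are hypothetical), the difference between a server that "WANTs" and one that "REQUIREs" a client certificate comes down to the verify mode on the TLS context:

```python
import ssl

# Illustrative sketch of one-way vs. mutual TLS from the server's perspective.
def server_context(require_client_cert: bool) -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # ctx.load_cert_chain("server.crt", "server.key")  # server's own identity
    # ctx.load_verify_locations("truststore.pem")      # CAs trusted for clients
    if require_client_cert:
        # "REQUIRE" / mutual TLS: the handshake fails (seen as a connection
        # reset by the client) if no trusted clientAuth cert is presented.
        ctx.verify_mode = ssl.CERT_REQUIRED
    else:
        # "WANT": a client certificate is requested but not mandatory.
        ctx.verify_mode = ssl.CERT_OPTIONAL
    return ctx
```

In the "REQUIRE" case, a browser with no clientAuth certificate configured will never complete the handshake, which matches the reset symptom described above.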
03-28-2024
06:46 AM
@s198 Great to hear the suggestions I provided solved the question in this community thread. We encourage our community members to start new threads for unrelated questions, so that what solved the issue in a given thread remains clear to other community users that may come across it.

That being said, my understanding of this new query is: how do you take a dataflow that starts from a single FlowFile produced by your Sqoop job, which then becomes many FlowFiles, but requires only a single FlowFile post PutSFTP for downstream processing of job completion? That could be solved using the Wait and Notify processors, which can be complicated to set up, or by using the "FlowFile Concurrency" capability on a Process Group. I shared a similar solution in a few other community posts on how this works:

https://community.cloudera.com/t5/Support-Questions/How-to-detect-all-branches-in-a-NiFi-flow-have-finished/m-p/383475#M244918
https://community.cloudera.com/t5/Support-Questions/NiFi-Trigger-a-Processor-once-after-the-Queue-gets-empty-for/m-p/381801#M244416

Please help the community grow and assist others in finding solutions that help or solve issues by taking a moment to login and click "Accept as Solution" below any response(s) that helped you.

Thank you, Matt
03-27-2024
12:49 PM
2 Kudos
@s198 The List<abc> type processors are source-based processors that do not accept inbound connections, since they are designed to create FlowFiles, not to modify existing FlowFiles.

I am not clear on what "So we used Sqoop completion" does to create a FlowFile in your NiFi dataflow, which is then passed to RouteOnAttribute (assuming this is the processor you are referring to by "Router Attribute") via a connection. What attributes exist on the FlowFile being processed by the RouteOnAttribute processor? Are there any FlowFile attributes on this FlowFile about the specific file needing to be fetched by the FetchHDFS processor (like filename and path)?

-----

If the Sqoop job output produced 1 FlowFile for each HDFS file to be fetched, and each of those FlowFiles has attributes for the path and filename of the HDFS file content to be fetched, you could do the following: set the default NiFi Expression Language statement "${path}/${filename}" in the "HDFS File Name" property of the FetchHDFS processor. Those two FlowFile attributes are expected to be in the format:

filename - The name of the file that will be read from HDFS.
path - The path is set to the absolute path of the file's directory on HDFS. For example, "/tmp/abc/1/2/3".

Attribute names are case sensitive.

-----

If the Sqoop job simply outputs 1 FlowFile from which you expect to fetch a lot of HDFS files, that is not how FetchHDFS functions. FetchHDFS expects one FlowFile for each HDFS file whose content is being fetched. FetchHDFS does not create new FlowFiles; it only adds content to an existing FlowFile. If this matches your scenario, you may be able to use the GetHDFSFileInfo processor, which does accept an inbound connection. It can be configured with just a path. If you set "Group Results = None" and "Destination = Attributes", you could send the produced FlowFiles to FetchHDFS to get the content for each FlowFile output.

You would still need your RouteOnAttribute processor to make sure only FlowFiles where "${hdfs.type} = file" were routed to FetchHDFS and other types are discarded. You would probably also want an UpdateAttribute processor so you could set the filename of the FlowFile to the hdfs.objectName (done by adding the dynamic property filename = ${hdfs.objectName}). Then feed those FlowFiles to your FetchHDFS processor configured to use the ${hdfs.path}${hdfs.objectName} NiFi Expression statement in the "HDFS File Name" property.

------

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
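As a toy illustration of how FetchHDFS builds the full HDFS path from those two attributes (a simplified stand-in, not NiFi's actual Expression Language engine):

```python
import re

# Toy stand-in for NiFi Expression Language attribute substitution, showing
# how an expression like ${path}/${filename} resolves against a FlowFile's
# attributes. Attribute names are case sensitive, as in NiFi.
def resolve(expression: str, attributes: dict) -> str:
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: attributes.get(m.group(1), ""),
                  expression)

flowfile_attrs = {"path": "/tmp/abc/1/2/3", "filename": "data_0001.csv"}
print(resolve("${path}/${filename}", flowfile_attrs))
# /tmp/abc/1/2/3/data_0001.csv
```

Each FlowFile carries its own attribute map, which is why FetchHDFS needs one FlowFile per HDFS file to fetch.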
03-27-2024
12:30 PM
@jpalmer From the image you shared, the bottleneck is actually in the custom (non Apache NiFi out-of-the-box) PutGeoMesa 4.0.4 processor.

A connection has backpressure settings to limit the number of FlowFiles that can be queued (it is a soft limit, which means backpressure gets applied once a connection backpressure threshold is reached or exceeded). Once backpressure is applied, it will not be released until the queue drops back below the configured thresholds. Backpressure, when applied, prevents the upstream processor from being scheduled to execute until that backpressure is removed. The connection turns red when backpressure is being applied, and since the connection after PutGeoMesa 4.0.4 is not red, no backpressure is being applied on that processor. So your issue is that PutGeoMesa 4.0.4 is not able to process the FlowFiles being queued to it fast enough, thus causing the backup in every upstream connection leading to the source processor.

Since it is a custom processor, I can't speak to its performance or tuning capabilities. I also don't know if the PutGeoMesa 4.0.4 processor supports concurrent executions, but you could try the following: right-click on the PutGeoMesa 4.0.4 processor, select Configure, and then select the SCHEDULING tab. Within the Scheduling tab you can set "CONCURRENT TASKS". The default is 1, and this custom processor might ignore this property. What concurrent tasks do is allow the processor to execute multiple times concurrently (so think of each additional concurrent task as creating another identical processor).

A processor component is scheduled to request a thread to execute based on the configured Run Schedule (for the Timer Driven scheduling strategy, the default of 0 secs means schedule as fast as possible). When it is scheduled, the processor requests a thread from the NiFi Timer Driven thread pool. That thread is then used to execute the processor's code against the source connection's FlowFile(s). The scheduler will then try to schedule it again based on the Run Schedule, and if concurrent tasks is still set to 1 and the previous execution is still running, it will not execute again until the in-use thread finishes. But if you set concurrent tasks to, say, 3, the processor could potentially execute 3 threads concurrently (each thread working on different FlowFile(s) from the source connection).

Again, I don't know if this custom processor will ignore this property or support it. Nor do I know if this processor was coded in a thread-safe manner, meaning that concurrent thread executions would not cause issues. So even if this appears to improve throughput, verify your data integrity coming out of the processor.

Also keep in mind that adding concurrent tasks to a processor (especially a processor like this that appears to have long-running threads; we can see it only processed 23 FlowFiles using 4.5 minutes of CPU time, which is pretty slow) can quickly lead to this processor using all the available threads from the Max Timer Driven Thread pool, resulting in other processors appearing to perform slower as they get an available thread to execute less often. You can increase the size of the Max Timer Driven Thread pool from the NiFi global menu in the upper right corner, but you need to do so carefully while monitoring CPU load average and memory usage as you slowly increase the setting.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
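The effect of concurrent tasks can be sketched with a generic thread pool (an analogy only, not NiFi code): the pool size plays the role of the Concurrent Tasks setting, capping how many "executions" run at once against a backlog of FlowFiles.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Analogy: each call is one processor execution against one FlowFile;
# max_workers plays the role of the Concurrent Tasks setting.
def process_flowfile(ff: int) -> int:
    time.sleep(0.01)  # stand-in for a slow per-FlowFile operation
    return ff

flowfiles = list(range(12))

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=1) as pool:   # Concurrent Tasks = 1
    one_task = list(pool.map(process_flowfile, flowfiles))
serial = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:   # Concurrent Tasks = 3
    three_tasks = list(pool.map(process_flowfile, flowfiles))
concurrent = time.perf_counter() - start

assert one_task == three_tasks   # same results; only the scheduling differs
print(serial > concurrent)       # more workers usually drain the backlog sooner
```

The same analogy shows the risk: if this one "processor" grabs most of the pool's workers, everything else sharing the pool waits longer for a thread.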
03-27-2024
07:17 AM
@2ir I doubt it is related to the Java version being used. Of course, we would always recommend using the latest update version of a NiFi-supported Java version.

As far as using G1GC, it is commented out because Java 8 has many issues when using G1GC, and the Java community decided to address those bugs and improvements in Java 9+ versions. Since you are using Java 11, G1GC would be a better option. With that line commented out, NiFi does not specify a GC for your Java, and whatever default GC is defined within your Java release will be used. That line allows you to override your Java default and specify the GC you want to use.

Memory issues are often attributed to issues in custom components added to the NiFi deployment or to dataflow design choices, hence all the dataflow-related input I provided previously. You never mentioned whether you were encountering any out of memory (OOM) error logs in your NiFi logs. If not, do you see any OOMs if you decrease your heap memory setting, which you have set rather high already? I also recommend setting both Xms and Xmx to the same value.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
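As an illustrative sketch (the java.arg numbers vary by NiFi release and the heap sizes here are examples, not recommendations), the relevant lines in conf/bootstrap.conf look roughly like this:

```properties
# conf/bootstrap.conf (illustrative values only)
# Set Xms and Xmx to the same value so the heap does not resize at runtime.
java.arg.2=-Xms8g
java.arg.3=-Xmx8g

# Uncomment to explicitly select G1GC (a good choice on Java 11);
# left commented, the JVM's default collector is used.
#java.arg.13=-XX:+UseG1GC
```

After editing bootstrap.conf, NiFi must be restarted for the new JVM arguments to take effect.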
03-27-2024
06:58 AM
1 Kudo
@jpalmer We'll need some more details to help here:

1. Is this a standalone single NiFi instance or a NiFi multi-instance cluster setup?
2. How many partitions on your source NiFi Kafka topic?
3. How do you have your MergeContent processor configured?
4. When you say the connection quickly fills up, what are the settings on the connection?
5. With your flow running and processing FlowFiles through the dataflow connections, what is the CPU load average? You can find these details within NiFi's UI, from either the cluster UI under the global menu in the upper right corner or the system diagnostics UI found in the controller UI, also under the global menu.
6. Do you have a lot of other dataflows also running within this same NiFi? MergeContent can be CPU and heap memory intensive depending on its configuration.

There are likely ways to improve your dataflow once we know the above details.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
03-27-2024
06:47 AM
@s198 The FetchHDFS processor is by default designed to be used in conjunction with the ListHDFS processor. The ListHDFS processor is designed to connect to HDFS and generate a NiFi FlowFile for each file listed from HDFS, without getting the content of that HDFS file. The produced 0-byte FlowFiles contain FlowFile attributes that are then used by the FetchHDFS processor to obtain the actual content and insert it into the FlowFile's content.

NiFi has numerous list/fetch sets of processors. They were designed for sources that are not NiFi-cluster friendly (meaning the client does not support a distributed fetch capability, so fetching from every node would result in data duplication). So in a NiFi cluster, the List<abc> processor would be configured to run on the NiFi cluster's primary node only, so that only one node in the NiFi cluster gets the metadata about all the source files to be ingested by the NiFi cluster. The List<abc> processor would then be connected via a NiFi connection to the Fetch<abc> processor. The connection between these two processors would be configured to load-balance the 0-byte FlowFiles across all nodes in the NiFi cluster. Then the Fetch<abc> processor could run on all nodes. Since each node in the cluster has a subset of the listed files, there is no duplication, and the load/work is now distributed across the NiFi cluster.

If you are not using a NiFi cluster and only a standalone single NiFi instance, you could use the GetHDFS processor instead. But if you plan to ever expand to a NiFi cluster, it is best to build your dataflows now with that in mind to avoid extra work later.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
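The list/load-balance/fetch division of labor can be sketched as a toy simulation (not NiFi code; the node names, file names, and round-robin strategy are made up for illustration):

```python
from itertools import cycle

# Toy simulation of the List/Fetch pattern: the primary node performs the
# listing once, and the 0-byte "listing" FlowFiles are load-balanced across
# the cluster so each file is fetched by exactly one node.
listed_files = [f"/data/file_{i}.csv" for i in range(7)]   # primary node only
nodes = ["node1", "node2", "node3"]

assignments = {n: [] for n in nodes}
for node, f in zip(cycle(nodes), listed_files):            # round-robin balance
    assignments[node].append(f)

# Each node fetches only its own subset; combined, every file is fetched
# once with no duplication.
fetched = [f for per_node in assignments.values() for f in per_node]
assert sorted(fetched) == sorted(listed_files)
```

Contrast this with running a Get-style processor on every node, where each node would independently see (and ingest) the same files.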
03-25-2024
12:55 PM
@saquibsk Additional settings? With a secured NiFi (which you should always be using), there is authentication and authorization involved with any REST API request.

The simplest approach is to generate a clientAuth certificate that is trusted via the truststore your secured NiFi is configured to use in the nifi.properties file. Then that certificate is added to a keystore. The InvokeHTTP processor can be configured to use a StandardRestrictedSSLContextService that you configure with the keystore you created and the truststore that NiFi already uses, which can trust that certificate.

On the NiFi side, you would need to add that client as a user entity so you can assign authorization policies to it. You can then authorize that client/user identity for the policies needed to start and stop specific processor components. That policy would be the "operate the component" policy, which you can set on just the QueryDatabaseTable processor or any other specific processor you want to automate. component-level-access-policies

Yes, there are some initial steps to set up the keystore and truststore needed, but those can then be used over and over for all the automation within NiFi you want to achieve.

NiFi processors execute based on the individual processor's configured scheduling. There is no other option to stop or start individual processors except manually through the UI by an authorized user or via the REST API. NiFi was designed with an always-running type of architecture in mind. Stepping out of that architecture requires extra steps or redesigning your dataflows to operate within that architecture style. If your executions always happen at set times, you could use cron scheduling, but that is not going to be an optimal design for performance.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
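For illustration, a REST API call to stop a processor could be sketched like this in Python (the host, certificate file names, client ID, and revision version below are hypothetical; the run-status endpoint and payload shape follow NiFi's REST API, where the revision must be read from a prior GET of the processor):

```python
import json

NIFI = "https://nifi.example.com:8443/nifi-api"  # hypothetical host

# Build the JSON body for NiFi's PUT /processors/{id}/run-status call.
# The revision comes from a prior GET of the processor; a stale revision
# causes the request to be rejected with a conflict.
def run_status_payload(revision: dict, state: str) -> str:
    assert state in ("RUNNING", "STOPPED")
    return json.dumps({"revision": revision, "state": state})

# A real call would look roughly like this, using the PEM-form keystore and
# truststore material discussed above ("requests" assumed installed):
#
#   import requests
#   resp = requests.put(
#       f"{NIFI}/processors/{processor_id}/run-status",
#       data=run_status_payload({"clientId": "automation", "version": 5}, "STOPPED"),
#       headers={"Content-Type": "application/json"},
#       cert=("client.pem", "client.key"),  # the clientAuth certificate
#       verify="ca.pem",                    # CA that signed NiFi's server cert
#   )
```

The same pattern with state "RUNNING" starts the processor again, which is how a scheduler outside NiFi could bracket a QueryDatabaseTable run.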