Member since 07-30-2019 | 2455 Posts | 1285 Kudos Received | 692 Solutions
08-24-2016
03:43 PM
1 Kudo
Just to clarify how S2S works when communicating with a target NiFi cluster: the NCM never receives any data, so it cannot act as the load-balancer. When the source NiFi communicates with the NCM, the NCM returns a list of all currently connected nodes, their S2S ports, and the current load on each node. It is then the job of the source NiFi RPG to use that information to do a smart, load-balanced delivery of data to those nodes.
08-24-2016
03:04 PM
Anything you can do via the browser can be done by making calls to the NiFi API. You could either set up an external process to run a couple of curl commands to start and then stop the GetTwitter processor in your flow, or you could use a couple of invokeHTTP processors in your dataflow (configured using the cron scheduling strategy) to start and stop the GetTwitter processor on a given schedule (see the example calls below). Matt
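As a rough illustration, the external curl approach might look like the following against a NiFi 1.x-style REST API (the host, processor ID, and revision version are placeholders to substitute from your own instance; 0.x releases expose a different endpoint layout):

# Start the GetTwitter processor (ID and revision are placeholders)
curl -X PUT -H 'Content-Type: application/json' \
  -d '{"revision":{"version":1},"state":"RUNNING"}' \
  http://nifi-host:8080/nifi-api/processors/<processor-id>/run-status

# Stop it again later
curl -X PUT -H 'Content-Type: application/json' \
  -d '{"revision":{"version":2},"state":"STOPPED"}' \
  http://nifi-host:8080/nifi-api/processors/<processor-id>/run-status

The revision version must match the processor's current revision, which you can read back with a GET on /nifi-api/processors/<processor-id>.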
08-24-2016
02:14 PM
1 Kudo
@INDRANIL ROY What you describe is a very common dataflow design. I have a couple of questions for clarity. RPGs (Remote Process Groups) do not send to other RPGs. RPGs send data to and pull data from input and output ports located on other NiFi instances. I suspect your standalone instance has the RPG and it is sending FlowFiles to input port(s) on the destination NiFi cluster. In this particular case the load-balancing of data is being handled by the RPG. For network efficiency data is distributed in batches, so with light dataflows you may not see exactly the same number of FlowFiles going to each node. The load-balancing also has logic built into it so that nodes in the target cluster with a lighter workload get more FlowFiles.

Although the URL provided to the RPG is the URL for the target NiFi cluster's NCM, the FlowFiles are not sent to the NCM, but rather sent directly to the connected nodes in the target cluster. Every node in a NiFi cluster operates independently of the others, working only on the FlowFiles it possesses. Nodes do not communicate with one another. They simply report their health and status back to the NCM. It is information from those health and status heartbeats that is sent back to the source RPG and used by that RPG to do the smart data delivery.

In order to distribute the fetching of the source data, the source directory would need to be reachable by all nodes in the target NiFi cluster. In the case of ListFile/FetchFile, the directory would need to be mounted identically on all systems. Another option would be to switch to a ListSFTP/FetchSFTP setup. In this setup you would not even need your standalone NiFi install. You could simply add a ListSFTP processor to your cluster (configured to run "on primary node"). Then take the success from that listing and feed it to an RPG that points back at the cluster's own NCM URL. An input port would be used to receive the now load-balanced FlowFiles. Feed the success from that input port to the FetchSFTP processor and now you have all nodes in your cluster retrieving the actual content.

So as you can see from the above, the ListSFTP would run on only one node (the primary node), producing FlowFiles that carry attributes but no content. The RPG would smartly distribute those FlowFiles across all connected nodes, where the FetchSFTP on each node would retrieve the actual content. The same flow could be built with ListFile and FetchFile as well; just mount the same source directory on every node and follow the same model. Matt
08-24-2016
12:53 PM
Most threads are very short running (milliseconds), and the NiFi UI refresh rate defaults to every 30 seconds, so the number in the upper right corner may not represent a still-running thread. In your screenshot above, the TailFile processor shows as having recorded the completion of 473,336 tasks (each task using a thread to complete) with a total cumulative thread time of only 2 minutes, 52 seconds, and 334 milliseconds over the past 5 minutes. Long-running threads will show much different stats in the Tasks/Time field.
08-17-2016
08:33 PM
1 Kudo
@Hans Feldmann The individual processors allow for concurrent task changes. By default they all have one concurrent task. For each additional concurrent task, you are giving that processor the opportunity to request an additional thread from the NiFi controller to do work in parallel. (Think of it as two copies of the same processor working on different files or batches of files.) If there aren't sufficient files in the incoming queue, any additional concurrent tasks are not utilized. The flip side is that if you allocate too many concurrent tasks to a single processor, that processor may end up using too many threads from the NiFi controller's resource pool, resulting in thread starvation for other processors. So start with the default and step up by one increment at a time at the points of backlog in your flow. The NiFi controller also has a setting that limits the maximum number of threads it can use from the underlying hardware. This is the other thing Andrew was mentioning. A restart of NiFi is NOT needed when you make changes to these values. The defaults are low (10 timer driven and 5 event driven). I would set the timer driven value to no more than double the number of cores your hardware has. Thanks, Matt
08-15-2016
05:15 PM
@Yogesh Sharma You are seeing duplicate data because the run schedule on your invokeHTTP processor is set to 1 sec and the data you are pulling is not updated that often.
You can build into your flow the ability to detect duplicates (even across a NiFi cluster). In order to do this you will need the following things set up (see the configuration sketch below):
1. DistributedMapCacheServer (Add this controller service to the "Cluster Manager" if clustered. If standalone it still needs to be added. This is configured with a listening port.)
2. DistributedMapCacheClientService (Add this controller service to the "Node" if clustered. If standalone it still needs to be added. This is configured with the FQDN of the NCM running the above cache server.)
3. Start the above controller services.
4. Add HashContent and DetectDuplicate processors to your flow between your invokeHTTP processor and the SplitJson processors.
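As a rough sketch, the two controller services end up configured along these lines (the port and hostname are example values to adjust for your environment):

DistributedMapCacheServer (on the NCM / Cluster Manager):
  Port: 4557                          <-- an unused listening port (4557 is the default)

DistributedMapCacheClientService (on each Node):
  Server Hostname: ncm.example.com    <-- FQDN of the instance running the cache server
  Server Port: 4557                   <-- must match the server's listening port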
I have attached a modified version of your template.
eqdataus-detectduplicates.xml If you still see duplicates, adjust the configured Age Off Duration in the DetectDuplicate processor.
Thanks, Matt
08-15-2016
12:40 PM
2 Kudos
@Obaid Salikeen Not sure what "issues" you had when you tried to add a new node to your existing cluster. The components (processors, connections, etc.) of an existing cluster can be running when you add new nodes to it. A new node will inherit the flow and templates from the NCM, as well as the current running state of those components, when it joins.
But, in order for a node to successfully join a cluster the following must be true:
1. The new node either has no flow.xml.gz file and templates directory, or its flow.xml.gz file and templates match what is currently on the NCM. (If they do not match, remove the flow.xml.gz file and templates dir from the new node and restart the node.) The nifi-app.log will indicate if a difference was found.
2. The nifi.sensitive.props.key= in the nifi.properties file must have the same value as on the NCM.
3. The NCM must be able to resolve the URL to the new node. If nifi.web.http(s).host= was left blank on your new node, Java on that node may be reporting the hostname as localhost. Make sure valid resolvable hostnames are supplied for nifi.web.http.host=, nifi.cluster.node.address=, and nifi.cluster.node.unicast.manager.address= (see the sketch below).
4. The security protocol must match on both NCM and node: nifi.cluster.protocol.is.secure= in the nifi.properties file.
5. Firewalls must be open between NCM and node on the HTTP(S) port as well as the node and NCM protocol ports.
6. The new node must have all the same available Java classes. If custom processors exist in your flow, make sure the new node also has those custom nar/jar files included in its lib dir.
Thanks, Matt
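For reference, these are the nifi.properties entries from the checklist above that most often need attention on the new node (the hostnames are examples):

nifi.sensitive.props.key=                                  <-- same value as on the NCM
nifi.web.http.host=node3.example.com
nifi.cluster.node.address=node3.example.com
nifi.cluster.node.unicast.manager.address=ncm.example.com
nifi.cluster.protocol.is.secure=false                      <-- must match the NCM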
08-05-2016
09:20 PM
1 Kudo
@Saikrishna Tarapareddy
Were all 74 files in the input queue before the MergeContent was run?
The MergeContent processor, just like the other processors, works on a run schedule. My guess is that the last file was not in the queue at the moment the MergeContent processor ran, so you only saw 13 get bundled instead of 14. With a minimum of 4 entries, it will read what is on the queue and bin it. You likely ended up with 3 bins of 20 and 1 bin of 13 because at the moment it looked at the queue, only 73 FlowFiles were there, so the last bin saw only 13.
You can confirm this by stopping the MergeContent and allowing all 74 files to queue before starting it. The behavior should then be as you expect.
Sounds like it is not important to have exactly 20 per merged file. Perhaps you can set a max bin age so that files don't get stuck.
Something else you can do is adjust the run schedule so the MergeContent does not run as often. The default is "0 sec", which means run as fast as possible. Try changing that to somewhere between 1 and 10 sec to give the files a chance to queue. If you are picking up all 74 files at the same time, we are likely talking about a difference of milliseconds causing this last file to get missed. Thanks, Matt
08-05-2016
02:48 PM
The attached images do not really show us your complete configuration. Can you generate a template of your flow through the NiFi UI and share that? You create a template by highlighting/selecting all components you want to include in your template and then click on the "create template" icon in the upper center of the UI. After the template has been created you can export it out of your NiFi from the template management UI icon (upper right corner of UI). Then attach that exported xml template here.
08-05-2016
01:50 PM
With a NiFi cluster, every node in that cluster runs the exact same dataflow. Some data-ingest type processors are not ideally suited for this, as they may compete for or pull the same data into each cluster node. In cases like this it is better to set the scheduling strategy on these processors to "On Primary Node" so that the processor only runs on one node (the primary node).
You can then use dataflow design strategies like RPGs (NiFi Site-to-Site) to redistribute the received data across all your NiFi cluster nodes for processing.
08-05-2016
11:57 AM
1 Kudo
@Yogesh Sharma Is your NiFi a cluster or Standalone instance of NiFi? If it is a cluster, it could explain why you are seeing duplicates since the same GetTwitter processor would be running on every Node. Matt
08-03-2016
10:55 AM
5 Kudos
@Ankit Jain
When a NiFi instance is designated as a node, it starts sending out heartbeat messages after it is started. Those heartbeat messages contain important connection information for the node. Part of that message is the hostname of each connecting node. If left blank, Java will try to determine the hostname, and in many cases the hostname ends up being "localhost". This may explain why the same configs worked when all instances were on the same machine.
Make sure that all of the following properties have been set on every one of your Nodes:
# Site to Site properties
nifi.remote.input.socket.host= <-- Set to the FQDN for the Node; must be resolvable by all other instances.
nifi.remote.input.socket.port= <-- Set to unused port on Node.
# web properties #
nifi.web.http.host= <-- set to resolvable FQDN for Node
nifi.web.http.port= <-- Set to unused port on Node
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address= <-- set to resolvable FQDN for Node
nifi.cluster.node.protocol.port= <-- Set to unused port on Node
nifi.cluster.node.protocol.threads=2
# if multicast is not used, nifi.cluster.node.unicast.xxx must have same values as nifi.cluster.manager.xxx #
nifi.cluster.node.unicast.manager.address= <-- Set to the resolvable FQDN of your NCM
nifi.cluster.node.unicast.manager.protocol.port= <-- must be set to Manager protocol port assigned on your NCM.
Your NCM will need to be configured the same way as above for the Site-to-Site properties and web properties, but instead of the "cluster node properties", you will need to fill out the "cluster manager properties":
# cluster manager properties (only configure for cluster manager) #
nifi.cluster.is.manager=true
nifi.cluster.manager.address= <-- set to resolvable FQDN for NCM
nifi.cluster.manager.protocol.port= <-- Set to unused port on NCM.
The most likely cause of your issue is not having the host/address fields populated or trying to use a port that is already in use on the server.
If setting the above does not resolve your issue, try setting DEBUG for the cluster logging in the logback.xml on one of your nodes and on the NCM to get more details:
<logger name="org.apache.nifi.cluster" level="DEBUG"/>
08-02-2016
05:48 PM
3 Kudos
@Obaid Salikeen Try using \\n (double backslash) or "Shift + Enter" in the expression language editor box to create new lines in your replacement string, as shown by Joe Witt above.
Thanks, Matt
07-22-2016
12:26 PM
@Manikandan Durairaj
Simon is completely correct above; however, I want to add a little to his statement about saving the entire flow.xml.gz file (standalone or NiFi cluster node) or flow.tar file (NiFi cluster NCM).
When you generate templates in NiFi, those dataflows are scrubbed of all encrypted values (passwords). When importing those templates into another NiFi, the user will need to repopulate all the processor and controller service passwords manually.
Saving off the flow.xml.gz or flow.tar file will capture the entire flow exactly as it is, encrypted sensitive passwords and all. NiFi will not start if it cannot decrypt these encrypted sensitive properties contained in the flow.xml. When sensitive properties (passwords) are added they are encrypted using these settings from your nifi.properties file:
# security properties #
nifi.sensitive.props.key=
nifi.sensitive.props.algorithm=PBEWITHMD5AND256BITAES-CBC-OPENSSL
nifi.sensitive.props.provider=BC
In order to drop your entire flow.xml.gz or flow.tar onto another clean NiFi, these values must all match exactly.
Thanks, Matt
07-18-2016
10:39 PM
1 Kudo
@gkeys What are the permissions on both the file(s) you are trying to pick up with the GetFile processor and on the directory the file(s) live in?

-rwxrwxrwx 1 nifi dataflow 24B Jul 18 18:20 testfile
drwxr-xr-- 3 root dataflow 102B Jul 18 18:20 testdata

With the above example permissions, I can reproduce exactly what you are seeing. If "Keep Source File" is set to true, NiFi creates a new FlowFile with the content of the file. If "Keep Source File" is set to false, GetFile yields because it does not have the necessary permissions to delete the file from the directory. This is because the write bit is required on the source directory for the user who is trying to delete the file(s). In my example NiFi is running as user nifi, which can read the files in the root-owned testdata directory because the directory group ownership is dataflow (the same group as the nifi user) and the dir has r-x group permissions. If I change that dir's group permissions to rwx, then the nifi user will also be able to delete the testfile (see the example below). Thanks,
Matt
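Continuing the example directory above, adding the group write bit is all that is needed:

chmod g+w testdata    # dir becomes drwxrwxr--; the dataflow group can now delete files in it
ls -ld testdata       # verify the new permissions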
07-18-2016
10:09 PM
1 Kudo
You could also modify the local /etc/hosts file on your ec2 instances so that the hostname "ip-10-40-197.ec2.internal" resolves to the proper external IP addresses for those ZooKeeper nodes, if they have them.
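For example, an entry like the following on each ec2 instance (the IP is a placeholder for the ZooKeeper node's actual external IP):

203.0.113.10   ip-10-40-197.ec2.internal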
07-18-2016
02:52 PM
2 Kudos
NiFi secure cluster and Site-to-Site authentication is not handled by Kerberos. NiFi Kerberos authentication is only supported for user authentication. Secure NiFi Site-to-Site communications are still handled using TLS mutual authentication.
The error you are seeing is because that TLS mutual auth is failing. The URL you are providing the Remote Process Group (RPG) is using the IP of the target NCM. The NCM is providing its public key to your nodes for authentication, and that certificate does not contain the IP as its DN or as a Subject Alternative Name (SAN). So the source NiFi is saying that the provided certificate should contain 10.110.20.213, but instead it is providing something else.
If you do a verbose listing of the keystore on your NCM, you will see the contents of the key. Look for CN=<some value> (this value is typically the hostname/FQDN). Use that value in the URL you are providing your RPG. Make sure your source NiFi (in your case, every node in your NiFi cluster) can resolve that hostname to its proper IP.
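A verbose listing looks something like this (the keystore path and password are placeholders for your own values):

keytool -list -v -keystore /path/to/keystore.jks -storepass <password>

The DN appears on the "Owner:" line of the output (for example, Owner: CN=ncm.example.com, OU=...), and any SANs appear under the certificate extensions.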
The other option is to get a new certificate that has the IP added to it as a SAN.
Thanks, Matt
07-13-2016
12:33 PM
1 Kudo
I recommend setting up a NiFi cluster that will spread the load across multiple resources. This removes the single point of failure caused by only having one ec2 instance running NiFi. Now, whether a single ec2 instance with NiFi can run your dataflows really depends a lot on your data and what your specific dataflows look like. For example, are you doing a lot of CPU- or memory-intensive processing in your NiFi dataflows? A good approach is having NiFi sitting on edge systems feeding a central NiFi processing cluster.
07-08-2016
12:07 PM
1 Kudo
@mliem
NiFi components (processors, RPGs, input/output ports, etc.) are designed to run asynchronously. There is no mechanism built into NiFi for triggering one processor to run as a result of another processor completing its job.
That being said, everything you can do via the UI can also be done through calls directly to the NiFi API. You may consider playing around with using the invokeHTTP processor to make calls to the NiFi API to start and stop specific processors at specific points in your dataflow. Once a processor is started it will run, retrieving a thread from the controller to do so. Stopping that processor will not kill that thread; the processor will simply not be scheduled to run again and will be in a state of "stopping" during that time frame. You cannot start a processor that is still "stopping", so you want to be careful where you invoke your start and stop actions. (For example, following your "matched" criteria you start the mergeContent, and after the mergeContent you invoke the stop of the mergeContent.)
For speed and efficiency's sake, I would look for ways to keep your flow asynchronous in design. If you do choose to go this route, I would also build some monitoring into your flow using the monitorActivity processor. This processor can be used to monitor that data continues to flow based upon a configured threshold. If that threshold is exceeded, it generates a FlowFile that can be routed to a putEmail processor (as an example) to alert someone that the dataflow is down. This is a safety net, so to speak, in the event one of your API calls fails for some reason (a network hiccup, for example). Thanks, Matt
07-07-2016
04:28 PM
It may be helpful to understand your dataflow better if you can paste a screenshot of the second dataflow you want to alter.
07-01-2016
12:06 PM
NiFi 1.0 is deep into development right now. Expect to see it up for vote in August. NiFi 1.0 has had considerable re-work done across the board (new UI, no more NCM for clustering, etc.). Very exciting stuff.
06-30-2016
03:14 PM
6 Kudos
@Alexander Aolaritei NiFi can produce a lot of provenance data. The solution you are looking for will be coming in Apache NiFi 1.0 in the form of a NiFi reporting task. This "SiteToSiteProvenanceReportingTask" will use the NiFi Site-to-Site (S2S) protocol to send provenance events to another NiFi instance in configurable batches. Of course, that target NiFi instance could be itself; however, that would just produce even more provenance events locally as you handle those messages, so it may be wise to stand up another NiFi instance just for provenance event handling. Upon receiving those provenance events via an S2S input port, you can use standard NiFi processors to split/merge them, route them, and store them in your desired end point (whether that is local file(s), an external DB, etc.). I am not a developer so I cannot help with the custom solution you are working on, but I just want to share what is coming as another viable solution to your needs. Thanks, Matt
06-28-2016
08:27 PM
1 Kudo
@AnjiReddy Anumolu Let me start off by making sure I fully understand the dataflow you have created, to better answer your question. You have added a getFile processor to your flow, which will pick up file(s) from a local file system directory and then send them via the success relationship to a logAttribute processor. What did you do with the logAttribute's success relationship? If it is auto-terminated, you are essentially telling NiFi you are done with the files following a successful logging of the files' FlowFile attributes/metadata. If the success relationship has not been defined, the processor will remain invalid and cannot be run. In this case the file(s) picked up by the getFile processor will remain queued on the connection between the getFile processor and the logAttribute processor.

In either case, when NiFi ingests file(s) they are placed in the NiFi content repository. The location of the content repository is defined/configured in the nifi.properties file. The default places it in a directory created within the default NiFi installation directory:

nifi.content.repository.directory.default=./content_repository

NiFi stores file(s) in what are known as claims to make the most efficient use of the system's hard disks. A claim can contain 1 to many files. The default claim configuration is also defined/configured in the nifi.properties file as follows:

nifi.content.claim.max.appendable.size=10 MB
nifi.content.claim.max.flow.files=100

Files smaller than 10 MB may be stored with other files, with up to 100 total files in a single claim. If a file is larger than 10 MB it will end up in a claim of one. At the same time files are written to a claim, FlowFile attributes/metadata are written about the ingested files in the FlowFile repository. The location of the FlowFile repository is also defined/configured in the nifi.properties file:

nifi.flowfile.repository.directory=./flowfile_repository

These FlowFile attributes/metadata contain information such as filename, filesize, location of the claim in the content repository, claim offset, etc. The claim offset is the starting byte location of a particular file's content within a claim. The filesize defines the number of bytes from that offset that make up the complete data. The nifi-app.log contains fairly robust logging by default (configured in the logback.xml file). When NiFi ingests files, NiFi will log that, and the log line will contain information about the claim (location and offset). When NiFi auto-terminates FlowFiles they are removed from the content repository. Depending on the content repository archive setup, the file(s) may be archived for a period of time. Archived file(s) can be replayed using the NiFi provenance UI. Thanks, Matt
06-23-2016
09:41 PM
Was your VM or your NiFi restarted since HDP was installed?
06-08-2016
12:19 PM
Glad I could help and good to hear you are now up and running.
06-03-2016
05:33 PM
You can edit files as root; editing files does not change ownership. You just need to make sure that at the end of editing, all files are owned by the user who will be running your NiFi instances.
Give yourself a fresh start and delete the flow.tar on your NCM and the flow.xml.gz and templates dir on your Node. So at the end of configuring your two NiFi installs (one configured to be the NCM and one separate install configured to be a Node), you started your NCM successfully? Looking in the nifi-app.log for your NCM, do you see the following lines?

2016-06-03 ... INFO [main] org.apache.nifi.web.server.JettyServer NiFi has started. The UI is available at the following URLs:
2016-06-03 ... INFO [main] org.apache.nifi.web.server.JettyServer https://Bxxxxx.xxxxxx.com:8080/nifi

You then go to your other NiFi installation, configured as your Node, and start it.
After it has started successfully, it will start attempting to send heartbeats to Bxxxxx.xxxxxxx.com on port 1xxx. You should see these incoming heartbeats logged in the nifi-app.log on your NCM. Do you see these?

INFO [Process NCM Request-1] o.a.n.c.p.impl.SocketProtocolListener Received request 411684b2-25cb-461f-978e-fb3bda6a7ef0 from Axxxxx.xxxxxx.com
INFO [Process NCM Request-1] o.a.n.c.manager.impl.WebClusterManager Node Event: (......) 'Connection requested from new node. Setting status to connecting.'

After that, the NCM will either mark the node as connected or give a reason for not allowing it to connect.
If you are not seeing these heartbeats in the NCM nifi-app.log, then something is blocking the TCP traffic on the specified port. I did notice in the above example you provided 1xxx as your cluster manager port. Is that port above 1024? Ports <= 1024 are reserved and can't be used by non-root users. If you are running your NCM as a user other than root (as it sounds from the above), NiFi will fail to bind to that port for listening for these heartbeats. Matt
06-03-2016
04:13 PM
1 Kudo
A fresh install of NiFi has no flow.xml.gz file until after it is started for the first time.
Are these fresh NiFi installs, or installations that were previously run standalone? If the latter, you can't simply tell them they are Nodes and NCMs and expect it to work. Your NCM does not run with a flow.xml.gz like your Nodes and standalone instances do. The NCM uses a flow.tar file. The flow.tar would be created on startup and contain an empty flow.xml. When you started your Node (with an existing flow.xml.gz file) it would have communicated with the NCM but been rejected, because the flow on the Node would not have matched what was on the NCM. If you are looking to migrate from a standalone instance to a cluster, I would suggest reading this:
https://community.hortonworks.com/content/kbentry/9203/how-to-migrate-a-standalone-nifi-into-a-nifi-clust.html
Let me make sure I understand your environment:
1. You have two different installations of NiFi.
2. One installation of NiFi is set up and configured to be a non-secure (http) NCM.
3. One installation of NiFi is set up and configured to be a non-secure (http) Node.
4. The # cluster common properties (cluster manager and nodes must have same values) # section in the nifi.properties files on both NCM and Node(s) is configured identically.
5. In that section on both, nifi.cluster.protocol.is.secure=false is configured (cannot be true if running http).
6. The # cluster node properties (only configure for cluster nodes) # section has been configured only on your Node, with the following properties set: nifi.cluster.is.node=true, nifi.cluster.node.unicast.manager.address=, and nifi.cluster.node.unicast.manager.protocol.port= (the port matching what you configured in the next section on your NCM).
7. The # cluster manager properties (only configure for cluster manager) # section has been configured on your NCM only, with nifi.cluster.is.manager=true.
Thanks, Matt
06-03-2016
03:38 PM
Are these https or http configured cluster NCM and Node(s)?
The NCM needs to be able to communicate with the http(s) port and node protocol port configured in the nifi.properties file on the Node(s).
Node needs to be able to communicate with the cluster manager protocol port configured in the nifi.properties file on the NCM.
Thanks, Matt
06-02-2016
01:18 PM
1 Kudo
There are a few things you can do here, if I am understanding correctly what you are trying to accomplish. 1. The logback.xml can be modified so specific processor component logs are redirected to a new dedicated log file (see the sketch after this list). You can specify where that new log is written. You can also specify the log level for those components (WARN level would get you just WARN and ERROR messages).
2. In your dataflow you could use the TailFile processor to monitor that new log and route any generated FlowFiles to a putEmail processor to send them to your Admin. In addition to email, you can route those FlowFiles to a processor of your choice to write a copy to a specific location as well, either locally or remotely. Thanks, Matt
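As a rough sketch of the logback.xml change from point 1, assuming PutFile as the example processor class and a hypothetical log file name:

<appender name="PROCESSOR_LOG" class="ch.qos.logback.core.FileAppender">
  <file>logs/nifi-processor-warnings.log</file>
  <encoder>
    <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
  </encoder>
</appender>
<logger name="org.apache.nifi.processors.standard.PutFile" level="WARN" additivity="false">
  <appender-ref ref="PROCESSOR_LOG"/>
</logger>

Setting additivity="false" keeps those messages out of the main nifi-app.log so the new file is the single place to tail.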
05-31-2016
02:58 PM
Ahmad, the line you are seeing in the nifi-bootstrap.log indicates the JVM started successfully. You need to check the nifi-app.log to make sure the application itself loaded successfully. In the nifi-app.log you will find the following lines if the application loaded successfully:
2016-05-31 10:46:44,347 INFO [main] org.apache.nifi.web.server.JettyServer NiFi has started. The UI is available at the following URLs:
2016-05-31 10:46:44,347 INFO [main] org.apache.nifi.web.server.JettyServer http://<someaddress or FQDN>:8088/nifi

Verify that the hostname or IP displayed on this line is reachable/resolvable on the system you are running your web browser from.
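A quick way to pull those lines out (assuming the default install layout, where the logs live under the logs directory of the NiFi install):

grep "JettyServer" logs/nifi-app.log | tail -5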
Thanks, Matt