Member since
07-30-2019
3406
Posts
1623
Kudos Received
1008
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 314 | 12-17-2025 05:55 AM |
| | 375 | 12-15-2025 01:29 PM |
| | 355 | 12-15-2025 06:50 AM |
| | 345 | 12-05-2025 08:25 AM |
| | 594 | 12-03-2025 10:21 AM |
07-08-2022
11:47 AM
@Luwi An "active content claim" is any content claim still referenced by at least one FlowFile. A NiFi content claim file can contain the content of one to many FlowFiles, so all it takes is one small FlowFile still queued in some connection anywhere on your NiFi canvas to prevent a content claim from being eligible to move to archive. This is why the total reported content queued on your canvas will never match the disk usage in your content_repository. This article is useful for understanding this process in more detail: https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Archiving-works/ta-p/249418 Thank you, Matt
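The rule above can be sketched in plain Python. This is an illustration of the bookkeeping, not NiFi's actual implementation; the claim/FlowFile data model here is invented for the example.

```python
# Illustrative sketch: a content claim is only eligible for archive once
# no queued FlowFile references any bytes inside it.
def archivable_claims(claims, flowfiles):
    """Return the claim ids with zero active FlowFile references."""
    referenced = {ff["claim"] for ff in flowfiles}
    return sorted(c for c in claims if c not in referenced)

# One small FlowFile pinning claim "c1" keeps the whole claim file
# (including every other FlowFile's bytes stored in it) out of archive.
claims = ["c1", "c2"]
flowfiles = [{"id": "ff-7", "claim": "c1"}]
```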
07-08-2022
11:39 AM
@mbraunerde Assuming you do not want to lose the original content of all these files, you have a couple of challenges here:

1. You don't have a known number of files. So when collecting a single list of all filenames, how do you know that all of them have been received in the queue? NiFi is designed as a data-in-motion service.
2. Preserving the original FlowFiles' content. It sounds like you want to produce a new FlowFile whose content contains just the filenames of all the files received by your NiFi dataflow, while still allowing the original FlowFiles, with their original content, to be processed separately.

Overcoming these challenges depends on some unknowns at this point:

1. How is this data ingested into your NiFi? Is it a constant stream of data? Is it a burst of data once a day? If you can control the ingest and there is a known gap between streams of data, you may be able to overcome challenge 1 above.
2. Challenge 2 can be overcome by cloning the FlowFiles. Every NiFi processor has outbound relationships that can be added to NiFi connections or auto-terminated within a processor's configuration. At some point in your flow you would simply add the "success" relationship of a processor to two different connections (one connection gets the original FlowFile and the other gets the clone). Down one dataflow path you continue to handle the FlowFiles with their original content. On the other dataflow path you can use a ReplaceText processor with a literal replace replacement strategy and set the Replacement Value to $(unknown). This replaces the entire content of that FlowFile with just that FlowFile's filename. Then, as @SAMSAL suggested, use a MergeContent processor to merge all your FlowFiles so you end up with one new FlowFile containing all the filenames.
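The clone-and-replace step can be sketched like this. This is a hedged illustration of the effect, using an invented dictionary data model rather than NiFi's API: routing "success" to two connections clones the FlowFile, and ReplaceText then swaps the clone's content for its "filename" attribute.

```python
# Hypothetical model of the fan-out described above (not NiFi code).
def fan_out_and_replace(flowfile):
    """Return (original, clone whose content is just the filename)."""
    clone = {
        "attributes": dict(flowfile["attributes"]),
        # What ReplaceText does on the second path: content := filename
        "content": flowfile["attributes"]["filename"],
    }
    return flowfile, clone

ff = {"attributes": {"filename": "report.csv"}, "content": b"...bytes..."}
original, clone = fan_out_and_replace(ff)
```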
Since you are dealing with an unknown number of files, you could configure the MergeContent processor with an arbitrarily large Minimum Number of Entries (some value larger than you would expect to receive in a single batch). You would also need to set Maximum Number of Entries to a value equal to or larger than the minimum. This causes FlowFiles to continue to be added to a bin without actually being merged. Then you set Max Bin Age to a value high enough that all of the batch's FlowFiles will have arrived. Max Bin Age serves as a method to force a bin to merge even if the minimum values have not been reached after the configured amount of time. So you are building a delay into this flow to allow for the data-in-motion nature of NiFi. Finally, send that merged FlowFile to your PutEmail processor.

Or maybe we are not understanding the use case completely. Are you looking for what is actually in a given queue, and in positional order? Keep in mind that NiFi is a data-in-motion service, meaning it should not be used to hold data in queues, which in turn means that the queued FlowFiles in a connection are typically constantly changing. But if this is what you are looking for, you could use the InvokeHTTP processor to obtain the listing of FlowFiles in a queue. This requires a series of rest-api calls. The first InvokeHTTP would make a POST request to generate a queue listing result set for a connection from all nodes. The response to that POST contains the URL of the result set, which you would use in a second InvokeHTTP to GET that result set. Finally, you would need a third InvokeHTTP to DELETE the result set so it is not left hanging around in NiFi heap memory. Even then you have a large JSON which contains a lot more than just position, filename, and NiFi cluster host names, so you would use additional processors to parse the desired information out of that JSON.
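The binning behavior described above can be sketched as a simple decision function. This is an illustration only, not MergeContent's implementation; the threshold values are example assumptions.

```python
# Hedged sketch: a bin merges once it reaches the minimum entry count,
# or once Max Bin Age forces it out. Setting min_entries deliberately
# huge means only Max Bin Age ever triggers the merge, which is the
# "built-in delay" technique described above.
def should_merge(entry_count, bin_age_seconds,
                 min_entries=100_000, max_bin_age_seconds=600):
    return (entry_count >= min_entries
            or bin_age_seconds >= max_bin_age_seconds)
```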
If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
07-07-2022
08:51 AM
2 Kudos
@sayak17 If you are simply looking to GET from a REST API endpoint and write the response to a local file on the server where your NiFi service is running, you'll want to use the InvokeHTTP processor and feed it into a PutFile processor. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
07-07-2022
08:41 AM
@templarian Does the target URL hostname resolve to a load balancer that can route the request to any number of endpoint servers? Perhaps not all of those endpoint servers use the same serverAuth certificate, or they are not all signed by the same authority, or by an authority known to the truststore configured in the SSLContextService you have configured in your InvokeHTTP processor. In that scenario, the request would work when it gets sent to some endpoints and fail with others, even though what you have configured in the processor never changes. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
07-07-2022
08:35 AM
3 Kudos
@Luwi The log output you shared implies there is not much in archive (2 to 6 archived claims each time). So it appears that the majority of your disk usage is being consumed by active claims, other services, or other files on the system.

- Are you processing large files, or a mix of large and small files?
- Are you leaving FlowFiles sitting in connection queues for long periods of time?
- Is the disk used for your content_repository used for other things besides NiFi?

The bottom line is that even adjusting "nifi.content.repository.archive.backpressure.percentage" to a higher percentage just pushes the issue further down the road. You'll hit it again if the disk continues to fill with non-archived content from NiFi or something external to NiFi. NiFi best practices strongly encourage a dedicated disk for the content_repository(s) and the flowfile_repository. The provenance_repository and database_repository may share a disk, since provenance_repository usage can be controlled and the database_repository remains relatively small. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
07-07-2022
08:23 AM
@Luwi Yes, you may add it manually to the nifi.properties file in 1.16.3. NiFi will not read the new property until you restart the service. Did you upgrade from a previous NiFi release? I am not sure why it is missing. Matt
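If the property is absent, it can be appended to conf/nifi.properties; the value below is only an example, so check the defaults documented for your release:

```properties
# Added manually; NiFi only picks this up on restart.
nifi.content.repository.archive.backpressure.percentage=60%
```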
07-07-2022
08:17 AM
@Meepoljd You'll want HTTPS enabled to prevent direct access to NiFi's endpoints. When NiFi is not secured (HTTP), it does not require user authentication or authorization, so all access is treated as anonymous. When using Apache Knox, NiFi cannot be configured with other login-based authentication, such as a login provider in login-identity-providers.xml, or OpenID Connect or SAML via the associated properties in the nifi.properties file. So make sure these properties are not configured in nifi.properties when you have also configured the Knox properties: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#saml https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#openid_connect and the following login provider property: nifi.security.user.login.identity.provider= If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
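A nifi.properties fragment showing the intended end state might look like the following. The Knox property names and values here are illustrative assumptions taken from memory of the NiFi 1.x admin guide, so verify them against the guide for your release; the point is that the Knox block is filled in while the competing login methods are left blank.

```properties
# Knox SSO properties configured (values are placeholders)
nifi.security.user.knox.url=https://knox.example.com:8443/gateway/knoxsso/api/v1/websso
nifi.security.user.knox.publicKey=./conf/knoxsso.pem
nifi.security.user.knox.cookieName=hadoop-jwt

# Competing login methods must remain unset
nifi.security.user.login.identity.provider=
nifi.security.user.oidc.discovery.url=
nifi.security.user.saml.idp.metadata.url=
```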
07-07-2022
08:07 AM
@Reddy3 There is not enough history or detail here to understand what led to the corruption of your NiFi H2 DBs. However, the NiFi H2 DBs are only used to hold flow configuration history (which user did what on the canvas) and authenticated users' server-side tokens. These DBs can be removed or renamed, and on startup NiFi will create new ones. Removing them will have no impact on your flow.xml, content, etc. You will simply lose your flow configuration history (tracking of changes will start over from this point forward) and your users will be forced to log in again. Hope this helps, Matt
07-05-2022
11:33 AM
1 Kudo
@pk87 Also consider that using timer driven scheduling may not always give you 60-minute scheduling. With timer driven, the component is scheduled when it starts and then again the configured amount of time later; a NiFi restart, or stopping and starting the processor, resets this. If you need to make sure a component is only scheduled at consistent, specific times, you should use the cron driven scheduling strategy, which allows you to set a specific schedule. Thanks, Matt
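For example, NiFi's cron driven strategy uses Quartz cron syntax (seconds, minutes, hours, day-of-month, month, day-of-week). A schedule that fires at the top of every hour, regardless of restarts, would be configured on the processor's Scheduling tab like this:

```properties
Scheduling Strategy = CRON driven
Run Schedule = 0 0 * * * ?
```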
07-05-2022
10:31 AM
@pk87 The above was discussed in another post here: https://community.cloudera.com/t5/Support-Questions/NIFI-I-m-getting-issue-like-quot-authentication-details-were/td-p/346593 A configuration change was provided there to resolve the HTTP 400 response. The new response was an HTTP 401, so the request itself was well-formed, but the endpoint did not accept the authorization provided in the request. Thanks, Matt