Member since: 07-30-2019
Posts: 2906
Kudos Received: 1438
Solutions: 844
05-16-2022
08:12 AM
@alperenboyaci Things to keep in mind with NiFi authorizations:

When a user authenticates with NiFi, they are authenticating only with the node they are connecting to. That request then gets replicated to all the other nodes in the cluster, which is where the "proxy user requests" authorization policy comes into play. The node the user authenticated to replicates the request on behalf of that authenticated user to the other nodes (this way the user does not need to authenticate to all nodes). Some requests that get replicated will result in data being returned by other nodes in the cluster so that data can be displayed on the originating node where the user initiated the action. In order for that originating node to display the data, it must be authorized to do so. So while you may have authorized your authenticated user to "view the data" for a specific component, you also need to authorize your hosts for the same. This is why you see the "Insufficient Permissions" dialog telling you that "server 2" is not authorized to "view the data" for the requested component.

In addition to "proxy user requests", nodes need to be authorized for:
1. "view the data" in order to list a connection queue of a component.
2. "modify the data" in order to empty a connection queue of a component.
(Your user would also need these same authorizations.)

Other policies that apply to the NiFi hosts include:
1. "receive data via site-to-site", set on "remote input ports" (set to your local hosts if you are sending FlowFiles to yourself within NiFi; set to the hosts of other external NiFi instances if receiving FlowFiles from a Remote Process Group not on this cluster's hosts).
2. "send data via site-to-site", set on a "remote output port" (same logic as above).
3. "retrieve site-to-site details", set from the global menu --> Policies.

NiFi's component-level authorization policies are set via the "key" icon found in the Operate panel for components added to the canvas. Clicking the key icon displays the available policies; depending on the component selected, some policies may be greyed out if they do not apply to that component. While NiFi allows you to set policies on every component, it is more typical to set policies on a process group, because components (processors, controller services, child process groups) inherit permissions from the parent process group unless authorizations have been set on the component itself.

When you first installed and started NiFi, it generated the root-level Process Group (the canvas you see when you access the UI). With nothing selected on the canvas, the Operate panel displays the name and UUID of the root Process Group (this assumes you have not clicked into a child process group). There are breadcrumbs in the lower left corner to help users navigate the hierarchy of parent --> child process groups (the label furthest to the left is the root process group).

I know this was a bit of extra detail, but I hope it helps you be more successful. If you found any of the supplied responses assisted with your queries, please take a moment to login and click on "Accept as Solution" below each of those posts. As a community we want to make sure we share the path to solutions with other community members through "Accept as Solution" marked responses. Thank you, Matt
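To make the node-level policies above concrete, here is a hypothetical fragment of the kind of entries NiFi keeps in ./conf/authorizations.xml once a node identity is granted "view the data" (read) and "modify the data" (write) on a process group. The UUIDs and identities below are placeholders, and in practice you manage these policies through the UI rather than editing this file by hand:

```xml
<!-- Hypothetical example: node identity granted data policies on one process group.
     "R" = view the data, "W" = modify the data. -->
<policy identifier="policy-uuid-1" resource="/data/process-groups/pg-uuid" action="R">
    <user identifier="node1-user-uuid"/>
</policy>
<policy identifier="policy-uuid-2" resource="/data/process-groups/pg-uuid" action="W">
    <user identifier="node1-user-uuid"/>
</policy>
```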
05-06-2022
12:17 PM
@sam_s0ni Verify that all 3 NiFi nodes are:
1. Running the exact same version of NiFi.
2. Running the exact same version of Java 8 or 11 (11 is only supported by the newest releases).
3. Carrying the exact same contents in the NiFi lib directory, extensions directory, and any custom lib directories.

Then try removing the NiFi work directory from all three nodes. On restart, NiFi will rebuild the contents of the work directory from the lib and extensions folders above. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
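One way to check item 3 is to compute a single fingerprint of each node's lib and extensions directories and compare the values across nodes. A minimal sketch (the function name and approach are mine, not a NiFi tool):

```python
import hashlib
import os

def dir_fingerprint(path):
    """Return one digest covering every file name and its contents under
    'path', so two directories match only if their contents match."""
    h = hashlib.sha256()
    for root, _dirs, files in sorted(os.walk(path)):
        for name in sorted(files):
            h.update(name.encode())             # fold in the file name
            with open(os.path.join(root, name), "rb") as f:
                h.update(f.read())              # fold in the file bytes
    return h.hexdigest()

# Run e.g. dir_fingerprint("/opt/nifi/lib") on each node and compare.
```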
05-06-2022
12:05 PM
1 Kudo
@Shanoj You really do not want to be using the Apache NiFi 0.x line anymore. That release line is more than 6 years old, and many security bug fixes and improvements have been made since then. Not to mention that the Cluster Manager was a single point of failure: if it goes down, you lose all access to your NiFi. The Apache NiFi 1.x line introduced zero-master clustering, allowing users to access NiFi from any node in the cluster. While I strongly encourage the use of an external ZooKeeper with the Apache NiFi 1.x line, NiFi does offer an embedded ZooKeeper option: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#embedded_zookeeper

I encourage you to read through the following walkthrough documentation: https://nifi.apache.org/docs/nifi-docs/html/walkthroughs.html It includes sections on installing NiFi, securing it, and deploying a cluster, which even covers using the embedded ZooKeeper. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
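As a rough illustration, enabling the embedded ZooKeeper is driven from nifi.properties on each node. The hostnames below are placeholders for a hypothetical 3-node cluster; see the admin guide link above for the full set of related properties (including zookeeper.properties and the myid file):

```properties
# Start an embedded ZooKeeper server on this node (hypothetical example):
nifi.state.management.embedded.zookeeper.start=true
# All nodes point at the full ZooKeeper connect string:
nifi.zookeeper.connect.string=nifi-node1:2181,nifi-node2:2181,nifi-node3:2181
```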
05-06-2022
11:50 AM
@Ghilani While I agree that record-based processors, which let you work with single FlowFiles containing multiple records, make for more efficient dataflows, what you are doing here should be possible in the interim with a ReplaceText processor using "Literal Replace": here we are searching for the literal pattern _" and replacing it with just ". If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
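In "Literal Replace" mode, ReplaceText performs a plain substring swap with no regex interpretation, equivalent to a simple string replace. A small illustration (the sample record content is hypothetical):

```python
# Literal replace of the two-character sequence _" with ",
# mirroring ReplaceText's "Literal Replace" replacement strategy.
record = '{"first_name_": "jane", "last_name_": "doe"}'
cleaned = record.replace('_"', '"')
# cleaned is now '{"first_name": "jane", "last_name": "doe"}'
```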
05-06-2022
11:20 AM
@Brenigan The ExtractText processor supports 1 to 40 capture groups in a Java regular expression. The user-added property defines the attribute into which the value from capture group 1 will be placed; the processor creates additional attributes by capture group number. So in your case you added a new property with a single capture group which reads 4 digits. With your example content (9999, text), this results in creating the attributes:

number = 9999 <-- always contains the value from capture group 1.
number.1 = 9999 <-- the ".1" signifies the capture group the value came from.

number.0 contains the entire match of the Java regular expression. This attribute is controlled by the "Include Capture Group 0" property; setting it to false will stop this one from being added to your FlowFiles.

To help understand this better, let's look at another example. Suppose your Java regular expression had 2 capture groups instead, and assume "Include Capture Group 0" is set to "true". Now, with the same source text of "9999, text", we would expect to see these attributes added:

number = 9999 <-- always contains the value from capture group 1.
number.0 = 9999, text <-- the complete match from the Java regular expression.
number.1 = 9999 <-- the ".1" signifies the capture group the value came from.
number.2 = text <-- the ".2" signifies the capture group the value came from.

Setting "Include Capture Group 0" to "false" would have resulted in "number.0" not being created; however, number, number.1, and number.2 would still have been created. This functionality allows this processor to handle multiple use cases. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
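The two-group example can be sketched in code. The regex below is my guess at a pattern matching "9999, text" with two capture groups, and the dictionary stands in for the FlowFile attributes ExtractText would add for a property named "number":

```python
import re

# Hypothetical ExtractText property "number" with a two-capture-group
# Java-style regex, applied to content "9999, text".
m = re.search(r'(\d{4}),\s*(\w+)', '9999, text')
attributes = {
    'number':   m.group(1),  # always the value from capture group 1
    'number.0': m.group(0),  # entire match (Include Capture Group 0 = true)
    'number.1': m.group(1),  # value from capture group 1
    'number.2': m.group(2),  # value from capture group 2
}
```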
05-05-2022
01:09 PM
@alperenboyaci In a NiFi cluster you have multiple nodes, but only one of those nodes is elected as the cluster coordinator by ZooKeeper. When you start only one node in your cluster, you effectively have a cluster with only one node in it, and that same node must then be elected as the cluster coordinator.

When you access the UI of any node in a NiFi cluster, that request must be replicated to all nodes by the elected cluster coordinator. Keep in mind that as a user accessing a node in the NiFi cluster, you have authenticated only to the node whose URL you entered. This means your request to access the canvas gets proxied by the cluster coordinator to the other nodes. When you have only one node, nothing needs to be proxied, since you are authenticated to the only node in the cluster. So for you, both of your nodes (since either can be elected cluster coordinator at any time) must exist as users and be authorized to "proxy user requests".

I see you provided your authorizers.xml file, which shows how your NiFi handles authorizations, and I do see a few configuration issues. It is easiest to read this file from the bottom up:

1. We start with your authorizer, "managed-authorizer", which is configured to use the "file-access-policy-provider".
2. Reading up, we find the "file-access-policy-provider", which is configured to use the "ldap-user-group-provider".
3. The "ldap-user-group-provider" establishes user-to-group associations from your LDAP. This provider does not reference any other providers in this file.

So while you also set up a "composite-configurable-user-group-provider" and a "file-user-group-provider", these are never actually used by the configured "managed-authorizer". To resolve this issue you need to modify the configuration of your "file-access-policy-provider" so that: <property name="User Group Provider">ldap-user-group-provider</property> becomes: <property name="User Group Provider">composite-configurable-user-group-provider</property> That composite provider is configured to make use of both the "file-user-group-provider" and the "ldap-user-group-provider".

I see you have your NiFi node DNs properly configured in the "file-user-group-provider" and "file-access-policy-provider", so you are good there... but... the file-access-policy-provider only generates the ./conf/authorizations.xml file if it does NOT already exist, and since the file-user-group-provider was not being used by your authorizer, it likely did not initially create this file correctly. So I recommend removing the authorizations.xml file (not the authorizers.xml file) so that NiFi recreates it after you fix the configuration issues in authorizers.xml as outlined above. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
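For reference, a corrected file-access-policy-provider block would look roughly like the sketch below. The DNs are placeholders for your actual admin and node identities, and the property list is abbreviated to the ones discussed above:

```xml
<!-- Hypothetical corrected provider: note the composite user group provider. -->
<accessPolicyProvider>
    <identifier>file-access-policy-provider</identifier>
    <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
    <property name="User Group Provider">composite-configurable-user-group-provider</property>
    <property name="Authorizations File">./conf/authorizations.xml</property>
    <property name="Initial Admin Identity">cn=admin,ou=users,dc=example,dc=com</property>
    <property name="Node Identity 1">cn=nifi-node1,ou=servers,dc=example,dc=com</property>
    <property name="Node Identity 2">cn=nifi-node2,ou=servers,dc=example,dc=com</property>
</accessPolicyProvider>
```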
04-18-2022
05:54 AM
@Neil_1992 I agree that the first step here is to increase the open file limit for the user that owns your NiFi process. Check your current limit by becoming the user that owns the NiFi process and executing the "ulimit -a" command. You can also inspect the /etc/security/limits.conf file. NiFi can open a very large number of files: the heavier the FlowFile load, the larger the dataflows, the more concurrent tasks, etc., the more open file handles. I recommend setting the ulimit to a very large value like 999999, restarting NiFi, and seeing if your issue persists. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
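Besides "ulimit -a", you can read the same limit programmatically. A small sketch (Unix-only, since it uses the stdlib resource module):

```python
import resource

# Soft/hard open-file limits for the current process; a JVM started by
# the same user inherits the same limits.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft} hard={hard}")
```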
03-09-2022
11:48 AM
1 Kudo
@Onkar_Gagre Is the $.name field unique for every record, or do batches of records share the same $.name value? If they are not unique, did you consider using the ConsumeKafkaRecord processor feeding a PartitionRecord processor to split your records by common name values? This would still allow you to work with batches of records rather than an individual record per FlowFile. It might also be helpful if you shared the details of your end-to-end use case, as that may give folks the ability to offer even more dataflow design options. Thanks, Matt
03-09-2022
11:30 AM
1 Kudo
@sachin_32 You can accomplish this by utilizing the "Advanced UI" capability found in the UpdateAttribute processor. The Advanced UI allows you to create Rules (think of these as an if/then capability). So you would set up 3 rules:
1. If the current date falls on Mon - Fri, do X.
2. If the current date falls on Sat, do nothing.
3. If the current date falls on Sun, do Y.

See the Expression Language guide for the functions used below. I created 3 rules (Day1-5, Day6, and Day7). Once you create a Rule, you need to provide a Condition (this is your boolean "if" statement). In this case I use it to determine the current day of the week (with 1 = Monday and 7 = Sunday) and check whether the day falls before or after Saturday in the current week. If a rule's Condition resolves to boolean "true", then the configured Actions (the "then" statement) are evaluated.

For my "Day1-5" rule, I set:
Condition: ${now():format('u'):lt(6)}
Action: ${now():toNumber():minus(${now():format('u'):plus(1):multiply(86400000)}):toDate():format("EEE, dd MMM yyyy")}

For my "Day6" rule, I set:
Condition: ${now():format('u'):equals(6)}
Action: ${now():format('EEE, dd MMM yyyy')}

For my "Day7" rule, I set:
Condition: ${now():format('u'):gt(6)}
Action: ${now():toNumber():minus(86400000):toDate():format("EEE, dd MMM yyyy")}

About the above:
- The "now()" function returns the current date.
- 86400000 is the number of milliseconds in 1 day.
- For Day1-5, I get the current date, convert it to milliseconds using the "toNumber()" function, and subtract a multiple of one day's milliseconds based on the current day of the week.
- For Day6, I do nothing other than reformat the current day's date.
- For Day7, I just subtract one day, or 86400000 milliseconds.

No matter which rule is applied, the final date I write to an attribute named "PreviousSaturday" on the FlowFile is formatted using the Java simple date format "EEE, dd MMM yyyy". Example: "Sat, 05 Mar 2022". If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
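The three rules above can be sketched as one function, using the same ISO day-of-week numbering (1 = Monday ... 7 = Sunday) that format('u') returns:

```python
from datetime import date, timedelta

def previous_saturday(d):
    """Mirror the three UpdateAttribute rules for the PreviousSaturday attribute."""
    u = d.isoweekday()            # 1 = Monday ... 7 = Sunday, like format('u')
    if u < 6:                     # Day1-5 rule: Mon-Fri, go back u+1 days
        d = d - timedelta(days=u + 1)
    elif u == 7:                  # Day7 rule: Sun, go back 1 day
        d = d - timedelta(days=1)
    # Day6 rule: Sat, keep today's date
    return d.strftime("%a, %d %b %Y")   # same shape as "EEE, dd MMM yyyy"
```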
03-09-2022
09:18 AM
2 Kudos
@Harsh__Tanwar I am not clear on the exact failure you are trying to report on. If the processor produces a Bulletin when the failure to read from Event Hub occurs, you could set up the SiteToSiteBulletinReportingTask and have it send bulletins (of course, it will capture all bulletins being produced by your NiFi) to a Remote Input Port on your NiFi, where you programmatically extract what you need from the bulletin(s) and send an alert, perhaps via a PutEmail processor, or send those bulletins to some external monitoring service to handle. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
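To show the "extract what you need" step, here is a rough sketch of filtering a batch of bulletin records for Event Hub errors. The field names (bulletinLevel, bulletinSourceName, bulletinMessage) are my assumption about the reporting task's record schema, so verify them against your NiFi version, and the source-name match is a placeholder:

```python
import json

def eventhub_errors(bulletin_json):
    """Return messages of ERROR-level bulletins from EventHub-related components.
    Field names are assumptions; check your SiteToSiteBulletinReportingTask output."""
    bulletins = json.loads(bulletin_json)
    return [b["bulletinMessage"]
            for b in bulletins
            if b.get("bulletinLevel") == "ERROR"
            and "EventHub" in b.get("bulletinSourceName", "")]
```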