About MattWho

MattWho · ‎07-07-2022

@Luwi Yes, you may add it manually to the nifi.properties file in 1.16.3. NiFi will not read that new property until you restart the service. Did you upgrade from a previous NiFi release? Not sure why it is missing. Matt

MattWho · ‎07-07-2022

@Meepoljd You'll want to have HTTPS enabled to prevent direct access to NiFi's endpoints. When NiFi is not secured (HTTPS), it does not require user authentication or authorization, so all access is treated as anonymous. When using Apache Knox, NiFi cannot be configured with other login-based authentication, such as a login provider in login-identity-providers.xml, or OpenID Connect or SAML via the associated properties in the nifi.properties file. So make sure these properties are not configured in the nifi.properties file when you have also configured the Knox properties: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#saml https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#openid_connect and the following login provider property: nifi.security.user.login.identity.provider= If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
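One way to check this before restarting is a small script along these lines. It is only a sketch: the Knox, OIDC and SAML property names are the ones I believe the Administration Guide uses, so verify them against your release, and the function names are made up for illustration.

```python
# Sketch only: property names are assumed from the NiFi Administration Guide;
# confirm them against the guide for your NiFi version.
KNOX_URL = "nifi.security.user.knox.url"
CONFLICTING = (
    "nifi.security.user.login.identity.provider",
    "nifi.security.user.oidc.discovery.url",
    "nifi.security.user.saml.idp.metadata.url",
)


def parse_properties(text):
    """Parse key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def knox_conflicts(text):
    """Return the login properties that are set while Knox is also configured."""
    props = parse_properties(text)
    if not props.get(KNOX_URL):
        return []  # Knox not configured, nothing to conflict with
    return [key for key in CONFLICTING if props.get(key)]
```

Feed it the contents of nifi.properties; any key it returns should be blanked out while the Knox properties are in use.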

MattWho · ‎07-07-2022

@Reddy3 There is not enough history or detail to understand what led to the corruption of your NiFi H2 DBs. However, the NiFi H2 DBs are used to hold flow configuration history (what user did what on the canvas) and server-side tokens for authenticated users. These DBs can be removed or renamed, and on startup NiFi will create new DBs. Removing these will have no impact on your flow.xml, content, etc. You will simply lose your flow configuration history (change tracking will start over from this point forward) and your users will be forced to login again. Hope this helps, Matt
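If you prefer renaming over deleting, something like the following would do it. This is a sketch under assumptions: NiFi must be stopped first, the directory you pass in is whatever your database repository directory is (not shown here), and the "*.db" pattern and ".bak" suffix are just illustrative choices.

```python
from pathlib import Path


def set_aside_h2_dbs(db_dir, suffix=".bak"):
    """Rename every *.db file in db_dir so NiFi creates fresh ones at startup.

    Assumes NiFi is stopped and db_dir is the database repository directory.
    Returns the new file names.
    """
    renamed = []
    for path in sorted(Path(db_dir).glob("*.db")):
        target = path.with_name(path.name + suffix)
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

Keeping the renamed files around gives you something to hand over if anyone wants to look into the corruption later.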

MattWho · ‎07-05-2022

@pk87 Also consider that the timer driven strategy may not always give you a consistent 60 minute schedule. With timer driven, the component gets scheduled upon start and then again each time the configured run schedule elapses. A NiFi restart, or stopping and starting the processor, resets this timer. If you need to make sure that a component is only scheduled every x amount of time consistently, you should be using the cron driven scheduling strategy, which allows you to set specific times of execution. Thanks, Matt
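To illustrate the difference, here is a rough model of the two strategies. It is not NiFi code: the function names are made up, and the cron case assumes a top-of-the-hour schedule (in NiFi's Quartz-style syntax that would be something like 0 0 * * * ?).

```python
from datetime import datetime, timedelta


def timer_driven_runs(started_at, period, count):
    """Timer driven: first run at start, then every `period` after that."""
    return [started_at + i * period for i in range(count)]


def top_of_hour_runs(after, count):
    """Cron driven (hourly, on the hour): run times do not depend on start time."""
    first = after.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return [first + timedelta(hours=i) for i in range(count)]


hour = timedelta(minutes=60)
# Processor started at 10:17, then restarted at 11:40.
print(timer_driven_runs(datetime(2022, 7, 5, 10, 17), hour, 2))  # 10:17, 11:17
print(timer_driven_runs(datetime(2022, 7, 5, 11, 40), hour, 2))  # 11:40, 12:40
print(top_of_hour_runs(datetime(2022, 7, 5, 11, 40), 2))         # 12:00, 13:00
```

The restart shifts every later timer driven run, while the cron driven runs stay on the hour regardless of when the processor was started.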

MattWho · ‎07-05-2022

@pk87 The above was discussed in another post here: https://community.cloudera.com/t5/Support-Questions/NIFI-I-m-getting-issue-like-quot-authentication-details-were/td-p/346593 A configuration change was provided to resolve the HTTP 400 response. The new response was an HTTP 401, so the request itself was good, but the endpoint did not accept the authorization provided in the request. Thanks, Matt

MattWho · ‎07-01-2022

@Brenigan Are you running your dataflow on a standalone NiFi install or a NiFi cluster install? If a multi-node NiFi cluster, are all 200 FlowFiles on the same NiFi node? Does your partition_number start at 0? Do you see your FlowFiles getting routed to the "overtook" relationship after 10 minutes?
Assuming all of the following:
1. All FlowFiles are on the same NiFi node.
2. partition_number starts at "0" and increments consistently by "1".
3. All FlowFiles have the same filename.
4. The "wait" relationship is routed via a connection back to the EnforceOrder processor.
You should be seeing:
1. All FlowFiles routed to the "wait" relationship until a FlowFile with attribute "partition_number" equal to "0" is processed, which will result in that FlowFile routing to success.
2. Other FlowFiles meeting the above 4 criteria will continue to loop through wait until a "partition_number" attribute with value "1" is seen and routed to success, and so on.
3. If a FlowFile in the incremental order is missing, all FlowFiles with a partition_number higher than the next expected integer will continue to route to the wait relationship.
4. After the configured "wait timeout", any FlowFile that has been waiting this long will be routed to the "overtook" relationship.
You can right click on a connection holding the FlowFiles and list the queue. From there you can select the "view details" icon to the far left to examine the FlowFile's current attributes. You should see a new attribute "EnforceOrder.expectedOrder" that contains the next expected integer value that the group this FlowFile belongs to is waiting for. You will also find your "partition_number", which will have the current integer for this FlowFile.
If you have your FlowFiles distributed across multiple nodes in a NiFi cluster, you will need to get all FlowFiles with the same "group identifier" moved to the same NiFi node in order to enforce order (you cannot enforce order across different nodes in a NiFi cluster). You can accomplish this by editing the connection feeding your EnforceOrder processor and, under settings, selecting a "Load Balancing Strategy" of "Partition by Attribute" using the "filename" attribute that you are using as your group identifier in the EnforceOrder processor. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
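The looping behaviour described above can be modelled roughly like this. It is a simplified sketch of the routing as I described it, not the processor's implementation: the function name and the (group, order) tuples are made up, and the wait timeout is reduced to "whatever is still waiting when no more progress is possible".

```python
from collections import defaultdict


def enforce_order(flowfiles, initial_order=0):
    """Simplified model: release FlowFiles per group in strictly increasing order.

    flowfiles is a list of (group_identifier, order) tuples in arrival order.
    Returns (released, still_waiting). In NiFi the still-waiting FlowFiles
    would keep looping through "wait" until the wait timeout expires.
    """
    expected = defaultdict(lambda: initial_order)  # next expected order per group
    released = []
    waiting = list(flowfiles)
    progressed = True
    while waiting and progressed:
        progressed = False
        still_waiting = []
        for group, order in waiting:
            if order == expected[group]:
                released.append((group, order))  # routed to success
                expected[group] += 1
                progressed = True
            else:
                still_waiting.append((group, order))  # routed back via wait
        waiting = still_waiting
    return released, waiting


arrivals = [("a.csv", 2), ("a.csv", 0), ("a.csv", 1), ("a.csv", 4)]
released, stuck = enforce_order(arrivals)
print(released)  # [('a.csv', 0), ('a.csv', 1), ('a.csv', 2)]
print(stuck)     # [('a.csv', 4)] because order 3 never arrived
```

Note that the model only works because every tuple is visible to the same loop, which is the single-node requirement: FlowFiles of one group sitting on another node would never be seen here.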

MattWho · ‎06-30-2022

@araujo Once "nifi.content.repository.archive.enabled" is set to false, content claims that no longer have any FlowFiles referencing them will no longer get moved to the "archive" sub-directories. They instead simply get removed. The backpressure logic checks whether there are still archived content claims and, if there are none, allows content to continue to be written until the disk is full. If the archive claimant count is not zero, backpressure kicks in until that count goes to zero through archive removal. This backpressure mechanism is in place so that the archive clean-up process can catch up. The fact that NiFi will allow content repo writes until the disk is full is why it is important that users do not co-locate any of the other NiFi repositories on the same disk as the content repository. If disk filling became an issue, the processors that write new content would not just stop executing. They would start throwing exceptions in the logs about insufficient disk space.
@Drozu The original image shared of your canvas shows a MergeContent processor in a "running" state, but does not indicate an active thread at the time of image capture. An active thread shows as a small number in the upper right corner of the processor. The image also shows that this MergeContent processor executed 2,339 tasks in just the last 5 minutes. Execution does not mean an output FlowFile will always be produced. If none of the bins are eligible for merge, then nothing is going to be output. When the processor is in a state of "not working", do all of that processor's stats go to 0, including the "Tasks/Time"? Does it also at that same time indicate a number in its upper right corner? This would indicate that the processor has an active thread that has been in execution for over 5 minutes. In a scenario like this, it is best to get a series of ~4 NiFi thread dumps to see why this thread is running so long and what it is waiting on. If the stats go to zeros and you do not see an active thread number on the processor, this indicates the processor has not been given a thread from the Timer Driven thread pool in the last 5 minutes. Then you need to look at thread pool usage per node. Is the complete thread pool in use by other components? Thanks, Matt

MattWho · ‎06-29-2022

@Drozu Switching off the content repository archiving would not result in an automatic clean-up of your archived content claims. Make sure that all the "archive" sub-directories in the numbered directories within the content repository are empty. After disabling archive, was there any change in disk utilization on your 3 nodes? Did the content repository disk fill to 100%? There are many things that go into evaluating the performance of your NiFi and its dataflows. Anytime you add new components via the NiFi canvas, the dynamics can change. How many components are running? (If all 50 timer driven threads are currently in use by other components, the remaining components will just be waiting for an available thread.) How often is JVM garbage collection (GC) happening? How many timer driven threads are in use at the time the processor seems to stop? How are the FlowFiles queued to the MergeContent distributed across your 3 nodes? How many concurrent tasks are set on the MergeContent? What do the Cluster UI stats show for per node thread count, GC stats, CPU load average, etc.? Is there any other WARN or ERROR log output in the nifi-app.log (maybe related to OOM or open file limits, for example)? It looks like you are using your MergeContent processor to merge two FlowFiles together that have the same filename attribute value. Does one inbound connection contain one FlowFile and the other connection contain the other FlowFile in the pair? The MergeContent is not going to parse through the queued FlowFiles looking for a match. It round robins each connection, so in one execution it reads from connection A and bins those FlowFiles, then on the next execution it reads from connection B. How are you handling "Failure" with the MergeContent? Try adding a funnel before your MergeContent, redirecting your two source success connections into that funnel, and dragging a single connection from the funnel to the MergeContent. Thank you, Matt

MattWho · ‎06-29-2022

@ajignacio You should carefully read all the migration guidance leading up to 1.16, starting with: Migrating from 1.x.x to 1.10.0
Take special note of:
1. Any nars that may have been removed. Make sure your dataflows are not using any processors from those removed nars.
2. Any reported changes to specific components you may use in your dataflows.
3. Check that your dataflow does not have any processors with an inbound connection that are scheduled to execute on "Primary Node" only (small P in upper left corner of processor).
4. Take note of the migration step involving the sensitive props key. If you had not set one previously, you may want to use the NiFi toolkit to create a new user-defined one and re-encrypt the sensitive property values in the flow.xml.gz using that new sensitive props key.
5. Make sure you upgrade to a Java version supported by NiFi 1.16 (such as Java 8) before migration.
NOTE: While Apache NiFi has limits on the maximum size of the distribution, forcing deprecation of older nars, Cloudera's CFM distributions of Apache NiFi do not, and include almost all Apache nars in addition to Cloudera specific nars. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt

MattWho · ‎06-29-2022

@rafy Each node in a NiFi cluster has its own copy of the dataflow and executes independently of the other nodes. Some NiFi components are capable of writing cluster state to ZooKeeper to avoid data duplication across nodes. Those NiFi ingest type components that support this should be configured to execute on the primary node only. In a NiFi cluster, one node will be elected as the primary node (which node is elected can change at any time). So if a primary node change happens, the same component on a different node will now get scheduled and will retrieve cluster state to avoid ingesting the same data again. Often with these types of components, the one that records state does not retrieve the content. It simply generates the metadata/attributes necessary to later get the content, with the expectation that in your flow design you distribute those FlowFiles across all nodes before content retrieval. For example: ListSFTP (primary node execution) --> success connection (with round robin LB configuration) --> FetchSFTP (all node execution). The ListSFTP creates a 0 byte FlowFile for each source file that will be fetched. The FetchSFTP processor uses that metadata/attributes to get the actual source content and add it to the FlowFile. Another example for your query might be: GenerateTableFetch (primary node execution) --> LB connection --> ExecuteSQL. The goal with these dataflows is to avoid having one node ingest all the content (added network and disk I/O) only to then add more network and disk I/O to spread that content across all nodes. So instead we simply get details about the data to be fetched so that those details can be distributed across all nodes, and each node gets only specific portions of the source data. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
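As a rough picture of the list/fetch pattern, consider this sketch. It is not NiFi code: the function names, node names and dictionary layout are made up, and it only models the round robin load balanced connection handing out 0 byte listing FlowFiles.

```python
from itertools import cycle


def list_on_primary(source_files):
    """Model of the List step: one 0 byte FlowFile (metadata only) per source file."""
    return [{"filename": name, "size": 0} for name in source_files]


def round_robin(flowfiles, nodes):
    """Model of a round robin load balanced connection across cluster nodes."""
    assignment = {node: [] for node in nodes}
    for flowfile, node in zip(flowfiles, cycle(nodes)):
        assignment[node].append(flowfile["filename"])
    return assignment


listing = list_on_primary(["a.csv", "b.csv", "c.csv", "d.csv", "e.csv"])
print(round_robin(listing, ["node1", "node2", "node3"]))
# {'node1': ['a.csv', 'd.csv'], 'node2': ['b.csv', 'e.csv'], 'node3': ['c.csv']}
```

Each node then fetches only the files it was handed, so the content itself never has to be moved between nodes.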

Last Visited	‎07-13-2026 11:26 PM

Member Since	‎07-30-2019 10:41 AM
Posts	3,472
Kudos received	1638

Cloudera Community

Re: ListenNetFlow processor does not decode Cisco ...

Re: Can we detect who did a particular operation i...

Re: How to invoke a url in nifi which is protected...

Re: Retry impacts scheduler

Re: 503 error while copying/versioning big process...

Re: Unable to write flowfile content to content re...

Re: Whether SSO can be used without enabling HTTPS...

Re: Error creating bean with name 'org.springframe...

Re: Regarding NIFI Timer driven scheduling not wo...

Re: Regarding NIFI Timer driven scheduling not wo...

Re: EnforceOrder processor doesn't work.

Re: Problem with Merge Content Processor after swi...

Re: Problem with Merge Content Processor after swi...

Re: Nifi Upgrade

Re: Can NIFI nodes access different records on a D...