Member since: 07-30-2019
Posts: 3406
Kudos Received: 1623
Solutions: 1008
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 322 | 12-17-2025 05:55 AM |
|  | 383 | 12-15-2025 01:29 PM |
|  | 367 | 12-15-2025 06:50 AM |
|  | 358 | 12-05-2025 08:25 AM |
|  | 600 | 12-03-2025 10:21 AM |
12-13-2022
06:35 AM
@MaarufB You must have a lot of logging enabled if you expect multiple 10MB app.log files per minute. Was NiFi ever rolling files? Check your NiFi app.log for any Out of Memory (OOM) exceptions. It does not matter which class is throwing the OOM(s); once the NiFi process is having memory issues, it impacts everything within that service. If this is the case, you'll need to make changes to your dataflow(s) or increase the NiFi heap memory. Secondly, check to make sure you have sufficient file handles for your NiFi process user. For example, if your NiFi service is owned by the "nifi" user, make sure the open file limit is set to a very large value for this user (999999). A restart of the NiFi service is required before the change to file handles will be applied. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
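On most Linux distributions, the per-user open file limit can be raised in /etc/security/limits.conf (a minimal sketch only; the "nifi" user name and the 999999 value are taken from the example above, adjust them to your environment):

```
# /etc/security/limits.conf
# Raise the open-file (nofile) limits for the user that owns the NiFi process.
nifi  soft  nofile  999999
nifi  hard  nofile  999999
```

After editing, restart the NiFi service so the new limits are picked up; running `ulimit -n` as the service user should then report the new value.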
12-12-2022
06:01 AM
@Onkar_Gagre The Max Timer Driven Thread pool setting is applied to each node individually. NiFi nodes configured as a cluster are expected to be running on the same hardware configuration. The guidance of 2 to 4 times the number of cores as a starting point is based on the cores of a single node in the cluster, not on the cumulative cores across all NiFi cluster nodes. You can only reduce wait time by reducing load on the CPU. In most cases, threads given out to NiFi processors execute for only milliseconds. But some processors operating against the content can take several seconds or much longer, depending on the function of the processor and/or the size of the content. When the CPU is saturated, these threads will take even longer to complete as the CPU is giving time to each active thread. Since only 8 threads at a time per node can actually execute concurrently, a thread only gets a short slice of time before yielding to another. The pauses in between are the CPU wait time as queued threads wait for their turn to execute. So reducing the Max Timer Driven Thread count (a restart is required for a reduction to be applied) would reduce the maximum threads sent to the CPU concurrently, which would reduce CPU wait time. Of course, this means less concurrency in NiFi. Sometimes you can reduce CPU through different flow designs, which is a much bigger discussion than can be handled efficiently via the community forums. Other times, your dataflow simply needs more CPU to handle the volumes and rates you are looking to achieve. CPU and Disk I/O are the biggest causes of slowed data processing. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
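The per-node sizing guidance above can be expressed as a quick calculation (a sketch only; the 2x-4x multipliers come from the starting-point recommendation in this thread, and the function name is illustrative):

```python
def timer_driven_thread_pool_range(cores_per_node: int) -> tuple:
    """Starting-point guidance: size the Max Timer Driven Thread pool at
    2x to 4x the core count of a SINGLE node, not the whole cluster."""
    return 2 * cores_per_node, 4 * cores_per_node

# An 8-core node gives a starting range of 16 to 32 threads,
# regardless of how many nodes are in the cluster.
print(timer_driven_thread_pool_range(8))  # (16, 32)
```

From that starting point, adjust upward only in small increments while watching CPU utilization, as described above.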
12-09-2022
02:00 PM
@F_Amini @Green_ is absolutely correct here. You should be careful when increasing concurrent tasks, as blindly increasing them everywhere can have the opposite effect on throughput. I recommend setting the concurrent tasks back to 1, or maybe 2, on all the processors where you have adjusted away from the default of 1 concurrent task. Then take a look at the processor further downstream in your dataflow that has a red (backpressure) input connection but black (no backpressure) outbound connections. This processor, as @Green_ mentioned, is the one causing all your upstream backlog. You'll want to monitor your CPU usage as you make small incremental adjustments to this processor's concurrent tasks until you see the upstream backlog start to come down. If while monitoring CPU you see it spike pretty consistently at 100% usage across all your cores, then your dataflow has pretty much reached the max throughput it can handle for your specific dataflow design. At this point you need to look at other options, like setting up a NiFi cluster where this workload can be spread across multiple servers, or designing your dataflow differently with different processors to accomplish the same use case in a way that may have a lesser impact on CPU (not always a possibility). Thanks, Matt
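The bottleneck-hunting heuristic described above can be sketched in a few lines (illustrative only; the processor names and the boolean flow model are hypothetical, not a NiFi API):

```python
def find_bottleneck(processors):
    """Return the first processor whose input connection is backpressured
    (red) while its outbound connections are not (black) -- the heuristic
    from the advice above for locating the processor causing the backlog."""
    for name, (input_backpressured, outputs_backpressured) in processors.items():
        if input_backpressured and not outputs_backpressured:
            return name
    return None

# Hypothetical flow state: {processor: (input backpressured?, outputs backpressured?)}
flow = {
    "ConsumeKafka": (False, True),
    "QueryRecord": (True, False),   # red input, clear outputs -> bottleneck
    "PutHDFS": (False, False),
}
print(find_bottleneck(flow))  # QueryRecord
```

Once identified, that is the processor whose concurrent tasks you tune in small increments while watching CPU.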
12-09-2022
01:28 PM
@Onkar_Gagre Let's take a look at concurrent tasks here. You have an 8 core machine. You have a ConsumeKafka configured with 8 concurrent tasks and 4 nodes. I hope this means your Kafka topic has 32 partitions, because that processor creates a consumer group with the 8 consumers from each node as part of that consumer group. Kafka will only assign one consumer from a consumer group to a partition. So having more consumers than partitions gains you nothing, but can cause performance issues caused by rebalancing. Then you have a QueryRecord with 40 concurrent tasks per node. Each allocated thread across your entire dataflow needs time on the CPU. So just between these two processors alone, you are scheduling up to 48 concurrent threads that must be handled by only 8 cores. Based on your description of data volume, it sounds like there is a lot of CPU wait when you enable this processor, as each thread only gets a fraction of time on the CPU and thus takes longer to complete its task. It sounds like you need more cores to handle your dataflow, and this is not necessarily an issue specific to the use of the QueryRecord processor. While you may be scheduling concurrent tasks too high for your system on the QueryRecord processor, the scheduled threads come from the Max Timer Driven Thread pool set in your NiFi. The default is 10, and I assume you increased this to accommodate the concurrent tasks you have been assigning to your individual processors. The general starting recommendation for the Max Timer Driven Thread pool setting is 2 to 4 times the number of cores on your node. So with an 8 core machine that recommendation would be 16 - 32. The decision/ability to set that even higher is all about your dataflow behavior along with your data volumes. It requires you to monitor CPU usage and adjust the pool size in small increments. Once CPU is maxed, there is not much more to gain short of adding more CPU.
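The consumer/partition arithmetic above can be made concrete (a sketch; the function name is illustrative, but the one-consumer-per-partition rule is standard Kafka consumer-group behavior):

```python
def kafka_consumer_utilization(nodes, tasks_per_node, partitions):
    """Kafka assigns at most one consumer from a consumer group to each
    partition, so consumers beyond the partition count simply sit idle
    (and add rebalance overhead). Returns (total, active, idle)."""
    consumers = nodes * tasks_per_node
    active = min(consumers, partitions)
    return consumers, active, consumers - active

# 4 nodes x 8 concurrent tasks against a 32-partition topic: all busy.
print(kafka_consumer_utilization(4, 8, 32))  # (32, 32, 0)
# Same flow against a 16-partition topic: half the consumers are idle.
print(kafka_consumer_utilization(4, 8, 16))  # (32, 16, 16)
```

This is why the concurrent tasks on ConsumeKafka should be sized from the topic's partition count, not from available CPU alone.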
If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
12-09-2022
01:09 PM
@Techie123 Can you provide more detail around your requirement that "the FFs order is also important"? My initial thought here would be a two-phase merge. In the first merge you utilize a correlation FlowFile attribute you create on each FlowFile based off the employee ID extracted from the record, setting min number of entries to 7 and max to 10. Then you take these employee-merged records and merge them together into larger FlowFiles using MergeRecord. The question is whether 100 records per FlowFile is a hard limit, which it is not: the MergeRecord processor's max number of records is a soft limit. Let's assume we set this to 100. Say one of your merged employee records comes to the MergeRecord with 7 records in it for that employee ID, yet the bin already has 98 records in it. Since the bin max has not been reached yet, this merged FlowFile still gets added and results in a merged FlowFile with 105 records. If you must keep it at or under 100 records per FlowFile, set the max records to 94. If after adding a set of merged employee records the bin is still below 94, another merged employee record would be added, and since you stated each set of merged employee records could be up to 7 records, this keeps you at or below 100 in that single merged FlowFile. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
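The soft-limit behavior described above can be simulated to check the arithmetic (a simplified sketch of MergeRecord's bin-filling, not its actual implementation; it ignores timing and correlation):

```python
def merge_soft_limit(groups, max_records):
    """Simulate a soft record limit: a bin keeps accepting whole incoming
    record groups until its count reaches max_records, so a bin can
    overshoot the limit by up to (incoming group size - 1) records."""
    bins, current = [], 0
    for g in groups:
        if current >= max_records:  # bin is full; start a new one
            bins.append(current)
            current = 0
        current += g                # whole group is added, never split
    if current:
        bins.append(current)
    return bins

# Max 100: a bin holding 98 still accepts a 7-record group -> 105 records.
print(merge_soft_limit([98, 7], 100))  # [105]
# Max 94 with groups of at most 7: worst case 93 + 7 stays at 100.
print(merge_soft_limit([93, 7], 94))  # [100]
```

Setting the soft limit to (hard cap - worst-case group size + 1) is what keeps the merged output at or under the cap.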
12-06-2022
12:48 PM
@Onkar_Gagre 1. What is the CPU and memory usage of your NiFi instances when the QueryRecord processor is stopped? 2. How is your QueryRecord processor configured, including scheduling and concurrent task settings? What other processors were introduced as part of this new dataflow? 3. What does disk I/O look like while this processor is running? The NiFi documentation does not mention any CPU- or memory-specific resource considerations for this processor. Thanks, Matt
12-05-2022
11:33 AM
1 Kudo
@Ghilani NiFi stores templates in the flow.xml.gz file. The flow.xml.gz is just a compressed copy of the dataflow(s) which reside inside NiFi's heap memory while NiFi is running, so it is not recommended to keep templates in your NiFi. NiFi templates are also deprecated and will go away in the next major release. It is recommended to use NiFi Registry to store version-controlled flows. If not using NiFi Registry, flow definitions should be downloaded instead of creating templates, and stored safely somewhere outside of NiFi itself. A flow definition can be downloaded by right-clicking on a process group in NiFi and selecting "Download flow definition". A JSON file of that flow will be generated and downloaded. Flow definitions can be uploaded to NiFi by dragging the create Process Group icon onto the canvas and selecting the option to upload a flow definition. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
12-05-2022
11:19 AM
1 Kudo
@dreaminz You can create variables on a process group; those variables are then only available to the process group (scope) on which they were created. NiFi documentation on variables: https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Variables Variables have been deprecated in favor of Parameter Contexts: https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#parameter-contexts You can create a single parameter context, add parameters to it, and then associate that parameter context with multiple process groups. This will allow you to update a parameter in one parameter context and effectively update your flows in multiple process groups. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
12-05-2022
11:09 AM
@grb Your QueryDatabaseTable processor is failing because the dependent controller service is not yet enabled. It appears that the controller service is stuck trying to enable (enabling) because the SQLServerDriver you have configured in that controller service is not compatible with the Java JDK version you are using to run NiFi. What version of NiFi are you using? What version of Java is your NiFi using? I recommend updating your Java version to the most recent version of Java JDK 8 or Java JDK 11 (version 11 is only supported in NiFi versions 1.10+). Otherwise, you'll need to find an older version of your SQL driver. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
12-05-2022
10:56 AM
@ajignacio That was a big jump from version 1.9.x to 1.16.x of NiFi. NiFi's data provenance stores, for a configurable amount of time, information about NiFi FlowFiles as they traverse the various processors in your dataflow(s). Over the releases of NiFi, both improvements and new implementations of provenance have been introduced. The original version of provenance was org.apache.nifi.provenance.PersistentProvenanceRepository, which has since been deprecated in favor of a better performing provider class, org.apache.nifi.provenance.WriteAheadProvenanceRepository, which is the new default. The following properties from the nifi.properties file are used to configure the provenance repository:

nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
nifi.provenance.repository.directory.default=./provenance_repository
nifi.provenance.repository.max.storage.time=30 days
nifi.provenance.repository.max.storage.size=10 GB (used to be 1 GB)
nifi.provenance.repository.rollover.size=100 MB
nifi.provenance.repository.query.threads=2
nifi.provenance.repository.index.threads=2
nifi.provenance.repository.compress.on.rollover=true
nifi.provenance.repository.always.sync=false
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID
nifi.provenance.repository.indexed.attributes=
nifi.provenance.repository.index.shard.size=100 MB
nifi.provenance.repository.max.attribute.length=65536
nifi.provenance.repository.concurrent.merge.threads=2
nifi.provenance.repository.warm.cache.frequency=

For details on these properties, here is the Apache NiFi documentation section: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#provenance-repository The good news is that data provenance retention has no direct relationship to the active FlowFiles currently traversing your dataflow(s). This means that you can shut down your NiFi, purge the contents of the current <path to>/provenance_repository directory, adjust the configuration properties as you want, and then restart your NiFi. NiFi will build a new provenance repository on startup. Considering that NiFi only provides limited configurable space (1 GB original default, 10 GB current default) and age (30 days) as the defaults, you would not be losing much if you were to reset. I am also concerned that the path in the error suggests you created your original provenance_repository within a subdirectory of the flowfile_repository, which I would not recommend. I would strongly suggest not writing the contents of any one of the four NiFi repositories within another. Considering the flowfile_repository and content_repository are the two most important repositories for tracking the FlowFiles actively being processed in your dataflow(s), I suggest these each be on their own path and reside on dedicated disks backed by RAID to avoid data loss in the event of a disk failure. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt