Member since
07-30-2019
3472
Posts
1642
Kudos Received
1020
Solutions
10-17-2018
12:31 PM
@pavan srikar

I should add that there is no processor that will specifically clone a FlowFile to every node in the NiFi cluster.

But there are other options if you do not want to stand up an external map cache server.

One is to set up a disk mount that is shared across all nodes. On the primary node only, you run a flow that retrieves a new token every ~55 minutes and writes it to this shared mounted directory, configured to overwrite the previously written token each time. Then on all nodes you create a flow that, on a schedule, consumes this token without deleting it and uses it for your all-node tasks.

Just a second option for you.

Thank you,
Matt
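The overwrite-then-read pattern described in this post can be sketched outside NiFi as follows. This is only an illustration of the idea, not a NiFi flow: the function names, the `token.txt` file name, and the use of an atomic rename are all assumptions made for the example, and rename semantics on a network mount may differ from a local disk.

```python
import os
import tempfile
from pathlib import Path


def write_token(shared_dir: Path, token: str, name: str = "token.txt") -> Path:
    """Primary-node side: overwrite the shared token file.

    The token is written to a temporary file in the same directory and then
    renamed over the target, so readers see either the old or the new token.
    (Hypothetical helper; the file name is an assumption.)
    """
    target = shared_dir / name
    fd, tmp_path = tempfile.mkstemp(dir=shared_dir, prefix=".token-")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(token)
    os.replace(tmp_path, target)
    return target


def read_token(shared_dir: Path, name: str = "token.txt") -> str:
    """Every-node side: read the current token and leave the file in place."""
    return (shared_dir / name).read_text(encoding="utf-8").strip()
```

The writer would run on one node roughly every 55 minutes, while every node calls the reader on its own schedule.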
10-17-2018
12:23 PM
1 Kudo
@pavan srikar

The design you have in place looks to be the correct solution for the use case you describe. Every node in your cluster runs the exact same flow.xml.gz.

You would typically configure your "PutDistributedMapCache" and "FetchDistributedMapCache" processors to use a "Distributed Cache Service" that every node has access to.

This allows you to run a single "primary node" only flow that retrieves the token on a one-hour cron schedule and writes it to the distributed map cache, and then have a second flow, run by every node, that pulls the stored token value from the distributed map cache and uses it for your downstream calls.

Using the "RedisDistributedMapCacheClientService" controller service, for example, allows you to set a TTL on the values you store in the cache. This lets you expire the stored token before it is no longer valid. For example, if the token is good for 1 hour, you could set the TTL to 50 - 55 minutes.

Thank you,
Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
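To illustrate why a TTL shorter than the token lifetime is useful, here is a minimal in-memory sketch of a cache whose entries expire. It is not the Redis controller service or any NiFi API; the `TtlCache` class, its methods, and the injectable clock are hypothetical names chosen for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Tiny in-memory map whose entries expire after a fixed TTL (illustrative only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (value, absolute expiry time according to the clock)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, key: str, value: str) -> None:
        """Store a value and stamp it with its expiry time."""
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key: str) -> Optional[str]:
        """Return the value, or None if it is missing or has expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return value
```

With a 55-minute TTL on a token that is valid for 60 minutes, a reader gets nothing back during the last five minutes instead of a token that is about to be rejected downstream.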
10-09-2018
02:09 PM
1 Kudo
@Cooper Max

NiFi has two processes running, as you see above. The NiFi bootstrap process is what is kicked off when starting NiFi, and it then spawns the main NiFi process. The bootstrap process monitors the pid of that main process. If the pid disappears, the output you see above is logged and the bootstrap attempts to restart the main process.

If your nifi-app.log shows no signs of issues in your dataflow leading up to this event, the killing of the NiFi process is being triggered externally to NiFi.

Most commonly you will find that the server itself has killed the process. I would suggest looking in your server logs for the execution of the "OOM killer". When memory usage on a server reaches a level where the OS believes the server could become unresponsive or crash, the OOM killer is launched. It evaluates the running processes and elects one to be killed to free memory and protect the OS. Considering the memory footprint of a typical main NiFi JVM process, it is commonly selected by the OOM killer.

To resolve this issue, you would need to reduce the amount of memory being consumed by the processes running on this server:

- Do not run NiFi on a server where other services are co-located.
- Reduce the configured JVM heap settings for the NiFi process in the bootstrap.conf file. This may require you to re-evaluate your dataflow design(s) in NiFi to reduce heap memory usage.

Thank you,
Matt
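A quick way to look for OOM killer activity is to scan the kernel log for its messages. The sketch below assumes the messages contain phrases such as "invoked oom-killer" or "Out of memory: Kill process"; the exact wording varies between kernel versions, and the function name and pattern are illustrative assumptions rather than an exhaustive matcher.

```python
import re
from typing import Iterable, List

# Phrases commonly seen in kernel logs when the OOM killer runs.
# Wording differs between kernel versions, so treat this pattern as a starting point.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|Out of memory: Kill(?:ed)? process \d+",
    re.IGNORECASE,
)


def find_oom_events(lines: Iterable[str]) -> List[str]:
    """Return the log lines that look like OOM killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]
```

On many Linux systems the kernel messages are in `/var/log/messages` or visible through `dmesg`, but the location depends on the distribution. A match naming the Java process around the time of the restart would support the OOM killer explanation.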
10-09-2018
01:03 PM
1 Kudo
@Abdou B.

"Stopped" is probably not the correct word to use here. A processor that is started executes based on its configured "run schedule". When back pressure is being applied to a processor by one of the processor's outgoing connections, the processor will no longer be scheduled to run. It is still started. As soon as back pressure is no longer being applied, the processor will begin executing again based on its run schedule.

Thanks,
Matt
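The distinction between "started" and "scheduled to run" can be modeled in a few lines. This is a toy model, not NiFi's scheduler: the `Connection` and `Processor` classes are hypothetical, and treating back pressure as "queued count has reached the threshold" is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Connection:
    queued: int                   # FlowFiles currently in the queue
    backpressure_threshold: int   # object count at which back pressure applies

    def backpressure_applied(self) -> bool:
        return self.queued >= self.backpressure_threshold


@dataclass
class Processor:
    started: bool
    outgoing: List[Connection]

    def eligible_to_run(self) -> bool:
        """A started processor is skipped while any outgoing connection applies back pressure."""
        return self.started and not any(
            conn.backpressure_applied() for conn in self.outgoing
        )
```

Note that `started` never changes in this model. Only the eligibility check flips while a downstream queue is full, which matches the behaviour described above.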
10-04-2018
02:03 PM
FlowFile content is not stored in the provenance repository. The ability to view or replay content only works if the content still exists in the content repository. The content repository can be configured to retain archived content, but keep in mind that the content of active FlowFiles still in your dataflows always takes priority over archived content. If active data pushes disk usage past the configured thresholds, all archived content will be purged.

Thanks,
Matt
10-02-2018
12:36 PM
2 Kudos
@Thomas Lebrun

Provenance events are dated. While the provenance repository can be moved from one NiFi to another without issue, simply backing up a portion or all of it and trying to merge it with an existing provenance repository later is not possible.

Even taking an entire backed-up provenance repository and placing it in a clean NiFi later would have its challenges. You would need to make sure the provenance retention settings in whatever NiFi you placed this backed-up provenance repository in extended beyond the age of the oldest event in that repository, or NiFi would simply purge all the events on startup.

A better option might be to build a dataflow on each of your NiFi instances/clusters that uses the SiteToSiteProvenanceReportingTask to send provenance events to another NiFi, where a dataflow is built to write those events out to the long-term storage or auditing endpoint of your choice. The provenance events output by this reporting task are just JSON.

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-site-to-site-reporting-nar/1.7.1/org.apache.nifi.reporting.SiteToSiteProvenanceReportingTask/index.html

Thank you,
Matt
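The retention concern can be expressed as a simple age comparison. The sketch below is an assumption-laden illustration: the function name is hypothetical, and it assumes events older than the configured retention period are purged, as the post describes. It does not read a real provenance repository.

```python
from datetime import datetime, timedelta


def restored_events_survive(oldest_event: datetime,
                            now: datetime,
                            retention: timedelta) -> bool:
    """True if the oldest backed-up event is still younger than the retention period."""
    return (now - oldest_event) < retention
```

For example, a backup whose oldest event is 45 days old would need the target NiFi's provenance retention set to more than 45 days before the repository is put in place.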
10-01-2018
05:52 PM
@yazeed salem

Your NiFi Expression Language statement looks good. I even tested based on your example, and it routed my FlowFiles correctly.

Make sure that each of the FlowFiles being processed has the required FlowFile attributes set on it. You can stop your RouteOnAttribute processor and allow a few files to queue in the connection feeding it. Then right-click on the connection and select "List queue". You can then click on the "details" icon to the far left of any FlowFile to verify that it has the correct attributes set on it.

Thank you,
Matt
09-14-2018
01:47 PM
@sri chaturvedi Yes, potentially, if there are enough inbound FlowFiles to trigger the processor to run 4 times concurrently.
09-12-2018
12:53 PM
2 Kudos
@sri chaturvedi

You are only going to benefit from setting run duration to 50 ms if the processing of each incoming FlowFile by the PutSQL processor takes a fraction of that 50 ms duration. Details on "Run duration" and how it works can be found here: https://community.hortonworks.com/articles/221807/understanding-nifi-processors-run-duration-functio.html

---------

When you set a run duration on a lot of processors, their threads, when executed, will hold a CPU thread for possibly longer than needed. This means that other processors may end up waiting longer for a thread.

Consider this example: your PutSQL happens to take 10 ms to execute the put of a FlowFile. That means that with a 50 ms run duration it would put 5 FlowFiles within a single thread execution. What happens if the incoming connection queue only has 1 FlowFile at the time of execution? The processor holds that thread for 40 ms longer than needed. That is 40 ms of CPU time not available to another processor.

Since there is some time overhead in starting and stopping threads, run duration is very useful when you have a high sustained dataflow. It can actually decrease performance when used in a dataflow where there is not a high volume of FlowFiles. (High volume is relative to the processor's designed task.)

----------

When it comes to concurrent tasks, this setting dictates parallel processor execution.

Since your canvas has 2000 processors, you need to understand that all these processors cannot execute at the exact same time. There is only so much CPU available, and NiFi has a configurable thread pool size. This means that many processors may just be waiting in line for their chance to get time on the CPU.

Details on processor concurrent task setting recommendations can be found here: https://community.hortonworks.com/articles/221808/understanding-nifi-max-thread-pools-and-processor.html

-----------

You also mentioned NiFi UI slowness. It may be unrelated to any of the above: https://community.hortonworks.com/articles/184786/hdfnifi-improving-the-performance-of-your-ui.html

Thank you,
Matt
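The run duration arithmetic from the PutSQL example in this post can be written out as below. The function names are hypothetical, and the model simply follows the post's description (a fixed per-FlowFile cost and a thread held for the full run duration); it is not a measurement of real NiFi behaviour.

```python
def flowfiles_per_run(run_duration_ms: int, per_flowfile_ms: int) -> int:
    """How many FlowFiles fit into one thread execution of the given run duration."""
    return run_duration_ms // per_flowfile_ms


def idle_hold_ms(run_duration_ms: int, per_flowfile_ms: int, queued: int) -> int:
    """Time the thread is held without doing work, under the post's model."""
    processed = min(queued, flowfiles_per_run(run_duration_ms, per_flowfile_ms))
    return run_duration_ms - processed * per_flowfile_ms
```

With the post's numbers, `flowfiles_per_run(50, 10)` gives 5, and `idle_hold_ms(50, 10, queued=1)` gives the 40 ms of held but unused thread time.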
09-12-2018
12:50 PM
4 Kudos
# Max Timer Driven Thread Count and Max Event Driven Thread Count:

Out of the box, NiFi sets the Max Timer Driven Thread Count relatively low to support operating on the simplest of hardware. This default setting can limit the performance of a very large, high-volume dataflow that must perform a lot of concurrent processing. General guidance for setting this value is 2 - 4 times the number of cores available on the hardware on which the NiFi service is running. In a NiFi cluster where the servers have different hardware (not recommended), this would be set to the highest value possible based on the server with the fewest cores.

NOTE: Remember that all configurations you apply within the NiFi UI are applied to every node in a NiFi cluster. None of the settings apply as a total to the cluster itself.

NOTE: The cluster UI can be used to see how the total active threads are being used per node.

Closely monitoring system CPU usage over time on each of your cluster nodes will help you identify regular or routine spikes in usage. This information will help you determine whether you can increase the “Maximum Timer Driven Thread Count” setting even higher. Just arbitrarily setting this value higher can lead to threads spending excessive time in CPU wait and not really doing any work. This can show up as long task times reported in processors.

*** The Event Driven scheduling strategy is considered experimental, and I do not recommend using it at all. Users should only configure their NiFi processors to use one of the Timer Driven scheduling strategies (Timer Driven or CRON Driven).

# Assigning Concurrent Tasks to processor components:

Concurrent task settings on processors should always start at the default of 1 and only be incremented slowly as needed. Assigning too many concurrent tasks to every processor can have an effect on other dataflows/processors.

Because of how the above works, it may appear to a user that they get better performance out of a processor by simply setting a high number of concurrent tasks. What they are really doing is stacking up more requests in that large queue so the processor gets more opportunities to grab one of the available threads from the resource pool. What often happens is that users with a processor running only 1 concurrent task are affected (simply because of the size of the request queue), so those users increase their concurrent tasks too. Before you know it, the request queue is so large that no one is benefiting from assigning additional concurrent tasks.

In addition, you may have processors that inherently have long-running tasks. Assigning these processors lots of concurrent tasks can mean a substantial chunk of the thread pool is in use for an extended amount of time. This limits the number of threads from the pool that are available to work through the remaining tasks in the queue.
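The "2 - 4 times the number of cores" guidance, sized from the node with the fewest cores, reduces to a small calculation. The function name below is hypothetical and the result is only a starting range; as the post notes, CPU monitoring should drive any further increase.

```python
from typing import List, Tuple


def timer_driven_thread_range(cores_per_node: List[int]) -> Tuple[int, int]:
    """Suggested (low, high) range for the Max Timer Driven Thread Count.

    The setting is applied per node, so the range is based on the node
    with the fewest cores.
    """
    fewest = min(cores_per_node)
    return 2 * fewest, 4 * fewest
```

For a cluster whose nodes have 16, 8, and 32 cores, the range is derived from the 8-core node, giving 16 to 32 threads per node.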