Member since
07-30-2019
3406
Posts
1623
Kudos Received
1008
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 320 | 12-17-2025 05:55 AM |
| | 381 | 12-15-2025 01:29 PM |
| | 366 | 12-15-2025 06:50 AM |
| | 358 | 12-05-2025 08:25 AM |
| | 599 | 12-03-2025 10:21 AM |
11-06-2019
05:32 AM
2 Kudos
@girish6 NiFi processor components are configured to execute based on a run schedule. There are two schedule-driven strategies available: Cron Driven and Timer Driven. The Cron Driven scheduling strategy uses a user-configured Quartz cron expression to set how often the processor will execute. The Timer Driven scheduling strategy (the most commonly used) uses a user-configured run schedule (the default is 0 secs, which means run as often as the system will allow). When a processor executes based on the configured scheduling strategy, it will do one of two things:
1. If the processor has one or more inbound connections, it checks whether any of them have queued FlowFiles. If none of the connections contain any queued FlowFiles, the processor will yield. The yield is intended to keep processors with a run schedule of 0 secs from constantly requesting CPU threads just to check empty inbound connection queues. No matter the run schedule, a yielded processor will not execute until the yield has expired, reducing CPU usage by that processor.
2. Some processors have no inbound connections. These processors will not yield, but continuously execute on the configured run schedule. You would not have any such processors in your PG2, since they will have upstream connections to components in PG1.
So for "source" type processors like ListSFTP, ListFile, GenerateFlowFile, or any other processor that does not support an inbound/upstream connection, if the feed of data is not continuous, it is best to use the Cron Driven scheduling strategy or set a Timer Driven run schedule other than the default 0 secs to reduce CPU usage. On the face of every processor is a stat for Tasks/Time. This stat tells you how many threads reported as completed in the past 5 minutes and how much cumulative CPU time was used by all those completed threads, which lets you see the impact a given processor is having on your CPU. Hope this helps explain CPU usage for you, Matt
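As an illustration of the Cron Driven strategy (the expression below is an example, not a value from this thread), a Quartz cron expression includes a leading seconds field and can schedule a source processor to wake every 5 minutes rather than spinning at 0 secs:

```text
Scheduling Strategy: CRON driven
Run Schedule:        0 0/5 * * * ?
(fields: seconds  minutes  hours  day-of-month  month  day-of-week
 "0 0/5 * * * ?" fires at second 0 of every 5th minute)
```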
11-05-2019
06:05 PM
@MattWho Following your points, I got it working. Thank you a lot. Paul
11-04-2019
06:00 AM
1 Kudo
@pxm NiFi sets no restriction on the data size that can be processed. Ingested data becomes the content portion of a NiFi FlowFile and is written to the content repository. The data is not read again unless a processor needs to read the content; otherwise, only the FlowFile attributes/metadata are passed from one processor component to another. So you need to make sure you have sufficient storage space for the NiFi content repository. It is also strongly recommended that this be dedicated storage, separate from any other NiFi repository. Beyond that, any limitation here will be on network and disk I/O. Thanks, Matt
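The repository locations are set in nifi.properties; a sketch of pointing each repository at its own disk (the mount paths here are hypothetical, adjust them to your hardware):

```properties
# nifi.properties -- example mount points, one dedicated disk per repository
nifi.flowfile.repository.directory=/disk1/flowfile_repository
nifi.content.repository.directory.default=/disk2/content_repository
nifi.provenance.repository.directory.default=/disk3/provenance_repository
```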
11-01-2019
05:23 PM
@Matt Thanks, I solved this issue by following your advice. Paul
11-01-2019
12:13 AM
@Matt Thank you, I'm doing what you pointed me to do.
10-31-2019
12:11 PM
@pauljoshiva What this error is telling you is that the flow.xml.gz on this node does not match the flow.xml.gz that is running on another node. On startup of all your NiFi nodes, an election takes place to determine which flow.xml.gz will be elected as the cluster flow. Node 1 presents its flow (1 vote), then node 2 presents its flow (if it matches node 1's flow, that flow's vote count increases to 2; if not, it gets its own singular vote). This process repeats for all nodes joining the cluster until a flow.xml.gz is elected as the cluster flow. At that point in time, any nodes that have a flow.xml.gz that did not match the elected cluster flow will throw the exception you reported and shut back down. Since all nodes in your cluster must be running the exact same flow.xml.gz, you can copy the flow.xml.gz from one of the nodes that is up and joined to the cluster to the node that threw the exception, and restart it. It should successfully join the cluster on restart this time. Hope this helps, Matt
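A minimal sketch of the recovery steps described above, assuming a default conf directory and a healthy node reachable as healthy-node (both names are placeholders for your environment):

```shell
# On the node that threw the exception: back up its mismatched flow,
# pull the elected flow from a node that successfully joined the cluster,
# then restart NiFi on this node.
mv /path/to/nifi/conf/flow.xml.gz /path/to/nifi/conf/flow.xml.gz.bak
scp healthy-node:/path/to/nifi/conf/flow.xml.gz /path/to/nifi/conf/
```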
10-31-2019
07:50 AM
Hi @MattWho I understand, but like I said, it was a fresh installation and NiFi never worked. I tried to remove the file from /run/cloudera-scm-agent/proccess/NIFI_FOLDER but that didn't work, and I can't find the file in any other location on the node. Maybe it was my fault: I tried to start directly with TLS and it never worked. Perhaps the first start couldn't create the cert from the NiFi CA (I saw a message in the role log about a missing keystore file), which is why NiFi couldn't start correctly and create the previous sensitive.key file. Finally I removed the service, started unsecured, forced the NiFi CA to recreate the certs, and then activated TLS/SSL, and that is when NiFi started fine and is now working. Now I have another problem: I installed NiFi on two worker nodes, but only one web UI works. The other web UI shows this message:
Secure Connection Failed
An error occurred during a connection to HOSTNAME:8443. Certificate key usage inadequate for attempted operation. Error code: SEC_ERROR_INADEQUATE_KEY_USAGE
The page you are trying to view cannot be shown because the authenticity of the received data could not be verified.
Please contact the website owners to inform them of this problem.
Anyway @MattWho, thanks for your help. I'll keep working on the NiFi web UI issue and try to configure LDAP authentication; at the moment I only have user name validation working and am trying to integrate group validation. Best Regards.
10-31-2019
06:57 AM
@Jette There is no need for the UpdateAttribute processor here, unless there are some missing details to this issue. The ExtractText processor already permanently adds any created attribute to the FlowFile it outputs. The dynamic property name used becomes the FlowFile attribute name. Matt
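For example (the property name and regex here are hypothetical, not from this flow), a single ExtractText dynamic property is enough to promote matched content to a FlowFile attribute:

```text
ExtractText
  dynamic property:  order.id  =  orderId=(\d+)
  result: on a match, the output FlowFile carries attributes such as
          order.id.1 (first capture group), which persist downstream
          without any UpdateAttribute processor.
```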
10-30-2019
09:00 AM
hello, I see there is a POST option to save a bundle/NAR into the NiFi Registry; however, that can only happen from a local machine. Is it possible to fetch a bundle/NAR from an Artifactory and save it in the NiFi Registry? The reason I am thinking in that direction is that in our current SDLC process, Jenkins does the build, resolves all dependencies, creates the NAR, and puts it in Artifactory. How does the bundle get deployed from Artifactory into the Registry? That is what I am after. Thanks!
10-30-2019
05:49 AM
2 Kudos
@Elephanta Based on the information provided, here are some things to be aware of and consider:

How a merge-based processor decides to merge a bin:
1. At the end of a thread execution, a bin has reached both minimums (min size and min records).
2. The max bin age of a bin has been reached.
3. At the start of a thread execution there are no free bins, which forces the oldest bin to merge to free a bin.

JVM heap memory:
1. While your system has 512 GB of memory, how much of that has been allocated to NiFi's JVM? Setting a very large heap for the JVM can result in significant stop-the-world application pauses even when minor Garbage Collection (GC) occurs. Setting the JVM heap too low when high-heap-usage processors are in use can result in Out Of Memory (OOM) exceptions.
2. Merge-based processors have the potential for high heap usage. While MergeContent does not hold the content of all FlowFiles being merged in heap memory, it does hold the FlowFile attributes of all binned FlowFiles in heap memory. So with a significant number of bins and large min record settings, this can cause high heap usage. This in turn can lead to excessive GC.

Processor configuration:
1. What is being used as your correlation attribute? Are there more than 64 possible unique correlation attribute values? This could lead to forced merging of bins in the MergeRecord processor (see merge reason 3 above).
2. With a per-bin record range of 100,000 - 10,000,000, you run the risk of high heap usage, excessive GC at times, or OOM. Do you expect that each unique correlation attribute will have this many records? Perhaps a bin never meets your minimums and merging only happens because of max bin age. This would explain large pauses and small output FlowFiles.
3. Knowing your incoming data to a merge processor is critical when setting min and max values. Since both minimums must be satisfied, you can run into a scenario where max records is reached but min bin size is not. That results in the bin being forced to sit until max bin age forces it to merge, since both min values were not met and, because one of the max values was met, nothing additional could be allocated to that bin. Again, this can explain your long pauses and small file sizes.
4. You did not mention whether your NiFi is a cluster or a standalone (single) NiFi instance installation. If a cluster, keep in mind that each node can only merge FlowFiles which exist on that same node; nodes are not aware of FlowFiles on other nodes. However, since you are merging based on a correlation attribute, you can configure a connection to load-balance data across all your nodes based on that same correlation attribute. This would allow you to use parallel processing to merge your large bundles across multiple NiFi nodes.

Threading:
1. When a processor executes, it must request a thread from the NiFi core. The core has a configurable Max Timer Driven Thread pool (found in Controller Settings under the global menu in the upper right corner). By default this thread pool is only set to 10, and it is shared by all components you add to your canvas. With 128 cores, the recommended setting for the pool would be 256 - 512 (of course you must also take into consideration what else may be running on this server, so monitor your CPU usage over time and adjust accordingly).

Disk I/O:
1. NiFi writes all its data into content claims on disk. We strongly recommend that NiFi's content, FlowFile, and provenance repositories be located on separate disks, to improve I/O and reduce the likelihood of corruption of the FlowFile repo should the content repo fill its disk to 100%.
2. To help reduce heap usage of actively queued FlowFiles, NiFi will begin writing swap files to disk when a connection queue exceeds the configured swap threshold set in the nifi.properties file. (Note: the connection queue feeding your merge processor may or may not contain swapped FlowFiles. FlowFiles allocated to bins will still show in the connection but will not be eligible to be swapped to disk.)

Data ingestion:
1. Your source records seem very small. How is your data being ingested into NiFi? Perhaps a different method or processor configuration could yield fewer yet larger records. This would result in more efficient merging and less disk swapping.

Here are some articles you may want to read:
https://community.cloudera.com/t5/Community-Articles/HDF-NIFI-Best-practices-for-setting-up-a-high-performance/ta-p/244999
https://community.cloudera.com/t5/Community-Articles/Dissecting-the-NiFi-quot-connection-quot-Heap-usage-and/ta-p/248166
https://community.cloudera.com/t5/Community-Articles/Understanding-NiFi-max-thread-pools-and-processor-concurrent/ta-p/248920
https://community.cloudera.com/t5/Community-Articles/Understanding-NiFi-processor-s-quot-Run-Duration-quot/ta-p/248921

What you are trying to do is definitely doable with NiFi, but may require some dataflow design and/or system tuning to achieve. Hope this helps, Matt
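The min/max interplay described in the post (both minimums must be met before a bin merges, while hitting a max freezes the bin) can be modeled with a toy sketch. This is only an illustration of the described behavior, not NiFi's actual code:

```python
def bin_state(size_bytes, records, min_size, min_records, max_size, max_records):
    """Toy model of a merge bin's fate under the rules described above."""
    if size_bytes >= min_size and records >= min_records:
        return "merge"                 # both minimums satisfied -> bin merges
    if size_bytes >= max_size or records >= max_records:
        return "wait-for-max-bin-age"  # a max was hit while a min is unmet:
                                       # nothing more can be binned, so the bin
                                       # sits until max bin age forces the merge
    return "filling"                   # keep allocating FlowFiles to this bin

# 100,000 tiny records can hit max records while still far below min size,
# producing long pauses and small output files:
print(bin_state(5_000_000, 100_000,
                min_size=256_000_000, min_records=1_000,
                max_size=1_000_000_000, max_records=100_000))
# -> wait-for-max-bin-age
```

This is why the post stresses knowing your incoming data: if the minimums are unreachable for a correlation key, max bin age becomes the only trigger.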