About MattWho

MattWho · ‎10-31-2019

@pauljoshiva This error is telling you that the flow.xml.gz on this node does not match the flow.xml.gz that is running on another node. When your NiFi nodes start up, an election takes place to determine which flow.xml.gz will become the cluster flow. Node 1 presents its flow (1 vote), then node 2 presents its flow (if it matches node 1's flow, that flow's vote count increases to 2; if not, it gets its own single vote). This process repeats for all nodes joining the cluster until a flow.xml.gz is elected as the cluster flow. At that point, any node whose flow.xml.gz does not match the elected cluster flow will throw the exception you reported and shut back down. Since all nodes in your cluster must be running the exact same flow.xml.gz, you can copy the flow.xml.gz from one of the nodes that is up and joined to the cluster to the node that threw the exception and restart it. It should successfully join the cluster on restart this time. Hope this helps, Matt
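The voting described above can be sketched as a simplified model. This is not NiFi's actual implementation: the function names, the node names, and the use of a SHA-256 hash of the raw bytes as the "fingerprint" are all assumptions made for illustration, and the real election involves details not modeled here.

```python
import hashlib
from collections import Counter


def fingerprint(flow_bytes: bytes) -> str:
    """Hypothetical stand-in for comparing two flows: hash the raw bytes."""
    return hashlib.sha256(flow_bytes).hexdigest()


def elect_cluster_flow(node_flows: dict[str, bytes]) -> tuple[str, list[str]]:
    """Count one vote per node and return (elected fingerprint, mismatched nodes)."""
    votes = Counter(fingerprint(flow) for flow in node_flows.values())
    elected, _ = votes.most_common(1)[0]
    # Nodes whose flow differs from the elected flow would fail to join.
    mismatched = sorted(
        node for node, flow in node_flows.items() if fingerprint(flow) != elected
    )
    return elected, mismatched


# Illustrative data only: two nodes share a flow, the third differs.
flows = {
    "node1": b"<flow>A</flow>",
    "node2": b"<flow>A</flow>",
    "node3": b"<flow>B</flow>",
}
elected, mismatched = elect_cluster_flow(flows)
print(mismatched)
```

In this model node3 is the one that would need the elected flow.xml.gz copied to it before a restart.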

MattWho · ‎10-31-2019

@Jette There is no need for the UpdateAttribute processor here, unless there are some missing details to this issue. The ExtractText processor already permanently adds any attribute it creates to the FlowFile it outputs. The dynamic property name used becomes the FlowFile attribute name. Matt
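A rough illustration of that naming rule, written as a plain Python function rather than anything NiFi-specific. The function name, the sample content, and the property name `order.id` are hypothetical, and Python's regex engine stands in for the Java one that NiFi uses. The sketch only models the attribute named after the property holding the first capture group; any additional indexed attributes are left out.

```python
import re


def extract_text(content: str, dynamic_properties: dict[str, str]) -> dict[str, str]:
    """Return attributes keyed by dynamic property name (simplified model)."""
    attributes = {}
    for name, pattern in dynamic_properties.items():
        match = re.search(pattern, content)
        if match and match.groups():
            # Simplification: keep only the first capture group under the property name.
            attributes[name] = match.group(1)
    return attributes


attrs = extract_text(
    "order_id=12345 status=shipped",
    {"order.id": r"order_id=(\d+)"},
)
print(attrs)
```

The resulting attribute travels with the FlowFile, so a downstream processor can read it without an extra UpdateAttribute step.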

MattWho · ‎10-31-2019

@Paul Yang Ranger is not offered in CFM, but will become part of the platform in the future. The only authorization offering for NiFi and NiFi-Registry within CFM is the local file-based authorizer. NiFi user and group authorization is controlled via the NiFi UI instead of through an external authorization provider like Ranger. This same local file-based authorization was also an option in HDF. https://docs.cloudera.com/cfm/1.0.1/securing-cfm/topics/cfm-enabling-tls.html You can also configure NiFi to sync users and groups from LDAP. You can then assign authorization policies to these synced users and groups through the NiFi UI. https://docs.cloudera.com/cfm/1.0.1/securing-cfm/topics/cfm-nifi-user-sync-ldap-properties.html Thank you, Matt

MattWho · ‎10-31-2019

@Sergiete The nifi-app.log should tell you why the second node failed to start if NiFi got beyond the bootstrap process. If not, the nifi-bootstrap.log will tell you why it failed to start. Matt

MattWho · ‎10-31-2019

@Sergiete The sensitive.key file is created during NiFi startup and is removed once startup completes successfully. The fact that it still exists when you are trying to start NiFi tells me that some previous startup attempt failed after sensitive.key was created, but before startup completed. You can safely remove this sensitive.key file from your NiFi nodes and start your NiFi service again. If NiFi fails to start and you see that sensitive.key was created and not removed again, look through your NiFi logs to see why it failed. It will be for a different reason, since you manually removed the sensitive.key before that startup. I have not seen this condition occur on any of my CFM installs yet, but have heard of this happening before. What I do not have are logs to determine what is happening in those cases. Matt

MattWho · ‎10-30-2019

@Elephanta Based on the information provided, here are some things to be aware of/consider:
How a Merge based processor decides to merge a bin:
1. At the end of a thread execution, a bin has reached both minimums (min size and min records).
2. The max bin age of a bin has been reached.
3. At the start of a thread execution there are no free bins, which forces the oldest bin to merge in order to free a bin.
JVM heap memory:
1. While your system has 512GB of memory, how much of that has been allocated to NiFi's JVM? Setting a very large heap for the JVM can result in significant stop-the-world application pauses even when minor Garbage Collection (GC) occurs. Setting the JVM heap too low when you have high heap usage processors in use can result in Out Of Memory (OOM) exceptions.
2. Merge based processors have the potential for high heap usage. While MergeContent does not hold the content of all FlowFiles being merged in heap memory, it does hold the FlowFile attributes of all binned FlowFiles in heap memory. So with a significant number of bins and large min record settings, this can cause high heap usage. This in turn can lead to excessive GC occurring.
Processor configuration:
1. What is being used as your correlation attribute? Are there more than 64 possible unique correlation attribute values? This could lead to forced merging of bins in MergeRecord processor 1-3.
2. With the per bin record range set to 100,000 - 10,000,000, you run the risk of high heap usage, excessive GC at times, or OOM. Do you expect that each unique correlation attribute will have this many records? Perhaps a bin never meets your minimums and the merge is only happening because of max bin age. This would explain large pauses and small output FlowFiles.
3. Knowing your incoming data to a merge processor is critical when setting min and max values. Since both mins must be satisfied, you can run into a scenario where max records is reached, but you did not reach min bin size. That results in the bin being forced to sit until max bin age forces it to merge, since both min values were not met and, because one of the max values was met, nothing additional could be allocated to that bin. Again, this can explain your long pauses and small file sizes.
4. You did not mention if your NiFi is a cluster or a standalone (single) NiFi instance installation. If a cluster, keep in mind that each node can only merge FlowFiles which exist on that same node. Nodes are not aware of FlowFiles on other nodes. However, since you are merging based on a correlation attribute, you can configure a connection to load-balance data across all your nodes based on that same correlation attribute. This would allow you to use parallel processing to merge your large bundles across multiple NiFi nodes.
Threading:
1. When a processor executes, it must request a thread from the NiFi core. The core has a configurable Max Timer Driven Thread Pool (found in Controller Settings under the global menu in the upper right corner). By default this thread pool is only set to 10. This thread pool is shared by all components you add to your canvas. With 128 cores, the recommended setting for the pool would be 256 - 512 (of course you must also take into consideration what else may be running on this server, so monitor your CPU usage over time and adjust accordingly).
Disk I/O:
1. NiFi writes all its data into content claims on disk. We strongly recommend that NiFi's content, flowfile, and provenance repositories are located on separate disks to improve I/O and reduce the likelihood of corruption of the flowfile repo should the content repo fill its disk to 100%.
2. To help reduce heap usage of actively queued FlowFiles, NiFi will begin writing swap files to disk when a connection queue exceeds the configured swap threshold set in the nifi.properties file. (Note: the connection queue feeding your merge processor may or may not contain swapped FlowFiles. FlowFiles allocated to bins will still show in the connection but will not be eligible to be swapped to disk.)
Data ingestion:
1. Your source records seem very small. How is your data being ingested into NiFi? Perhaps a different method or processor configuration can yield fewer yet larger records. This would result in more efficient merging and less disk swapping.
Here are some articles you may want to read:
https://community.cloudera.com/t5/Community-Articles/HDF-NIFI-Best-practices-for-setting-up-a-high-performance/ta-p/244999
https://community.cloudera.com/t5/Community-Articles/Dissecting-the-NiFi-quot-connection-quot-Heap-usage-and/ta-p/248166
https://community.cloudera.com/t5/Community-Articles/Understanding-NiFi-max-thread-pools-and-processor-concurrent/ta-p/248920
https://community.cloudera.com/t5/Community-Articles/Understanding-NiFi-processor-s-quot-Run-Duration-quot/ta-p/248921
What you are trying to do is definitely doable with NiFi, but may require some dataflow design and/or system tuning to achieve. Hope this helps, Matt
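The three merge triggers listed at the top of this reply can be sketched as a small decision function. This is only a model of the rules as described here, not the processor's code: the class names, field names, and the sample settings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    size_bytes: int
    record_count: int
    age_seconds: float


@dataclass
class MergeSettings:
    min_size_bytes: int
    min_records: int
    max_bin_age_seconds: float
    max_bins: int


def bins_to_merge(bins: list[Bin], settings: MergeSettings, need_new_bin: bool = False) -> list[Bin]:
    """Return the bins that would merge under the three rules described above."""
    ready = []
    for b in bins:
        # Rule 1: both minimums must be satisfied, not just one of them.
        both_mins = (
            b.size_bytes >= settings.min_size_bytes
            and b.record_count >= settings.min_records
        )
        # Rule 2: the bin has reached the max bin age.
        expired = b.age_seconds >= settings.max_bin_age_seconds
        if both_mins or expired:
            ready.append(b)
    # Rule 3: no free bin is available, so the oldest bin is forced to merge.
    if not ready and need_new_bin and len(bins) >= settings.max_bins:
        ready.append(max(bins, key=lambda b: b.age_seconds))
    return ready


# Illustrative values only.
settings = MergeSettings(
    min_size_bytes=1_000_000, min_records=100_000,
    max_bin_age_seconds=300, max_bins=64,
)
# Plenty of records, but the bin is still below the minimum size.
stuck = Bin(size_bytes=500_000, record_count=10_000_000, age_seconds=60)
aged = Bin(size_bytes=500_000, record_count=10_000_000, age_seconds=300)
print(bins_to_merge([stuck], settings))
print(bins_to_merge([aged], settings))
```

The `stuck` bin shows the scenario from point 3 under processor configuration: it cannot merge on its minimums and only goes out once max bin age is reached.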

MattWho · ‎10-30-2019

@Aban The response is telling you that the data passed to the endpoint is not "Content-Type: text/plain". By default, curl sends data passed with -d as "application/x-www-form-urlencoded" if you do not specify a different type via a header. Try adding the following to your curl command to set your content type to "application/json": -H "Content-Type: application/json" Thanks, Matt
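For comparison, the same idea in Python's standard library: set the Content-Type header explicitly on the request. The URL and payload below are placeholders rather than a real NiFi endpoint, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Placeholder URL and payload, for illustration only.
url = "https://nifi.example.com:8443/nifi-api/hypothetical-endpoint"
payload = {"example": "value"}

request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    # Equivalent of curl's -H "Content-Type: application/json"
    headers={"Content-Type": "application/json"},
    method="POST",
)

# urllib stores header names capitalized as "Content-type".
print(request.get_header("Content-type"))
```

Passing the request to `urllib.request.urlopen` would actually send it.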

MattWho · ‎10-29-2019

@big-f The work to support bundles is part of NiFi-Registry 0.4.0 https://nifi.apache.org/docs/nifi-registry-docs/html/administration-guide.html#bundle-persistence-providers https://nifi.apache.org/docs/nifi-registry-docs/html/user-guide.html#manage-bundles https://issues.apache.org/jira/browse/NIFIREG-211

MattWho · ‎10-29-2019

@big-f Storage of version controlled flows is only the first use case for NiFi-Registry.
1. Additional features of NiFi-Registry in the pipeline include:
- Storing of FlowFile parameters (NiFi parameters can be used by any property in any component in NiFi.)
- Storing of NiFi nars (NiFi has grown to be a very large application. Rather than installing every single component that exists, users can set up local registries that store nars. These nars can then be pulled and dynamically loaded during runtime in NiFi as needed by the flow which has been built.)
2. The only method of deploying flows other than through NiFi-Registry is the legacy templates. But even here you have an existing NiFi canvas (blank or with other existing flows). You then import and instantiate templates onto that existing canvas somewhere. Downsides to this legacy method include no version control and the fact that templates are stored in heap memory. Even if a template is not instantiated to the canvas, it is held in heap memory. More templates equals more heap usage.
3. I think this question is answered by point number two in answer 1. A NiFi FlowFile is what moves from processor to processor via a connection. I am not sure what you are referring to when you say "flow file". Are you talking about the flow.xml.gz?

MattWho · ‎10-28-2019

@DavidR Typically you would only enable GC debug logging to investigate an issue and not leave these properties in all the time. If you intend to leave these in all the time, I suggest removing the following line: java.arg.23=-Xloggc:<file> With no log file defined, the GC logs will go directly into the nifi-bootstrap.log. You can then easily control/configure log rotation and retention for the NiFi bootstrap log in the logback.xml file found in the NiFi conf directory. Thanks, Matt
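If you would rather script that change than edit the file by hand, a minimal sketch could look like the following. It assumes the GC settings live in bootstrap.conf as `java.arg.N=...` lines. The function name is hypothetical, and every line in the sample other than `java.arg.23=-Xloggc:<file>` is an illustrative assumption, not taken from your configuration.

```python
def drop_gc_log_file_arg(bootstrap_conf_text: str) -> str:
    """Remove any java.arg line that sets -Xloggc, keeping all other lines."""
    kept = []
    for line in bootstrap_conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("java.arg.") and "-Xloggc:" in stripped:
            continue  # skip the GC log file argument
        kept.append(line)
    return "\n".join(kept) + "\n"


# Sample content: only the java.arg.23 line comes from the post above.
sample = """java.arg.20=-XX:+PrintGCDetails
java.arg.21=-XX:+PrintGCTimeStamps
java.arg.22=-XX:+PrintGCDateStamps
java.arg.23=-Xloggc:<file>
"""
cleaned = drop_gc_log_file_arg(sample)
print(cleaned)
```

Keep a backup of the original file and restart NiFi afterwards so the change takes effect.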

Last Visited	‎12-24-2025 05:28 AM

Member Since	‎07-30-2019 10:41 AM
Posts	3,406
Kudos received	1618

Cloudera Community

Re: Error importing NiFi workflow template from ve...

Re: Error importing NiFi workflow template from ve...

Re: How to elevate a default nifi user to admin - ...

Re: NiFi EnvokeHTTP - putting current date on HTTP...

Re: Invoking Nifi rest api in Data Flow

Re: Accidentally moved Nifi Flow file in backup an...

Re: Extract text from text message and update the ...

Re: How to plan the Multi-Tenant Authorization on ...

Re: CFM - NiFi is not starting due to "ERROR o.a.n...

Re: CFM - NiFi is not starting due to "ERROR o.a.n...

Re: NiFi MergeRecord processor behaving in a stran...

Re: CLI Nifi processor execution

Re: NiFi integration with git

Re: NiFi integration with git

Re: NIFI: How to turn on GC logging for NiFi?