Member since
07-30-2019
3398
Posts
1621
Kudos Received
1001
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 485 | 11-05-2025 11:01 AM |
| | 374 | 11-05-2025 08:01 AM |
| | 596 | 11-04-2025 10:16 AM |
| | 736 | 10-20-2025 06:29 AM |
| | 876 | 10-10-2025 08:03 AM |
10-31-2019
07:50 AM
Hi @MattWho, I understand, but as I said, this was a fresh installation and NiFi never worked. I tried removing the file from /run/cloudera-scm-agent/process/NIFI_FOLDER, but that did not work, and I can't find the file in any other location on the node. Maybe it was my fault: I tried to start directly with TLS and it never worked. Perhaps the first start could not create the cert from the NiFi CA (I saw a message about a missing keystore file in the role log), which is why NiFi could not start properly and then create the sensitive.key file mentioned earlier. In the end I removed the service, started it unsecured, forced the NiFi CA to recreate the certs, and then activated TLS/SSL. That is when NiFi started fine, and it is working now.

Now I have another problem. I installed NiFi on two worker nodes, but only one web UI works. The other web UI shows this message:

Secure Connection Failed
An error occurred during a connection to HOSTNAME:8443. Certificate key usage inadequate for attempted operation. Error code: SEC_ERROR_INADEQUATE_KEY_USAGE
The page you are trying to view cannot be shown because the authenticity of the received data could not be verified.
Please contact the website owners to inform them of this problem.

Anyway, @MattWho, thanks for your help. I'll keep working on the NiFi web UI issue and try to configure LDAP authentication. At the moment I only get user name validation, and I am trying to integrate group validation. Best regards.
10-31-2019
06:57 AM
@Jette There is no need for the UpdateAttribute processor here, unless there are some missing details to this issue. The ExtractText processor already permanently adds any attribute it creates to the FlowFile it outputs. The dynamic property name used becomes the FlowFile attribute name. Matt
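As a rough illustration of the point above, here is a minimal Python sketch of how a dynamic property name can become an attribute name. The function `extract_text` and the sample property `order.id` are hypothetical, not NiFi APIs. NiFi uses Java regular expressions, and the real processor may also add indexed variants of each attribute, which are not modeled here.

```python
import re


def extract_text(content, dynamic_properties):
    """Hypothetical emulation: map each dynamic property name to the text its regex captures."""
    attributes = {}
    for name, pattern in dynamic_properties.items():
        match = re.search(pattern, content)
        if match is None:
            continue  # no match, so no attribute is created for this property
        # Use the first capture group when the pattern defines one, else the whole match.
        attributes[name] = match.group(1) if match.re.groups >= 1 else match.group(0)
    return attributes


# The property name "order.id" becomes the attribute name on the outgoing FlowFile.
attrs = extract_text("order_id=42 status=ok", {"order.id": r"order_id=(\d+)"})
```

Because the attribute already travels with the FlowFile, a downstream processor can reference it directly without an extra step in between.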
10-30-2019
09:00 AM
Hello, I see there is a POST option to save a bundle/NAR into the NiFi Registry; however, that can only be done from a local machine. Is it possible to fetch a bundle/NAR from an Artifactory and save it in the NiFi Registry? I am thinking in that direction because in our current SDLC process, Jenkins does the build, resolves all dependencies, creates the NAR, and puts it in the Artifactory. How the bundle gets deployed from the Artifactory into the Registry is what I am after. Thanks!
10-30-2019
05:49 AM
2 Kudos
@Elephanta Based on the information provided, here are some things to be aware of and consider:

How a merge-based processor decides to merge a bin:
1. At the end of a thread execution, a bin has reached both minimums (min size and min records).
2. The max bin age of a bin has been reached.
3. At the start of a thread execution there are no free bins, which forces the oldest bin to merge in order to free a bin.

JVM heap memory:
1. While your system has 512 GB of memory, how much of that has been allocated to NiFi's JVM? Setting a very large heap for the JVM can result in significant stop-the-world application pauses even when minor Garbage Collection (GC) occurs. Setting the JVM heap too low when you have high heap usage processors in use can result in Out Of Memory (OOM) exceptions.
2. Merge-based processors have the potential for high heap usage. While MergeContent does not hold the content of all FlowFiles being merged in heap memory, it does hold the FlowFile attributes of all binned FlowFiles in heap memory. So with a significant number of bins and large min record settings, this can cause high heap usage. This in turn can lead to excessive GC.

Processor configuration:
1. What is being used as your correlation attribute? Are there more than 64 possible unique correlation attribute values? This could lead to forced merging of bins in MergeRecord processors 1-3.
2. With the per-bin record range set to 100,000 - 10,000,000, you run the risk of high heap usage, excessive GC at times, or OOM. Do you expect that each unique correlation attribute will have this many records? Perhaps a bin never meets your minimums and the merge is only happening because of max bin age. This would explain large pauses and small output FlowFiles.
3. Knowing your incoming data to a merge processor is critical when setting min and max values. Since both mins must be satisfied, you can run into a scenario where max records is reached but min bin size is not. That results in the bin being forced to sit until max bin age forces it to merge, since both min values were not met and, because one of the max values was met, nothing additional could be allocated to that bin. Again, this can explain your long pauses and small file sizes.
4. You did not mention whether your NiFi is a cluster or a standalone (single) NiFi instance installation. If it is a cluster, keep in mind that each node can only merge FlowFiles which exist on that same node. Nodes are not aware of FlowFiles on other nodes. However, since you are merging based on a correlation attribute, you can configure a connection to load-balance data across all your nodes based on that same correlation attribute. This would allow you to use parallel processing to merge your large bundles across multiple NiFi nodes.

Threading:
1. When a processor executes, it must request a thread from the NiFi core. The core has a configurable Max Timer Driven Thread Pool (found in the controller settings under the global menu in the upper right corner). By default this thread pool is only set to 10. This thread pool is shared by all components you add to your canvas. With 128 cores, the recommended setting for the pool would be 256 - 512 (of course you must also take into consideration what else may be running on this server, so monitor your CPU usage over time and adjust accordingly).

Disk I/O:
1. NiFi writes all its data into content claims on disk. We strongly recommend that NiFi's content, flowfile, and provenance repositories are located on separate disks to improve I/O and reduce the likelihood of corruption of the flowfile repo should the content repo fill the disk to 100%.
2. To help reduce heap usage of actively queued FlowFiles, NiFi will begin writing swap files to disk when a connection queue exceeds the configured swap threshold set in the nifi.properties file. (Note: the connection queue feeding your merge processor may or may not contain swapped FlowFiles. FlowFiles allocated to bins will still show in the connection but will not be eligible to be swapped to disk.)

Data ingestion:
1. Your source records seem very small. How is your data being ingested into NiFi? Perhaps a different method or processor configuration can yield fewer yet larger records. This would result in more efficient merging and less disk swapping.

Here are some articles you may want to read:
https://community.cloudera.com/t5/Community-Articles/HDF-NIFI-Best-practices-for-setting-up-a-high-performance/ta-p/244999
https://community.cloudera.com/t5/Community-Articles/Dissecting-the-NiFi-quot-connection-quot-Heap-usage-and/ta-p/248166
https://community.cloudera.com/t5/Community-Articles/Understanding-NiFi-max-thread-pools-and-processor-concurrent/ta-p/248920
https://community.cloudera.com/t5/Community-Articles/Understanding-NiFi-processor-s-quot-Run-Duration-quot/ta-p/248921

What you are trying to do is definitely doable with NiFi, but may require some dataflow design and/or system tuning to achieve. Hope this helps, Matt
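The three merge triggers and the thread pool sizing described in this post can be sketched in Python as follows. This is only an illustration under stated assumptions: `Bin`, `should_merge`, `pick_forced_merge`, and `timer_driven_pool_range` are hypothetical names, not NiFi classes or APIs, and the 2x to 4x core multiplier is inferred from the 128 cores to 256 - 512 example in the post.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    """Hypothetical stand-in for a merge bin; not a NiFi class."""
    size_bytes: int
    records: int
    age_seconds: float


def should_merge(bin_, min_size_bytes, min_records, max_bin_age_seconds):
    """Return the reason a bin would merge, or None if it keeps waiting."""
    # Trigger 1: BOTH minimums must be satisfied, not just one of them.
    if bin_.size_bytes >= min_size_bytes and bin_.records >= min_records:
        return "minimums met"
    # Trigger 2: max bin age forces the merge regardless of the minimums.
    if bin_.age_seconds >= max_bin_age_seconds:
        return "max bin age reached"
    return None


def pick_forced_merge(bins, max_bins):
    """Trigger 3: with no free bin left, the oldest bin is merged to free one."""
    if len(bins) < max_bins:
        return None
    return max(bins, key=lambda b: b.age_seconds)


def timer_driven_pool_range(cores):
    """Assumed 2x to 4x cores guideline, matching 128 cores -> 256 to 512."""
    return (2 * cores, 4 * cores)


# A bin that hit max records but not min size: it sits until max bin age.
stuck = Bin(size_bytes=5_000_000, records=10_000_000, age_seconds=30.0)
waiting = should_merge(stuck, 100_000_000, 100_000, 600.0)
aged = should_merge(Bin(5_000_000, 10_000_000, 600.0), 100_000_000, 100_000, 600.0)
```

The `stuck` bin shows the scenario from the processor configuration list: it cannot accept more records, yet it fails the min size check, so only the max bin age check ever releases it.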
10-30-2019
04:47 AM
@Aban The response is telling you that the data passed to the endpoint did not arrive with the Content-Type the endpoint expects. If you do not specify a type via a header, curl chooses its own default (for data sent with -d, that is "application/x-www-form-urlencoded"). Try adding the following to your curl command to set your content type to "application/json":
-H "Content-Type: application/json"
Thanks, Matt
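For readers scripting the same call outside of curl, a minimal Python sketch of setting the header explicitly might look like this. The URL and payload are placeholders, not taken from the original question, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_json_post(url, payload):
    """Build a POST request whose body is JSON and whose Content-Type says so."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint and payload, for illustration only.
request = build_json_post("http://localhost:8080/example-endpoint", {"name": "example"})
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

The key point is the same as with curl: the client must state the content type explicitly rather than rely on its default.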
10-29-2019
04:21 AM
@MattWho Thank you for your input. It works, and it was invaluable.
10-28-2019
02:06 AM
Thank you, MattWho. I added an ExecuteStreamCommand processor before fetching the data.
10-22-2019
01:19 PM
@jspuri I suggest making the following configuration changes:
1. The zoo.cfg files on both of your ZK nodes should be the same:
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/home/ec2-user/zookeeper
clientPort=2181
server.1=ec2-server-1.compute-1.amazonaws.com:2888:3888
server.2=ec2-server-2.compute-1.amazonaws.com:2888:3888
Note: For a proper quorum, a ZooKeeper cluster should also have an odd number of servers (3 or 5).
2. In NiFi's state-management.xml (clients such as NiFi connect on the clientPort, 2181; ports 2888 and 3888 are only used between the ZK servers themselves):
<property name="Connect String">ec2-server-1.compute-1.amazonaws.com:2181,ec2-server-2.compute-1.amazonaws.com:2181</property>
3. In the nifi.properties file:
nifi.zookeeper.connect.string=ec2-server-1.compute-1.amazonaws.com:2181,ec2-server-2.compute-1.amazonaws.com:2181
Hope this helps, Matt
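To keep the ZooKeeper settings consistent across files, the values can be generated from one host list. The following Python sketch is illustrative only: the helper names are hypothetical, and it assumes that clients such as NiFi connect on the clientPort (2181 in the zoo.cfg in this post), while 2888 and 3888 are the peer and leader-election ports used in the server.N lines.

```python
def zk_connect_string(hosts, client_port=2181):
    """Comma-separated host:port list for ZooKeeper clients (assumes clientPort)."""
    return ",".join(f"{host}:{client_port}" for host in hosts)


def zoo_cfg_server_lines(hosts, peer_port=2888, election_port=3888):
    """server.N lines for zoo.cfg, numbered from 1 in list order."""
    return [
        f"server.{index}={host}:{peer_port}:{election_port}"
        for index, host in enumerate(hosts, start=1)
    ]


hosts = [
    "ec2-server-1.compute-1.amazonaws.com",
    "ec2-server-2.compute-1.amazonaws.com",
]
connect_string = zk_connect_string(hosts)
server_lines = zoo_cfg_server_lines(hosts)
```

The same `connect_string` value would then be used for both the Connect String property in state-management.xml and nifi.zookeeper.connect.string in nifi.properties.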
10-22-2019
05:57 AM
@Kit2020 You will want to use the MergeContent or MergeRecord processor to merge your incoming data from Kafka.

NiFi FlowFiles consist of two parts:
1. FlowFile attributes/metadata --> held in heap memory
2. FlowFile content --> held in NiFi's content repository on disk

The merge-based processors above will hold the FlowFile attributes of every FlowFile being merged (allocated to a merge bin) in heap memory. Considering you are talking about merging 10 million FlowFiles, you will not be able to merge all of these using a single merge processor without likely encountering an out of memory condition. A better approach is to have two merge processors in series, with the first merging batches of min 10000 to 20000 and the second merging those into another batch, resulting in the merging of all FlowFiles.

A merge processor bin is eligible to be merged when the minimum set values are met. That means if you set min entries/records to 1 and max to 10000, the bin can merge with only one FlowFile. At time of execution, the thread grabs what is in the inbound connection at that moment and allocates it to a bin. It then checks whether that bin met the mins and, if so, merges it. So make sure the minimums on your multiple merge processors are set high enough (for example 10000 on the first and 1000 on merge processor number 2, resulting in 10,000,000 merged FlowFiles).

Now if you do not know the exact number of records you need to merge, set the second merge processor's mins to a higher value than you expect to receive. Then set your "Max bin age" to a value you can accept for data latency. The max bin age is your force merge setting. So even if the min values are not reached, a bin that has existed for this length of time will be forced to merge. Setting max bin age on both processors is important, with a higher value on the second than on the first. This allows time for a bin from the first merge (typically the last one created) that may not meet the 10000 minimum to get merged while the second is still waiting.
Hope this helps, Matt
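The arithmetic behind the two-stage merge can be checked with a short Python sketch. The function `two_stage_merge` is a hypothetical helper, not part of NiFi. It assumes every bin fills exactly to its minimum and ignores remainders and max bin age.

```python
def two_stage_merge(total_flowfiles, first_min, second_min):
    """Estimate outputs per stage when each bin merges exactly at its minimum."""
    stage1_outputs = total_flowfiles // first_min   # bundles leaving merge 1
    stage2_outputs = stage1_outputs // second_min   # bundles leaving merge 2
    return {
        "stage1_outputs": stage1_outputs,
        "stage2_outputs": stage2_outputs,
        # Largest number of FlowFiles whose attributes sit in one bin at once.
        "peak_flowfiles_per_bin": max(first_min, second_min),
        "source_flowfiles_per_final_bundle": first_min * second_min,
    }


plan = two_stage_merge(10_000_000, first_min=10_000, second_min=1_000)
```

With these numbers, no single bin holds the attributes of more than 10,000 FlowFiles, yet the final bundle still covers all 10,000,000 source FlowFiles.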
10-21-2019
04:22 AM
Hi, did adding the NiFi hostnames to the load balancer certificate's SAN help?