Member since: 07-30-2019
Posts: 3400
Kudos Received: 1621
Solutions: 1003

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 96 | 12-05-2025 08:25 AM |
|  | 231 | 12-03-2025 10:21 AM |
|  | 522 | 11-05-2025 11:01 AM |
|  | 395 | 11-05-2025 08:01 AM |
|  | 690 | 11-04-2025 10:16 AM |
10-02-2017
05:43 PM
@Andre Labbe
1. NiFi is designed to prevent changes while a node is disconnected. Each node runs its own copy of the flow.xml.gz. When a node is disconnected, the cluster coordinator has no means to determine the state of components on that disconnected node. If changes were allowed, that node would be unable to automatically rejoin the cluster without manual intervention to bring it back in sync. To regain control of the canvas, you can access the cluster UI and drop the disconnected node from the cluster. Keep in mind that if you later want to rejoin that node, you will need to make sure the flow.xml.gz, users.xml (if used when NiFi is secured), and authorizations.xml (if used when NiFi is secured) are all in sync with what is currently being used in the cluster (you can copy these from another active cluster node). *** Be mindful that if a change you made removed a connection that contained data on your disconnected node, that data will be lost on startup once you replace the flow.xml.gz. If NiFi cannot find the connection in which to place the queued data, it is lost.
2. "If you set a Processor component to execute on 'Primary node', it does not run." Which component, and how do you know it did not run? I have not seen this happen before. Processors that use non-cluster-friendly protocols should be run on the primary node only, to prevent data duplication as you noted above. If you are consuming a lot of data using one of these protocols, it is suggested you use the List/Fetch processors (example: ListSFTP/FetchSFTP) along with NiFi's Site-To-Site (S2S) capability to redistribute the listed FlowFiles to all nodes in your cluster before the Fetch. Also note that you need an odd number of ZooKeeper nodes in order to have quorum (3 minimum). Using the embedded ZooKeeper means you lose that ZK node any time you restart that NiFi instance. I don't recommend using embedded ZK in production. Thanks, Matt
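A minimal sketch of the re-sync step described above, assuming the active node's conf directory is reachable on a local path (e.g. copied or mounted over); the function name and all paths are illustrative:

```shell
# Copy the cluster-state files from an active node's conf directory into
# this node's conf directory before restarting it to rejoin the cluster.
# sync_nifi_conf SRC_CONF DST_CONF
sync_nifi_conf() {
  src="$1"; dst="$2"
  for f in flow.xml.gz users.xml authorizations.xml; do
    if [ -f "$src/$f" ]; then
      cp "$src/$f" "$dst/$f"
      echo "synced $f"
    fi
  done
}

# Example (hypothetical paths):
#   sync_nifi_conf /mnt/active-node/nifi/conf /opt/nifi/conf
```

Remember the warning above: any data queued in a connection that no longer exists in the replacement flow.xml.gz is dropped on startup.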
09-27-2017
06:47 PM
@pawan soni
Did you resolve your invalid state by starting your "From File" input port? Your screenshot shows the RPG as "Enable transmission" and the input port as "stopped". Thanks, Matt
09-27-2017
06:26 PM
@Obaid Salikeen There is no direct correlation between CPU and heap memory usage. Heap usage is more processor- and flow-implementation specific. Processors that do things like splitting or merging FlowFiles can end up using more heap. FlowFile attributes live in heap memory. NiFi does swap FlowFile attributes to disk per connection based on FlowFile queue count; the default of 20,000 will trigger swapping to start on a connection. But there is no swap threshold based on FlowFile attribute map size. If a user writes large values to FlowFile attributes, that FlowFile's heap usage is going to be higher. You see this in scenarios where large parts of the FlowFile content are extracted to a FlowFile attribute. So when it comes to heap/memory usage, it comes down to flow design more than any correlation to the number of CPUs. Thanks, Matt
09-27-2017
01:10 PM
@pawan soni Is the UI of the NiFi instance running on Node 1 reachable via port 9090? The RPG reports some communication issues there. This may have just been the result of the node 1 restart. The invalid state message indicates that your Remote Process Group is enabled; however, the Remote Input Port is stopped (thus an invalid state for data transfer). Either start the "From File" input port or disable that port in your RPG to get rid of this ERROR. Thanks, Matt
09-26-2017
12:55 PM
@Alvin Jin When you obtain a token, that token is only valid against the specific node that issued it. So if you use token=$(curl -k -X POST --negotiate -u : https://<nifi-node1>:9091/nifi-api/access/kerberos) then that token can only be used to access NiFi end-points on nifi-node1. You would need to obtain a different token for node2, node3, etc. Also keep in mind that NiFi will only continue to accept a token for the configured expiration time; the default is 12 hours, as you see in the kerberos-provider configuration. After expiration, a new token will be needed. Thanks, Matt
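As a sketch of the per-node flow, the token can be carried in a Bearer header; the helper function, hostnames, and the `/nifi-api/flow/about` endpoint in the commented usage are illustrative:

```shell
# Build the Authorization header value from a token string.
auth_header() { printf 'Authorization: Bearer %s' "$1"; }

# Each node issues and accepts only its own token (hostnames illustrative):
#   token1=$(curl -k -X POST --negotiate -u : https://nifi-node1:9091/nifi-api/access/kerberos)
#   curl -k -H "$(auth_header "$token1")" https://nifi-node1:9091/nifi-api/flow/about
# Reusing $token1 against nifi-node2 would be rejected; obtain a second token there.
```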
09-25-2017
11:36 AM
@James V Can you post the entire verbose output of both your Keystore and Truststore?
09-22-2017
03:02 PM
@Saikrishna Tarapareddy NiFi's authentication and authorization controls which users can access NiFi's various features and components. All the NiFi components added to the canvas of a NiFi instance are executed by the user who owns the NiFi service, not the user who is currently logged in. So you need to make sure the target directory your PutFile processor is writing to has the necessary permissions set on it to allow the NiFi service/process user to write to it. You can see which user owns the nifi process by running the following command (assuming you are running on a Linux OS): ps -ef|grep nifi Thanks, Matt
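The ps check above can be wrapped as a small filter that prints just the owning user; this helper is a sketch, and the bracketed `[n]` in the usage pattern is only there so the filter's own command line does not match:

```shell
# Print the owning user (first column of `ps -ef` output) of the first
# process line matching the given pattern.
proc_owner() { awk -v pat="$1" '$0 ~ pat {print $1; exit}'; }

# Usage:
#   ps -ef | proc_owner '[n]ifi'
```

Once you know the service user, grant it write access on the PutFile target directory (e.g. via chown/chmod).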
09-21-2017
05:00 PM
@James V The "Keystore" you are using that was derived from your CA should contain only a single "PrivateKeyEntry". That "PrivateKeyEntry" should have an EKU that authorizes its use for both clientAuth and serverAuth. (Based on the above, the EKU looks correct.) The Issuer listed on that PrivateKeyEntry should be the DN of your CA. If the Issuer is the same as the Owner, it is a self-signed cert. This typically means you did not install the response you got back from your CA. You should have provided your CA with a CSR (certificate signing request), for which you then received a response. The "truststore" should not contain any PrivateKeyEntries. It should contain one to many "TrustedCertEntries". There should be a trustedCertEntry for every CA that signs any certificate being used anywhere to communicate with this NiFi. TrustedCertEntries are nothing more than public keys. Thanks, Matt
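One way to verify the entry counts above is to pipe keytool's verbose listing through a small counter; the keytool flags are standard, while the store file names and `-storepass` values below are placeholders:

```shell
# Count occurrences of a given entry type in `keytool -list -v` output
# (keytool prints lines like "Entry type: PrivateKeyEntry").
count_entries() { grep -c "Entry type: $1"; }

# Expected results (file names and passwords are placeholders):
#   keytool -list -v -keystore keystore.jks   -storepass changeit | count_entries PrivateKeyEntry    # 1
#   keytool -list -v -keystore truststore.jks -storepass changeit | count_entries PrivateKeyEntry    # 0
#   keytool -list -v -keystore truststore.jks -storepass changeit | count_entries trustedCertEntry   # 1 or more
```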
09-19-2017
02:09 PM
@sally sally By setting your minimums ("Min Num Entries" and "Min Group Size") to some large value, FlowFiles that are added to a bin will not qualify for merging right away. You should then set "Max Bin Age" to the amount of time you are willing to allow a bin to hang around before it is merged, regardless of the number of entries in that bin or that bin's size. As far as the number of bins goes, a new bin will be created for each unique filename found in the incoming queue. Should the MergeContent processor encounter more unique filenames than there are bins, it will force merging of the oldest bin to free a bin for the new filename. So it is important to have enough bins to accommodate the number of unique filenames you expect to pass through this processor during the configured "Max Bin Age" duration; otherwise, you could still end up with 1 FlowFile per merge. Thanks, Matt
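A sketch of MergeContent settings matching the approach above; the values are illustrative, and setting the correlation attribute to "filename" is what gives you one bin per unique filename:

```
Minimum Number of Entries : 10000      (large, so bins do not qualify right away)
Minimum Group Size        : 1 GB       (large, for the same reason)
Max Bin Age               : 5 min      (forces merge of bins that never fill)
Maximum number of Bins    : 100        (>= unique filenames expected per 5 min)
Correlation Attribute Name: filename   (one bin per unique filename)
```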
09-19-2017
01:01 PM
1 Kudo
@David Miller NiFi's default file-based authorizer:
Advantages:
- Supports user groups (this can make setting up authorizations for a team a lot less cumbersome).
- Integrated within NiFi, so no need to worry about connectivity issues with an external service.
Disadvantages:
- There is currently no way to sync users with LDAP. Users must be added manually.
Ranger-based authorizer:
Advantages:
- Ranger can be set up to sync users from LDAP.
- Authorizing new users does not require the authorization admin to have access to NiFi's UI.
Disadvantages:
- Ranger user groups are not supported yet (each and every user must be added to any policy required).
Here is a helpful link that maps Ranger policies to NiFi's default user authorizations: https://community.hortonworks.com/content/kbentry/115770/nifi-ranger-based-policy-descriptions.html
-----
You are correct that the most common approach to user/team-managed authorization is through the use of unique process groups added at the root canvas level. Sub-process groups by default inherit their access policies from the parent process group. The only thing to be aware of is the use of NiFi's Site-To-Site (S2S) capability. Site-To-Site remote input and output ports must be added at the root canvas level. So when it comes to using S2S to receive or send data from a NiFi, you would need an admin-level user who has the ability to add these components to the root canvas level for your users and connect them to the process group(s) that your users/teams are authorized for. The other side of an S2S connection is a Remote Process Group (RPG). These RPGs can be added at any level (sub-process group) in a dataflow, so no special considerations are needed there. A typical approach might be to create a remote input port for each team (process group) and connect that port to their assigned process group. Once in the team group, a routing processor could be shared by all sub-teams/users to direct a particular feed of incoming S2S data to a particular sub-process group. Teams are still going to need to work with admins to authorize remote NiFi instances to connect to these ports, so it cannot be completely team-managed after creation. Thanks, Matt