Member since
07-30-2019
3464
Posts
1641
Kudos Received
1015
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 392 | 03-23-2026 05:44 AM |
| | 308 | 02-18-2026 09:59 AM |
| | 555 | 01-27-2026 12:46 PM |
| | 968 | 01-20-2026 05:42 AM |
| | 1286 | 01-13-2026 11:14 AM |
02-23-2017
07:06 PM
5 Kudos
There is a two part process before any access to the NiFi UI is possible:

1. Authentication: By default, NiFi will use a user/server's SSL certificate, when provided in the connection, to authenticate. When NO user/server certificate is presented, NiFi will then look for a Kerberos TGT (if Spnego has been configured in NiFi). Finally, if neither of the above was present in the connection, NiFi will use the login identity provider (if configured). Login identity providers include either LDAP or Kerberos. With both of these options, NiFi will present users with a login screen.

2. Authorization: Authorization is the mechanism that controls which features and components authenticated users are granted access to. The default authorizer NiFi will use is the internal file based authorizer. There is an option to configure NiFi to use Ranger as the authorizer instead.

The intent of this article is not to discuss how to set up NiFi to use any of the authentication or authorizer options. This article covers how to modify what identity is passed to the authorizer after any one of the authentication mechanisms succeeds. What is actually passed to the authorizer varies depending on which authentication method is in use:

- SSL certificates: Default, always enabled, and always checked first. NiFi uses the full DN from the certificate.
- Spnego (Kerberos): Always on when enabled, and only used if an SSL certificate was not present in the connection. NiFi uses the full user principal.
- ldap-provider (option in login-identity-providers): Always on once configured, and only used if both SSL certificate and TGT (if Spnego was enabled) are not present in the connection. The default configuration of the ldap-provider will use the full DN returned by LDAP upon successful authentication (USE_DN identity strategy). It can be configured to pass the username used to log in instead (USE_USERNAME identity strategy).
- kerberos-provider (option in login-identity-providers): Always on once configured, and only used if both SSL certificate and TGT (if Spnego was enabled) are not present in the connection. The kerberos-provider will use the user's full principal upon successful authentication.

Whether you choose to use the built-in file based authorizer or optionally configure your NiFi to use Ranger instead, users must be added and granted various access policies. Adding users using either a full DN or a user principal can be both annoying and prone to errors, since the authorizer is case sensitive and white spaces are valid characters. This is where NiFi's optional identity mapping configurations come into play.

Identity mapping takes place after successful authentication and before authorization occurs. It gives you the ability to take the returned value from all four of the authentication methods and pass it through one or more mappings to produce a simple resulting value, which is then passed to your authorizer. The identity mapping properties are configured in NiFi's nifi.properties file and consist of two parts for each mapping you define:

nifi.security.identity.mapping.pattern.<user defined>=
nifi.security.identity.mapping.value.<user defined>=

The mapping pattern takes a Java regular expression as input, with the expectation that one or more capture groups are defined in that expression. One or more of those capture groups are then used in the mapping value to create the desired final result that will be passed to your configured authorizer.

**** Important note: If you are implementing pattern mapping on an existing NiFi cluster that is already running securely, the newly added mappings will be run against the DNs from the certificates created for your nodes and the Initial Admin Identity value you originally configured. If any of your mappings match, a new value is going to be passed to your authorizer, which means you may lose access to your UI. Before adding any mapping, make sure you have added the new mapped value users to your NiFi and authorized them so you do not lose access.

By default NiFi includes 2 example identity mappings commented out in the nifi.properties file (shown at the end of this article). You can add as many identity mapping pattern and value pairs as you like to accommodate all your various user/server authentication types. Each must have a unique identifier. In the examples the unique identifiers are "dn" and "kerb". You could, for example, add "nifi.security.identity.mapping.pattern.dn2=" and "nifi.security.identity.mapping.value.dn2=".

If you are using Ambari to install and manage your NiFi cluster (HDF 2.x version), you can find the 2 sample identity mapping properties under "Advanced nifi-properties". If you want to add additional mappings beyond the above 2 via Ambari, these would be added via the "Custom nifi-properties" config section. Simply click the "Add Property..." link to add your new mappings.

The result of any successful authentication is run through all configured identity mappings until a match is found. If no match is found, the full DN or user principal is passed to the authorizer. Let's take a look at a few examples:

| User/server DN or Principal | Identity Mapping Pattern | Identity Mapping Value | Result passed to authorizer |
|---|---|---|---|
| CN=nifi-server-01.openstacklocal, OU=NIFI | ^CN=(.*?), OU=(.*?)$ | $1 | nifi-server-01.openstacklocal |
| CN=nifi-01, OU=SME, O=mycp, L=Fulton, ST=MD, C=US | ^CN=(.*?), OU=(.*?), O=(.*?), L=(.*?), ST=(.*?), C=(.*?)$ | $1@$2 | nifi-01@SME |
| nifi/instance@MY.COMPANY.COM | ^(.*?)/instance@(.*?)$ | $1@$2 | nifi@MY.COMPANY.COM |
| cn=nifi-user1,ou=SME,dc=mycp,dc=com | ^cn=(.*?),ou=(.*?),dc=(.*?),dc=(.*?)$ | $1 | nifi-user1 |
| JohnDoe@MY.COMPANY.COM | ^(.*?)@(.*?)$ | $1 | JohnDoe |
| EMAILADDRESS=none@none.com, CN=nifi-user2, OU=SME, O=mycp, L=Fulton, ST=MD, C=US | ^EMAILADDRESS=(.*?), CN=(.*?), OU=(.*?), O=(.*?), L=(.*?), ST=(.*?), C=(.*?)$ | $2 | nifi-user2 |

As you can see from the above examples, using NiFi's pattern mapping ability will simplify authorizing new users via either NiFi's default file based authorizer or Ranger.
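For reference, these are the two commented-out default mappings mentioned above, as they appear in nifi.properties (the same pair is shown in a later post in this thread):

nifi.security.identity.mapping.pattern.dn=^CN=(.*?),OU=(.*?),O=(.*?),L=(.*?),ST=(.*?),C=(.*?)$
nifi.security.identity.mapping.value.dn=$1@$2
nifi.security.identity.mapping.pattern.kerb=^(.*?)/instance@(.*?)$
nifi.security.identity.mapping.value.kerb=$1@$2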
02-23-2017
02:25 PM
9 Kudos
NiFi works with FlowFiles. Every FlowFile that exists consists of two parts: FlowFile content and FlowFile attributes. While the FlowFile's content lives on disk in the content repository, NiFi holds the "majority" of the FlowFile attribute data in the configured JVM heap memory space. I say "majority" because NiFi swaps attributes to disk on any queue that contains over 20,000 FlowFiles (the default, which can be changed in nifi.properties). Once your NiFi is reporting OutOfMemory (OOM) errors, there is no corrective action other than restarting NiFi. If changes are not made to your NiFi or dataflow, you are surely going to encounter this issue again and again. The default configuration for JVM heap in NiFi is only 512 MB. This value is set in the bootstrap.conf file:

# JVM memory settings
java.arg.2=-Xms512m
java.arg.3=-Xmx512m

While the default may work for some dataflows, it is going to be undersized for others.
Simply increasing these values until you stop seeing OOM errors should not be your immediate go-to solution. Very large heap sizes can also have adverse impacts on your dataflow. Garbage collection takes much longer to run with very large heap sizes, and while garbage collection occurs it is essentially a stop-the-world event. This amounts to dataflow stoppage for the length of time it takes to complete. I am not saying that you should never set large heap sizes, because sometimes that is really necessary; however, you should evaluate all other options first.

NiFi and FlowFile attribute swapping: NiFi already has a built-in mechanism to help reduce the overall heap footprint. The mechanism swaps FlowFile attributes to disk when a given connection's queue exceeds the configured threshold. These settings are found in the nifi.properties file:

nifi.swap.manager.implementation=org.apache.nifi.controller.FileSystemSwapManager
nifi.queue.swap.threshold=20000
nifi.swap.in.period=5 sec
nifi.swap.in.threads=1
nifi.swap.out.period=5 sec
nifi.swap.out.threads=4

Swapping, however, will not help if your dataflow is so large that queues are backed up everywhere but still have not exceeded the threshold for swapping. Any time you decrease the swap threshold, more swapping can occur, which may cost some throughput performance. So here are some other things to check for. Some common reasons for running out of heap memory include:

1. A high volume dataflow with lots of FlowFiles active at any given time across your dataflow. (Increase the configured NiFi heap size in bootstrap.conf to resolve.)
2. Creating a large number of attributes on every FlowFile. More attributes equals more heap usage per FlowFile. Avoid creating unused/unnecessary attributes on FlowFiles. (Increase the configured NiFi heap size in bootstrap.conf and/or reduce the configured swap threshold to resolve.)
3. Writing large values to FlowFile attributes. Extracting large amounts of content and writing it to an attribute on a FlowFile will result in high heap usage. Try to avoid creating large attributes when possible. (Increase the configured NiFi heap size in bootstrap.conf and/or reduce the configured swap threshold to resolve.)
4. Using the MergeContent processor to merge a very large number of FlowFiles. NiFi cannot merge FlowFiles that are swapped, so all of these FlowFiles' attributes must be in heap when the merge occurs. If merging a very large number of FlowFiles is needed, try using two MergeContent processors in series with one another. Have the first merge a max of 20,000 FlowFiles and the second merge those bundles into even larger bundles. (Increasing the configured NiFi heap size in bootstrap.conf also helps.)
5. Using the SplitText processor to split one FlowFile into a very large number of FlowFiles. Swapping of a large connection queue will not occur until after the queue has exceeded the swapping threshold. The SplitText processor will create all the split FlowFiles before committing them to the success relationship. This is most commonly seen when SplitText is used to split a large incoming FlowFile on every line; it is possible to run out of heap memory before all the splits can be created. Try using two SplitText processors in series. Have the first split the incoming FlowFiles into large chunks and the second split them down even further. (Increasing the configured NiFi heap size in bootstrap.conf also helps.)

Note: There are additional processors that can be used for splitting and joining large numbers of FlowFiles, so the same approach as above should be followed for those as well. I only specifically commented on the above since they are more commonly seen being used to deal with very large numbers of FlowFiles.
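Where increasing the heap is warranted, it is a two-line change in bootstrap.conf. The 4 GB figure below is purely illustrative; size it to your dataflow and the RAM available on the host:

# JVM memory settings (illustrative values)
java.arg.2=-Xms4g
java.arg.3=-Xmx4g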
02-08-2017
09:55 PM
19 Kudos
What is Content Repository Archiving? There are three properties in the nifi.properties file that deal with the archiving of content in the NiFi content repository. The default NiFi values for these are shown below:

nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true

The purpose of content archiving is so that users can view and/or replay content, via the provenance UI, that is no longer in their dataflow(s). The configured values do not have any impact on the amount of provenance history that is retained. If content associated with a particular provenance event no longer exists in the content archive, provenance will simply report to the user that the content is not available. The content archive is kept within the same directory or directories where you have configured your content repository(s) to exist. When a "content claim" is archived, that claim is moved into an archive subdirectory within the same disk partition where it originally existed. This keeps archiving from affecting NiFi's content repository performance with the unnecessary writes that would be associated with moving archived files to a new disk/partition, for example.

The configured max retention period tells NiFi how long to keep an archived "content claim" before purging it from the content archive directory. The configured max usage percentage tells NiFi at what point it should start purging archived content claims to keep the overall disk usage at or below the configured percentage. This is a soft limit. Let's say the content repository is at 49% usage, and a 4 GB content claim then becomes eligible for archiving. Once this content claim is archived, the usage may exceed the configured 50% threshold. At the next checkpoint, NiFi will remove the oldest archived content claim(s) to bring the overall disk usage back to or below 50%. So this value should never be set to 100%. The above two properties are enforced using an OR policy: whichever max is hit first will trigger the purging of archived content claims. Let's look at a couple of examples:

Example 1: Here you can see that our content repository has 35% of its disk consumed by content claims that are tied to FlowFiles still active somewhere in one or more dataflows on the NiFi canvas. This leaves 15% of the disk space to be used for archived content claims.

Example 2: Here you can see that the amount of content claims still active somewhere within your NiFi flow has exceeded 50% disk usage in the content repository. As such, you can see there are no archived content claims.

The content repository archive settings have no bearing on how much of the content repository disk will be used by active FlowFiles in your dataflow(s). As such, it is possible for your content repository to still fill to 100% disk usage. *** This is the exact reason why, as a best practice, you should avoid co-locating your content repository with any of the other NiFi repositories. It should be isolated to a disk(s) that will not affect other applications or the OS should it fill to 100%.

What is a Content Claim? I have mentioned "content claim" throughout this article. Understanding what a content claim is will help you understand your disk usage. NiFi stores content in the content repository inside claims. A single claim can contain the content from one to many FlowFiles. The property that governs how a content claim is built is found in the nifi.properties file. The default configuration value is shown below:

nifi.content.claim.max.appendable.size=50 KB

The purpose of content claims is to make the most efficient use of disk storage. This is especially true when dealing with many very small files.
The configured max appendable size tells NiFi at what point it should stop appending additional content to an existing content claim and start a new claim. It does not mean all content ingested by NiFi must be smaller than 50 KB. It also does not mean that every content claim will be at least 50 KB in size.

Example 1: Here you can see we have a single content claim that contains both large and small pieces of content. The overall size has exceeded the configured max appendable size because, at the time NiFi started streaming that final piece of content into this claim, the size was still below the threshold.

Example 2: Here we can see we have a content claim that contains only one piece of content. This is because once the content was written to this claim, the claim exceeded the configured max appendable size. If your dataflow(s) deal with nothing but files larger than the max appendable size, all your content claims will contain only one piece of content.

So when is a "content claim" moved to archive? A content claim cannot be moved into the content repository archive until none of the pieces of content in that claim are tied to a FlowFile that is active anywhere within any dataflow on the NiFi canvas. What this means is that the reported cumulative size of all the FlowFiles in your dataflows will likely never match the actual disk usage in your content repository. This cumulative size is not the size of the content claims in which the queued FlowFiles reside, but rather just the reported cumulative size of the individual pieces of content. It is for this reason that it is possible for a NiFi content repository to hit 100% disk usage even if the NiFi UI reports a total cumulative queued data size of less than that. Take Example 1 from above: assuming the last piece of content written to that claim was 100 GB in size, all it would take is for one of those very small pieces of content in that same claim to still exist queued in a dataflow to prevent this claim from being archived. As long as a FlowFile still points at a content claim, that entire content claim cannot be purged.

When fine tuning your NiFi default configurations, you must always take into consideration your intended data. If you are working with nothing but very small OR very large data, leave the default values alone. If you are working with data that ranges greatly from very small to very large, you may want to decrease the max appendable size and/or max flow file settings. By doing so you decrease the number of FlowFiles that make it into a single claim. This in turn reduces the likelihood of a single piece of data keeping large amounts of data still active in your content repository.
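If you want to see how much disk the archive is actually holding, something like the following works, assuming the default layout where each content repository container section keeps its own archive subdirectory (adjust the path to your install):

# Total disk usage of all archive subdirectories in the content repository
du -ch /path/to/nifi/content_repository/*/archive | tail -1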
02-02-2017
07:25 PM
9 Kudos
How to access your secured NiFi instance or cluster: So you have secured your NiFi instance (meaning you have configured it for HTTPS access) and now you are trying to access the HTTPS web UI. Once NiFi is secured, any entity interacting with the UI will need to successfully authenticate and then be authorized to access the particular NiFi resource(s). As of HDF 2.1.1 or Apache NiFi 1.1.0, NiFi supports authentication via user certificates (default, always enabled), Kerberos/Spnego, or username and password based authentication via LDAP, LDAPS, or Kerberos. The intent of this article is not to cover the authentication process, but rather to cover the initial admin authorization process. We assume for this article that authentication is successful. How do you know? A quick look in the nifi-user.log will tell you if your user's authentication was successful. Following successful authentication comes NiFi authorization. NiFi authorization can be handled by NiFi's default built-in file based authorizer or handled externally via Ranger. This article will cover the default built-in file based authorizer. NiFi's built-in file based authorization: There are four files in NiFi that contain properties used by NiFi's file based authorizer:
nifi.properties, authorizers.xml, users.xml, and authorizations.xml. We will start by showing what role each of these files plays in NiFi user/server authorization.

nifi.properties file (Pattern Mapping): The nifi.properties file contains a lot of key/value pairs that are used by NiFi's core. This file happens to be where users can define identity mapping patterns. These properties allow normalizing user identities such that identities coming from different identity providers (certificates, LDAP, Kerberos) can be treated the same internally in NiFi. It is the resulting value from a matching pattern that is passed to the configured authorizer (NiFi's file based authorizer or Ranger). NiFi includes two examples that are commented out in the nifi.properties file; however, you can add as many unique identity mapping patterns as you need:

nifi.security.identity.mapping.pattern.dn=^CN=(.*?),OU=(.*?),O=(.*?),L=(.*?),ST=(.*?),C=(.*?)$
nifi.security.identity.mapping.value.dn=$1@$2
nifi.security.identity.mapping.pattern.kerb=^(.*?)/instance@(.*?)$
nifi.security.identity.mapping.value.kerb=$1@$2

All mapping patterns use Java regular expressions. They are case sensitive, and white space matters between elements. For example,

^CN=(.*?),OU=(.*?),O=(.*?),L=(.*?),ST=(.*?),C=(.*?)$

would match on: CN=John Doe,OU=SME,O=mycp,L=Bmore,ST=MD,C=US

but would not match on: cn=John Doe, ou=SME, o=mycp, l=Bmore, st=MD, c=US (note the lowercase and white spaces)

Assuming a DN of CN=John Doe,OU=SME,O=mycp,L=Bmore,ST=MD,C=US, the associated mapping value would return John Doe@SME.

Additional mapping patterns can be added simply by adding additional properties to the nifi.properties file similar to the above examples, except each must have a unique value following nifi.security.identity.mapping.pattern. or nifi.security.identity.mapping.value. . For example:

nifi.security.identity.mapping.pattern.dn2=^CN=(.*?), OU=(.*?)$
nifi.security.identity.mapping.value.dn2=$1

While you can create as many mapping patterns as you like, it is important to make sure that you do not have more than one pattern that can match your incoming user/server identity. User identities are run against every configured pattern, and only the last pattern that matches will be applied.

authorizers.xml (Default configuration supports file-provider): This file is where you will set up your NiFi file based authorizer. It is this file in which you will find the "Initial Admin Identity" property. It is very important that you correctly define an "Initial Admin Identity" before starting your secured HTTPS NiFi for the first time (no worries if you have not; I will discuss how to fix issues when you did not or had a typo). If you are securing a NiFi cluster, you will also need to configure a "Node Identity x" for each node in your cluster (where "x" is a sequential number). *** Don't forget to remove the comment lines "<!--" and "-->" from around these properties. So, what values should I be providing to these properties? That depends on a few factors:

Which authentication method did I use?
- User/server/node certificates (default): User certificates will have a DN in the certificate for that user. This full DN is evaluated by any configured identity mapping patterns and the result is passed to the authorizer. NiFi nodes can only use server certificates to authenticate. Each server is issued a server certificate, and the full DNs from those certificates are evaluated by any configured identity mapping patterns, with the result passed to the authorizer.
- Kerberos/Spnego: The user's principal is evaluated by any configured identity mapping patterns and the result is passed to the authorizer.
- LDAP/LDAPS: Users are presented with a login screen. NiFi's LDAP configuration can be set up to pass either the DN returned by LDAP for the user (default) or the username supplied at the login screen. This return is evaluated by any configured identity mapping patterns and the result is passed to the authorizer.
- Kerberos: Users are presented with a login screen. The user's principal is evaluated by any configured identity mapping patterns and the result is passed to the authorizer.

Did I set up identity pattern mappings?
- If no identity mapping patterns were defined, the full return from the configured authentication is passed to the authorizer.
- If the user/server identity fails to match any of the defined identity mapping patterns, the full return from the configured authentication is passed to the authorizer.

Whatever the final resulting value is will be what needs to be entered in the "Initial Admin Identity" and "Node Identity x" properties. Let's assume the following user/server DNs and that multiple identity mappings were set up in the nifi.properties file:

| Sample entity DN | Configured Identity Mapping Pattern | Configured Identity Mapping Value | Resulting value |
|---|---|---|---|
| cn=JohnDoe,ou=SME,dc=work | ^cn=(.*?),ou=(.*?),dc=(.*?)$ | $1 | JohnDoe |
| CN=nifi-server1, OU=NIFI | ^CN=(.*?), OU=(.*?)$ | $1 | nifi-server1 |
| CN=nifi-server2, OU=NIFI | ^CN=(.*?), OU=(.*?)$ | $1 | nifi-server2 |

Your authorizers.xml file would then look like this (see the sketch below). The values configured here will be used to seed the users.xml and authorizations.xml files.

users.xml: The users.xml file is produced the first time, and only the first time, NiFi is started securely (HTTPS). This file will contain your "Initial Admin Identity" and all your "Node Identity x" configured values.
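As referenced above, here is a minimal authorizers.xml sketch consistent with the mapped values in the table. The class and property names follow the NiFi 1.x file-provider conventions; treat the whole block as illustrative rather than a drop-in config:

<authorizers>
    <authorizer>
        <identifier>file-provider</identifier>
        <class>org.apache.nifi.authorization.FileAuthorizer</class>
        <property name="Authorizations File">./conf/authorizations.xml</property>
        <property name="Users File">./conf/users.xml</property>
        <property name="Initial Admin Identity">JohnDoe</property>
        <property name="Legacy Authorized Users File"></property>
        <property name="Node Identity 1">nifi-server1</property>
        <property name="Node Identity 2">nifi-server2</property>
    </authorizer>
</authorizers>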
authorizations.xml: The authorizations.xml file is produced the first time, and only the first time, NiFi is started securely (HTTPS). NiFi will assign the access policies needed by your "Initial Admin Identity" and "Node Identity x" users/servers. Your "Initial Admin Identity" user is granted the following resources/access policies:

| Resource | NiFi UI Access Policy | Details |
|---|---|---|
| /flow (R) | view the UI | All users, including the admin, must have this access policy in order to access and view the NiFi UI. |
| /restricted-components (W) | access restricted components | Allows granted users the ability to add/configure NiFi components tagged as restricted on the canvas. |
| /tenants (R and W) | access users/user groups (view and modify) | Allows granted users the ability to add/remove/modify users and user groups in NiFi for authorization. |
| /policies (R and W) | access all policies (view and modify) | Allows granted users the ability to add/remove various access policies for any users and user groups. |
| /controller (R and W) | access the controller (view and modify) | Allows granted users the ability to view/modify the controller, including Reporting Tasks, Controller Services, and Nodes in the Cluster. |

You may notice a few additional access policies were granted to your admin user. This will only happen if the NiFi you have secured already had an existing flow.xml.gz file. In this case the "Initial Admin Identity" is also granted access to view and modify the dataflow at the NiFi root canvas level. By default, all sub NiFi process groups inherit their access policies from the parent process group. This effectively gives the admin user full access to the dataflow. The "Node Identity x" servers are granted the following access policies:

| Resource | NiFi UI Access Policy | Details |
|---|---|---|
| /proxy (R and W) | proxy user requests (view and modify) | Allows proxy machines to send requests on behalf of others. All nodes in a NiFi cluster must be granted this access policy so users can make changes to the cluster while logged in to any of the NiFi cluster's nodes. |

What do I do if I messed up my "Initial Admin Identity" or "Node Identity x" values when setting up my authorizers.xml file? It is common for users to incorrectly configure the value for either the "Initial Admin Identity" or the "Node Identity x" properties. Common mistakes include bad mapping patterns, case sensitivity issues (LDAP DNs always have the cn, ou, etc. values in lowercase), and white space issues between DN sections (cn=JohnDoe, ou=sme versus cn=JohnDoe,ou=sme). You can use the nifi-user.log to identify the actual value being passed to the authorizer, and then follow these steps:

1. Correct your authorizers.xml configuration.
2. Delete or rename the current users.xml and authorizations.xml files on all of your NiFi nodes.
3. Restart all your NiFi nodes.

NiFi will generate new users.xml and authorizations.xml files from the corrected authorizers.xml file. You should only follow this procedure to correct issues when first setting up a secured NiFi. If an admin was previously able to access your NiFi's canvas, add new users, and grant access policies to those users, all those users and access policies will be lost if you delete the users.xml and authorizations.xml files. Thanks, Matt
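To find the identity value NiFi actually received, a simple search of the user log is usually enough. The path assumes a default install, and "johndoe" is a placeholder; match on any fragment of the expected DN or username:

grep -i "johndoe" ./logs/nifi-user.log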
01-27-2017
08:53 PM
7 Kudos
With HDF 2.x, Ambari can be used to deploy a NiFi cluster. Let's say you deployed a 2 node cluster and want to go back at a later time and add an additional NiFi node to the cluster. While the process is very straightforward when your NiFi cluster has been set up non-secure (HTTP), the same is not true if your existing NiFi cluster has been secured (HTTPS). Below you will see an existing 2 node secured NiFi cluster that was installed via Ambari:

STEP 1: Add the new host through Ambari. You can skip this step if the host you want to install the additional NiFi node on is already managed by your Ambari.

STEP 2: Under "Hosts" in Ambari, click on the host from the list where you want to install the new NiFi node. The NiFi component will be in a "stopped" state after it is installed on this new host. *** DO NOT START NIFI YET ON THE NEW HOST OR IT WILL FAIL TO JOIN THE CLUSTER. ***

STEP 3: (This step only applies if NiFi's file based authorizer is being used.) Before starting this new node, we need to clear out some NiFi configs. This step is necessary because of how the NiFi application starts. When NiFi starts, it looks for the existence of the users.xml and authorizations.xml files. If they do not exist, it uses the configured "Initial Admin Identity" and "Node Identities (1, 2, 3, etc...)" to build the users.xml and authorizations.xml files. This causes a problem because your existing cluster's users.xml and authorizations.xml files likely contain many more entries by now. Any mismatches in these files will prevent a node from being able to join the cluster. If these configurations are not present, the new node will grab them from the cluster it joins. Below shows which configs need to be cleared in NiFi:

*Note: Another option is to simply copy the users.xml and authorizations.xml files from an existing cluster node to the new node before starting the new node (see the sketch at the end of this article).

STEP 4: (Do this step if using Ambari metrics.) When a new node is added by Ambari and Ambari metrics are also enabled, Ambari will create a flow.xml.gz file that contains just the Ambari reporting task. Later, when this node tries to join the cluster, the flow.xml.gz files between this new node and the cluster will not match. This mismatch will cause the new node to fail to join the cluster and shut back down. In order to avoid this problem, the flow.xml.gz file must be copied from one of the cluster's existing nodes to this new node.

STEP 5: Start NiFi on the new node. After the node has started, it should successfully join your existing cluster. If it fails, the nifi-app.log will explain why, but it will likely be related to one of the above configs not being cleared out, causing the users.xml and authorizations.xml files to be generated rather than inherited from the cluster. If that is the case, you will need to fix the configs and delete those files manually before restarting the node again.

STEP 6: Your cluster is now up and running with the additional node, but you will notice you cannot open the UI of that new node without getting an untrusted proxy error screen. You will, however, still be able to access your other two nodes' UIs. So we need to authorize this new node in your cluster.

A. If NiFi handles your authorizations, follow this procedure: 1. Log in to the UI of one of the original cluster nodes. The "proxy user requests" access policy is needed to allow users to access the UI of your nodes.
NOTE: There may be additional component level access policies (such as "view the data" and "modify the data") that you may also want to authorize this new node for.

B. If Ranger handles your NiFi authorizations, follow this procedure: 1. Access the Ranger UI. 2. Click Save to create this new user for your new node. The username MUST match exactly the DN displayed in the untrusted proxy error screen. 3. Access the NiFi service manager in Ranger and authorize your new node for your existing access policies as needed.

You should now have a fully functional new node added to your pre-existing secured NiFi cluster that was deployed/installed via Ambari.
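A sketch of the file copies described in the STEP 3 note and STEP 4, run from an existing cluster node. The conf path is illustrative; substitute your install's NiFi conf directory on both hosts:

# "new-node" is the host being added to the cluster
NIFI_CONF=/usr/hdf/current/nifi/conf
scp $NIFI_CONF/users.xml new-node:$NIFI_CONF/
scp $NIFI_CONF/authorizations.xml new-node:$NIFI_CONF/
scp $NIFI_CONF/flow.xml.gz new-node:$NIFI_CONF/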
11-02-2016
06:26 PM
1 Kudo
@apsaltis I might suggest we make a few changes to this article: 1. The link you have for installing HDF talks about installing HDF 2.0. HDF 2.0 is based off Apache NiFi 1.0. Since MiNiFi is built from Apache NiFi 0.6.1, the dataflows built and templated for conversion into MiNiFi YAML files must also be built using an Apache 0.6 based NiFi install. (I see in your example above you did just that, but this needs to be made clear.) 2. I would never recommend setting nifi.remote.input.socket.host= to "localhost". When a NiFi or MiNiFi connects to another NiFi via S2S, the destination NiFi will return the value set for this property along with the value set for nifi.remote.input.socket.port=. In your example that means the source MiNiFi would then try to send FlowFiles to localhost:10000. This is ONLY going to work if the destination NiFi is located on the same server as MiNiFi. 3. You should also explain why you are changing nifi.remote.input.secure= from true to false. Changing this is not a requirement of MiNiFi; it is simply a matter of preference (if set to true, both MiNiFi (source) and NiFi (destination) must be set up to run securely over HTTPS). In your example you are working with HTTP only. 4. While doable, one should never route the "success" relationship from any processor back onto itself. If you have reached the end of your dataflow, you should auto-terminate the "success" relationship. 5. I am not clear what you are telling me to do based on this line under step 5:

"Start the From MiNiFi Input Port"

6. When using the GenerateFlowFile processor in an example flow, it is important to recommend that users set a run schedule other than "0 sec". Since MiNiFi is Apache 0.6.1 based, there is no default backpressure on connections, and with a run schedule of "0 sec" it is very likely this processor will produce FlowFiles much faster than they can be sent across S2S. This will eventually fill the hard drive of the system running MiNiFi. An even better recommendation would be to make sure they set backpressure between the GenerateFlowFile processor and the Remote Process Group (RPG). That way, even if someone stops the NiFi and not the MiNiFi, they don't fill their MiNiFi hard drive. Thanks, Matt
04-26-2016
09:21 PM
There are additional items that will need to be taken into consideration if you are running a NiFi cluster. See the following for more details:
https://community.hortonworks.com/content/kbentry/28180/how-to-configure-hdf-12-to-send-to-and-get-data-fr.html
04-18-2016
09:28 PM
4 Kudos
Setting up Hortonworks DataFlow (HDF) to work with kerberized Kafka in Hortonworks Data Platform (HDP): HDF 1.2 does not contain the same Kafka client libraries as the Apache NiFi version. The HDF Kafka libraries are specifically designed to work with the Kafka versions supplied with HDP. The following Kafka support matrix breaks down what is supported in each Kafka version: *** (Apache) refers to the Kafka version downloadable from the Apache website.

For newer versions of HDF (1.1.2+), NiFi uses ZooKeeper to maintain cluster-wide state, so the following only applies if this is an HDF NiFi cluster:

1. If a NiFi cluster has been set up to use a kerberized external or internal ZooKeeper for state, every kerberized connection to any other ZooKeeper must use the same keytab and principal. For example, a kerberized embedded ZooKeeper in NiFi would need to be configured to use the same client keytab and principal you want to use to authenticate with, say, a Kafka ZooKeeper.
2. If a NiFi cluster has been set up to use a non-kerberized ZooKeeper for state, it cannot then talk to any other ZooKeeper that does use Kerberos.
3. If a NiFi cluster has been set up to use a kerberized ZooKeeper for state, it cannot then communicate with any other non-kerberized ZooKeeper.

With that being said, the PutKafka and GetKafka processors do not have properties like the HDFS processors for keytab and principal. The keytab and principal are defined in the same jaas file used if you set up HDF cluster state management. So before even trying to connect to kerberized Kafka, we need to get NiFi state management configured to use either an embedded or external kerberized ZooKeeper for state. Even if you are not clustered right now, you need to take the above into consideration if you plan on upgrading to a cluster later.

——————————————

NiFi Cluster Kerberized State Management: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#state_management

Let's assume you followed the above linked procedure to set up your NiFi cluster to create an embedded ZooKeeper. At the end of that procedure you will have made the following config changes on each of your NiFi nodes:

1. Created a zookeeper-jaas.conf file. On nodes with an embedded ZooKeeper, it will contain something like this:

Server {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/zookeeper-server.keytab"
  storeKey=true
  useTicketCache=false
  principal="zookeeper/myHost.example.com@EXAMPLE.COM";
};
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};

On nodes without an embedded ZooKeeper, it will look something like this:

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};

2. Added a config line to the NiFi bootstrap.conf file:

java.arg.15=-Djava.security.auth.login.config=/<path>/zookeeper-jaas.conf

*** The arg number (15 in this case) must be unused by any other java.arg line in the bootstrap.conf file.

3. Added 3 additional properties to the bottom of the zookeeper.properties file you configured per the linked procedure above:

authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
jaasLoginRenew=3600000
requireClientAuthScheme=sasl
—————————————

Scenario 1: Kerberized Kafka setup for a NiFi cluster:

For scenario one, we will assume you are running on a NiFi cluster that has been set up per the above to use a kerberized ZooKeeper for NiFi state management. Now that you have that setup, you have the foundation in place to add support for connecting to kerberized Kafka brokers and Kafka ZooKeepers. The PutKafka processor connects to the Kafka broker, and the GetKafka processor connects to the Kafka ZooKeepers. In order to connect via Kerberos, we will need to do the following:

1. Modify the zookeeper-jaas.conf file we created when you set up the kerberized state management above. You will need to add a new section to the zookeeper-jaas.conf file for the Kafka client. If your NiFi node is running an embedded ZooKeeper node, your zookeeper-jaas.conf file will contain:

Server {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/zookeeper-server.keytab"
  storeKey=true
  useTicketCache=false
  principal="zookeeper/myHost.example.com@EXAMPLE.COM";
};
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  renewTicket=true
  serviceName="kafka"
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  principal="nifi@EXAMPLE.COM";
};

*** What is important to note here is that both the "KafkaClient" and "Client" (used for both the embedded ZooKeeper and the Kafka ZooKeeper) use the same principal and keytab. ***
*** The principal and keytab for the "Server" (used by the embedded NiFi ZooKeeper) do not need to be the same as those used by the "KafkaClient" and "Client". ***

If your NiFi cluster node is not running an embedded ZooKeeper node, your zookeeper-jaas.conf file will contain:

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  renewTicket=true
  serviceName="kafka"
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  principal="nifi@EXAMPLE.COM";
};

*** What is important to note here is that both the KafkaClient and the Client (used for both the embedded ZooKeeper and the Kafka ZooKeeper) use the same principal and keytab. ***

2. Add an additional property to the PutKafka and GetKafka processors: Now all the pieces are in place and we can start our NiFi(s) and add/modify the PutKafka and GetKafka processors. You will need to add one new property on each PutKafka and GetKafka processor's "Properties" tab. You will use this same security.protocol (PLAINTEXTSASL) when interacting with HDP Kafka versions 0.8.2 and 0.9.0.

————————————
Scenario 2: Kerberized Kafka setup for a standalone NiFi instance:

For scenario two, a standalone NiFi does not use ZooKeeper for state management. So rather than modifying an existing jaas.conf file, we will need to create one from scratch. The PutKafka processor connects to the Kafka broker, and the GetKafka processor connects to the Kafka ZooKeepers. In order to connect via Kerberos, we will need to do the following:

1. Create a jaas.conf file somewhere on the server running your NiFi instance. This file can be named whatever you want, but to avoid confusion later, should you turn your standalone NiFi deployment into a NiFi cluster deployment, I recommend continuing to name the file zookeeper-jaas.conf. You will need to add the following lines to this zookeeper-jaas.conf file, which will be used to communicate with the kerberized Kafka brokers and kerberized Kafka ZooKeeper(s):

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  renewTicket=true
  serviceName="kafka"
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  principal="nifi@EXAMPLE.COM";
};

*** What is important to note here is that both the KafkaClient and Client configs use the same principal and keytab. ***

2. Add a config line to the NiFi bootstrap.conf file:

java.arg.15=-Djava.security.auth.login.config=/<path>/zookeeper-jaas.conf

*** The arg number (15 in this case) must be unused by any other java.arg line in the bootstrap.conf file.

3. Add an additional property to the PutKafka and GetKafka processors: Now all the pieces are in place and we can start our NiFi(s) and add/modify the PutKafka and GetKafka processors. You will need to add one new property on each PutKafka and GetKafka processor's "Properties" tab. You will use this same security.protocol (PLAINTEXTSASL) when interacting with HDP Kafka versions 0.8.2 and 0.9.0.

————————————————————

That should be all you need to get set up and going. Let me fill you in on a few configuration recommendations for your PutKafka and GetKafka processors to achieve better throughput:
PutKafka:

1. Ignore for now what the documentation says for the Batch Size property on the PutKafka processor. It is really a measure of bytes, so jack that baby up from the default 200 to some much larger value.
2. Kafka can be configured to accept larger files but is much more efficient working with smaller files. The default max message size accepted by Kafka is 1 MB, so try to keep the individual messages smaller than that. Set the Max Record Size property to the max size a message can be, as configured on your Kafka. Changing this value will not change what your Kafka can accept, but it will prevent NiFi from trying to send something too big.
3. The Max Buffer Size property should be set to a value large enough to accommodate the FlowFiles it is being fed. A single NiFi FlowFile can contain many individual messages, and the Message Delimiter property can be used to split that large FlowFile content into its smaller messages. The delimiter could be a new line or even a specific string of characters to denote where one message ends and another begins.
4. Leave the run schedule at 0 sec, and you may even want to give the PutKafka an extra thread (Concurrent Tasks).

GetKafka:

1. The Batch Size property on the GetKafka processor is correct in the documentation and does refer to the number of messages to batch together when pulled from a Kafka topic. The messages will end up in a single outputted FlowFile, and the configured Message Demarcator (default: new line) will be used to separate messages.
2. When pulling data from a Kafka topic that has been configured to allow messages larger than 1 MB, you must add an additional property to the GetKafka processor so it will pull those larger messages (the processor itself defaults to 1 MB). Add fetch.message.max.bytes and configure it to match the max allowed message size set on Kafka for the topic (see the example at the end of this post).
3. When using the GetKafka processor on a standalone instance of NiFi, the number of concurrent tasks should match the number of partitions on the Kafka topic. This is not the case (despite what the bulletin tells you when it is started) when the GetKafka processor is running on a NiFi cluster. Let's say you have a 3 node NiFi cluster. Each node in the cluster will pull from a different partition at the same time. So if the topic only has 3 partitions, you will want to leave concurrent tasks at 1 (this indicates 1 thread per NiFi node). If the topic has 6 partitions, set concurrent tasks to 2. Let's say the topic has 4 partitions: I would still use one concurrent task. NiFi will still pull from all partitions; the additional partition will be included in a round-robin fashion. If you were to set the same number of concurrent tasks as partitions in a NiFi cluster, you would end up with only one node pulling from every partition while your other nodes sit idle.
4. Set your run schedule to 500 ms to reduce excessive CPU utilization.
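For reference, the dynamic properties mentioned above are added by name on the processor's Properties tab. Treat the values below as illustrative (5242880 is a hypothetical 5 MB topic max message size):

security.protocol=PLAINTEXTSASL
fetch.message.max.bytes=5242880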
02-16-2016
03:41 PM
9 Kudos
The purpose of this article is to provide the steps needed to create your own certificates for securing your NiFi instance(s). The article will also cover creating your own Certificate Authority (CA) that you can use to sign all the certificates you create. This article is not intended to be a best practices guide to creating secure keys. While we will provide tips, users should carefully research the various security options available when creating keys. This procedure assumes you have Java keytool and OpenSSL installed on your system.

HDF 1.x or Apache NiFi 0.x Secured UI:
HDF 2.x or Apache NiFi 1.x Secured UI:

Creating your Certificate Authority: You only need to create one CA, which you will use to sign the keys for every one of your servers/VMs and users (you only need to create keys for users if your NiFi has not been configured to use LDAP authentication). What is a CA? The CA acts as a trusted entity for validating the authenticity of certificates. The CA is used to certify the authenticity of the keys (server and user) you create and should be carefully protected. Users should read the following wiki on CAs for a more detailed description: https://en.wikipedia.org/wiki/Certificate_authority

Commands for creating a CA: *** Users should use strong passwords whenever prompted. When working with Java keystores, it is recommended that both the key password and the keystore password match. *** NOTE: Security requirements have become more stringent in newer versions of browsers and NiFi since this article was originally written; the key-generation command below should be changed to use "-aes256". *** You must type 'yes' to trust this certificate.
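The creation commands were posted as screenshots in the original article; a representative sketch of the usual OpenSSL/keytool sequence (key sizes and validity periods are illustrative):

# Create the CA private key (note -aes256, per the NOTE above)
openssl genrsa -aes256 -out rootCA.key 2048
# Create the self-signed CA certificate in PEM format
openssl req -x509 -new -key rootCA.key -sha256 -days 3650 -out rootCA.pem
# Convert the CA certificate to DER format (optional, for browser import)
openssl x509 -outform der -in rootCA.pem -out rootCA.der
# Import the CA certificate into a new truststore; type 'yes' when asked to trust it
keytool -import -alias nifi-ca -file rootCA.der -keystore truststore.jks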
"truststore" file (truststore.jks) that you will use in your
nifi.properties file. Use this same "truststore" file on every one of
your servers/VMs. You may also choose to load the rootCA.der or rootCA.pem key into
your browser as another authority. This is not required, but without this
authority loaded you will need to add a certificate exception when you try to
access the NiFi https URL. Edit the following lines in your nifi.properties file: nifi.security.truststore=/<path to certs>/truststore.jks nifi.security.truststoreType=JKS nifi.security.truststorePasswd=<MyTruststorePassord> nifi.security.needClientAuth=true
Creating your Server Keystore: Now let's create a server/VM key and get it signed by that CA. *** Users should use strong passwords whenever prompted. When working with Java keystores, it is recommended that both the key password and the keystore password match. The following procedure will [1] create your server/VM's private key, [2] generate a Certificate Signing Request (.csr), [3] use the CSR to get your key signed by your CA using the CA's private key, [4] import the public key for your CA into your keystore, and [5] import your signed certificate (.crt) into your keystore to form the complete trusted chain.
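The commands themselves were screenshots in the original post; a sketch of steps [1] through [5], where the alias, hostname, and validity are placeholders:

# [1] Create the server's private key in a new keystore (shortname as alias, FQDN as CN)
keytool -genkey -alias nifi-server1 -keyalg RSA -keysize 2048 -keystore nifi-server1.jks -dname "CN=nifi-server1.example.com, OU=NIFI"
# [2] Generate a certificate signing request from that key
keytool -certreq -alias nifi-server1 -keystore nifi-server1.jks -file nifi-server1.csr
# [3] Sign the CSR with the CA's private key
openssl x509 -req -in nifi-server1.csr -CA rootCA.pem -CAkey rootCA.key -CAcreateserial -out nifi-server1.crt -days 730 -sha256
# [4] Import the CA's public certificate into the keystore
keytool -import -alias nifi-ca -file rootCA.pem -keystore nifi-server1.jks
# [5] Import the signed server certificate to complete the chain
keytool -import -alias nifi-server1 -file nifi-server1.crt -keystore nifi-server1.jks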
At the end of the above you will have your "keystore" file (nifi-server1.jks) that you will use in the nifi.properties file for one of your servers/VMs. You will need to repeat the above steps for each of your other servers/VMs so they each use their own keys. Now keep in mind that I am using "nifi-server1" in this example, but you will most likely use your systems/VMs' hostnames (shortname as alias and FQDN as CN). I also highly recommend that you use the same key and keystore password for every key you create if creating keys for multiple nodes in a NiFi cluster. The following lines need to be edited in the nifi.properties file:

nifi.security.keystore=/<path to your certs>/nifi-server1.jks
nifi.security.keystoreType=JKS
nifi.security.keystorePasswd=<yourkeystorePassword>
nifi.security.keyPasswd=<yourKeyPassword>

Also make sure that the following property in the nifi.properties file is set to true:

nifi.security.needClientAuth=true

Additional configurations for NiFi clusters only: When working with a NiFi cluster, it is recommended that you change the default NiFi user authority provider. The default is file-provider. On your NCM you should change file-provider to cluster-ncm-provider, and on your nodes file-provider should be changed to cluster-node-provider:

nifi.security.user.authority.provider=

You will also need to edit the authority-providers.xml file to configure both of these new providers. Remove the comments ( "<!--" and "-->" ) surrounding the section of XML associated with the provider you are enabling. Example NCM provider configuration: Example Node provider configuration:
Creating User Keys for key based authentication: Now that you have all the keys you need for the systems in your cluster, you will need to create some keys for your users to load into their web browsers in order to securely access your NiFi. This step is not necessary if you have set up your NiFi to use LDAP for user authentication. This is done in much the same way as you created your server keys. *** Users should use strong passwords whenever prompted.
Now that you have a p12 file for user1, they can load this into their browser certs to use to authenticate against your secure NiFi. Import your <user1>.p12 file into the certificates for your preferred browser.

---------

HDF 1.x or Apache NiFi 0.x only: Now remember, you must manually add that first "ROLE_ADMIN" user to the authorized-users.xml file. So you will need the DN from the user key you created for this admin user and add it into your authorized-users.xml file.

---------

HDF 2.x or Apache NiFi 1.x only: You must configure your "Initial Admin Identity" in the authorizers.xml file. That Initial Admin Identity value must match the user's DN from the .p12 file exactly.

---------

Here is an example of what it may look like:

dn="EMAILADDRESS=none@none.com, CN=<user1>, OU=NiFi, O=someplace, L=Baltimore, ST=Maryland, C=US"

Troubleshooting authentication issues: If you have the DN format wrong in your authorized-users.xml file, rather than gaining access to the NiFi you will get prompted to "request access". Do not click the request access link. You must instead go fix the DN in the authorized-users.xml file; you need to create that first admin account that can approve those requests. If you click request access, you will need to stop your NiFi and delete the nifi-users.h2.db file (located inside the database_repository directory); otherwise, even fixing your authorized-users.xml file will not gain you access, because your account will be stuck in a pending auth state. You can look at the request that came in in the nifi-users.log to get the exact DN pattern to fix your authorized-users.xml file entry. You should see something that looks like this:

INFO [NiFi Web Server-58023] o.a.n.w.s.x509.X509AuthenticationFilter Attempting request for (<CN=JohnDoe, OU=MyBusiness, O=MyOrg, L=Baltimore, ST=MD, C=US>) GET...

That log line gives you the exact format of the DN that needs to be updated/added to the authorized-users.xml file. Example below:

<user dn="CN=John Doe, OU=MyBusiness, O=MyOrg, L=Baltimore, ST=MD, C=US">
    <role name="ROLE_DFM"/>
    <role name="ROLE_ADMIN"/>
    <role name="ROLE_PROVENANCE"/>
</user>
02-11-2016
09:36 PM
19 Kudos
The purpose of this article is to explain what Process Groups and Remote Process Groups (RPGs) are and how input and output ports are used to move FlowFiles between them. Process groups are a valuable addition to any complex dataflow. They give DataFlow Managers (DFMs) the ability to group a set of processors onto their own embedded canvas. Remote Process Groups allow a DFM to treat another NiFi instance or cluster as just another process group in the larger dataflow picture. Simply being able to build flows on different canvases is nice, but what if I need to move NiFi FlowFiles between these canvases? This is where input and output ports come into play. They allow you to move FlowFiles between canvases that are either local to a single NiFi or belong to completely different NiFi instances/clusters.

Embedded Process Groups:

Let's start by talking about the simplest use of multiple embedded canvases through process groups. When you start NiFi for the very first time you are given a blank canvas. This blank canvas is nothing more than a process group in itself, referred to as the root process group. From there you are able to add additional process groups to that top-level canvas. These added process groups allow you to drill down into them, giving additional blank canvases you can build dataflows on. When you enter a process group you will see the hierarchy represented just above the canvas in the UI ( NiFi Flow >> Process Group 1 ). NiFi does not restrict the number of process groups you can create or the depth you can go with them. You could compare the process group hierarchy to that of a Windows directory structure. So if you added another process group inside one that you already created, you would essentially have gone two layers deep ( NiFi Flow >> Process Group 1 >> Process Group 2 ). The hierarchy represented above your canvas allows you to quickly jump up one or more layers, all the way to the root level, by simply clicking on the name of the process group. While you can add any number of process groups at the same embedded level, the hierarchy is only shown from the root down to the current process group you are in.
Now that we understand how to add embedded process groups, let's talk about how we move data in and out of these process groups. This is where input and output ports come into play. Input and output ports exist to move FlowFiles between a process group and ONE LEVEL UP from that process group. Input ports will accept FlowFiles coming from one level up, and output ports allow FlowFiles to be sent one level up. If I have a process group added to my canvas, I cannot drag a connection to it until at least one input port exists inside that process group. I also cannot drag a connection off of that process group until at least one output port exists inside the process group. You can only move FlowFiles up or down one level at a time. Given the example of a process group within another process group, FlowFiles would need to be moved from the deepest level up to the middle layer before finally being able to be moved to the root canvas.

In the above example I have a small flow pushing FlowFiles into an embedded process group (Process Group 1) and also pulling data from the same embedded process group. As you can see, I have created an input and output port inside Process Group 1. This allowed me to draw a connection to and from the process group on the root canvas layer. You can have as many different input and output ports inside any process group as you like. When you draw a connection to a process group, you will be able to select which input port to send the FlowFiles to. When you draw a connection from a process group to another processor, you will be able to pick which output port to pull FlowFiles from. Every input and output port within a single process group must have a unique name; NiFi validates port names to prevent duplicates.

Remote Process Groups:

We refer to the ability to send FlowFiles between different NiFi instances as Site-to-Site. Site-to-Site is configured very much in the same way we just configured moving files between embedded process groups on a single NiFi instance. Instead of moving FlowFiles between different process groups (layers) within the same NiFi, we are moving FlowFiles between different NiFi instances or clusters. If a DFM reaches a point in their dataflow where they want to send data to another NiFi instance or cluster, they would add a Remote Process Group (RPG). These Remote Process Groups are not configured with unique system port numbers; instead, they all utilize the same Site-to-Site port number configured in your nifi.properties files. I will not be covering the specific NiFi configuration needed to enable Site-to-Site in this article. For information on how to enable and configure Site-to-Site on a NiFi instance, see the Site-to-Site Properties section of the Admin Guide. Let's take a quick look at how these two components differ:

As I explained earlier, input and output ports are used to move FlowFiles one level up from the process group they are created in. At the top level of your canvas (root process group level), adding input or output ports provides the ability for that NiFi to receive FlowFiles from another NiFi instance (input port) or have another NiFi pull files from it (output port). We refer to input and output ports added at the top level as remote input or output ports. While the same input and output icon in the UI is used to add both remote and embedded input and output ports, you will notice that they are rendered differently when added to the canvas. If your NiFi has been configured to be secure (HTTPS) using server certificates, the remote input/output port's configuration window will have an "Access Control" tab where you must authorize which remote NiFi systems are allowed to see and access these ports. If not running secure, all remote ports are exposed and accessible by any other NiFi instance.
In a single instance you can send data to an input port inside a process group by dragging a connection to the process group and selecting the name of the input port from the selection menu provided. Provided that the remote NiFi instance has input ports exposed to your NiFi instance, you can drag a connection to the RPG much in the same way you previously dragged a connection to the embedded process groups within a single instance of NiFi. You can also hover over the RPG and drag a connection off of the RPG, which will allow you to pull data from an available output port on the target NiFi.

The source NiFi (standalone or cluster) can have as many RPGs as a DFM would like. You can have multiple RPGs in different areas of your dataflows that all connect to the same remote instance, while the target NiFi contains the input and output ports (only input and output ports added to the root level process group can be used for Site-to-Site FlowFile transfers). When sending data between two standalone NiFi instances, the setup of your RPG is fairly straightforward: when adding the RPG, simply provide the URL for the target instance. The source RPG will communicate with that URL to get the Site-to-Site port to use for FlowFile transfer.

When sending FlowFiles via Site-to-Site to a NiFi that is a NiFi cluster, we want the data going to every node in the cluster. The Site-to-Site protocol handles this for you, with some additional load-balancing benefits built in. The RPG is added and configured to point at the URL of the NCM. (1) The NCM will respond with the Site-to-Site port for the NCM. (2) The source will connect to the Site-to-Site port of the NCM, which will respond to the source NiFi with the URLs, Site-to-Site port numbers, and current loads of every connected node. (3) The source NiFi will then load-balance FlowFile delivery to each of those nodes, giving fewer FlowFiles to nodes that are under heavier load. The following diagram illustrates these three steps:

A DFM may choose to use Site-to-Site to redistribute data arriving on a single node in a cluster to every node in that same cluster by adding an RPG that points back at the NCM for that cluster. In this case the source NiFi instance is also one of the target NiFi instances.