Member since
07-30-2019
3398
Posts
1621
Kudos Received
1001
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| 483 | 11-05-2025 11:01 AM | |
| 374 | 11-05-2025 08:01 AM | |
| 596 | 11-04-2025 10:16 AM | |
| 734 | 10-20-2025 06:29 AM | |
| 874 | 10-10-2025 08:03 AM |
02-23-2017
07:06 PM
5 Kudos
There is a two part process before any access to NiFi UI is possible: 1. Authentication: By default NiFi will use a user/server's SSL certificate when provided in the connection to authenticate. When NO user/server certificate is presented, NiFi will then look for a Kerberos TGT (If Spnego has been configured in NiFi). Finally, if neither of the above where present in the connection, NiFi will use the login identity provider (if configured). Login identity providers include either ldap or kerberos. With both of these options, NiFi will present users with a login screen. 2. Authorization: Authorization is the mechanism that controls what features and components authenticated users are granted access. The default authorizer NiFi will use is the internal file based authorizer. There is an option to configure NiFi to use Ranger as the authorizer instead. The intent of this article is not to discuss how to setup NiFi to use any of the Authentication or Authorizer options. This article covers how to modify what identity is passed two the Authorizer after any one of the authentication mechanism is successful. What is actually passed to the authorizer varies depending on which Authentication method is in use. SSL certificates: Default, always enabled, and always checked first NiFi uses the full DN from the certificate. Spnego (kerberos): Always on when enabled and only used if a SSL Certificate was not present in connection. NiFi uses the full user principal. ldap-provider (option in login-identity-providers): Always on once configured and only used if both SSL certificate and TGT (if Spnego was enabled) are not present in connection. Default configuration of ldap-provider will use the full DN returned by LDAP upon successful authentication. (USE_DN Identity Strategy) Can be configured to pass the username used to login instead. (USE_USERNAME Identity Strategy) Kerberos-provider (option in login-identity-providers): Always on once configured and only used if both SSL certificate and TGT (if Spnego was enabled) are not present in connection. The kerberos-provider will use the use the user full principal upon successful authentication. (USE_DN Identity Strategy) Whether you choose to use the built in file based authorizer or optional configure you NiFi to use Ranger instead, users must be added and granted various access policies. Adding users using either full a DN or users principal can be both annoying and prone to errors since the authorizer is case sensitive and white spaces are valid characters. This is where NiFi's identity mapping optional configurations come in to play. Identity mapping takes place after successful authentication and before authorization occurs. It gives you the ability to take the returned value from all four of the authentication methods and pass them through 1 or more mappings to produce a simple resulting value which is then passed to your authorizer. The identity mapping properties are configured in NiFi's nifi.properties file and consist of two parts to each mapping you define: nifi.security.identity.mapping.pattern.<user defined>=
nifi.security.identity.mapping.value.<user defined>= The mapping pattern takes a java regular expression as input with the expectation that one of more capture groups are defined in that expression. One or more of those capture groups are then used in the mapping value to create the desired final result that will be passed to your configured authorizer. **** Important note: If you are implementing pattern mapping on a existing NiFi cluster that is already running securely, the newly added mappings will be run against the DNs from the certificates created for your nodes and the Initial Admin Identity value you originally configured. If any of your mapping match, a new value is going to passed to your authorizer which means you may lose access to your UI. Before adding any mapping make sure you have added the new mapped value users to your NiFi and authorized them so you do not lose access. By default NiFi includes 2 example identity mappings commented out in the NiFi properties file: You can add as many Identity mapping pattern and value as you like to accommodate all your various user/server authentication types. Each must have a unique identifier. In the above examples the unique identifiers are "dn" and "kerb". You could add for example "nifi.security.identity.mapping.pattern.dn2=" and "nifi.security.identity.mapping.value.dn2=" If you are using Ambari to install and manage your NiFi cluster (HDF 2.x version), you can find the 2 sample identity mapping properties under "Advanced nifi-properties": If you want add additional mappings beyond the above 2 via ambari, these would be added via the "Custom nifi-properties" config section. Simply click the "Add Property..." link to add your new mappings. The result of any successful authentication is run through all configured identity mapping until a match is found. If no match is found the full DN or user principal is passed to the authorizer. Let's take a look at a few examples: User/server DN or Principal Identity Mapping Pattern Identity Mapping Value Result passed to authorizer CN=nifi-server-01.openstacklocal, OU=NIFI ^CN=(.*?), OU=(.*?)$ $1 nifi-server-01 CN=nifi-01, OU=SME, O=mycp, L=Fulton, ST=MD, C=US ^CN=(.*?), OU=(.*?), O=(.*?), L=(.*?), ST=(.*?), C=(.*?)$ $1@$2 nifi-01@SME nifi/instance@MY.COMPANY.COM ^(.*?)/instance@(.*?)$ $1@$2 nifi@MY.COMPANY.COM cn=nifi-user1,ou=SME,dc=mycp,dc=com ^cn=(.*?),ou=(.*?),dc=(.*?),dc=(.*?)$ $1 nifi-user1 JohnDoe@MY.COMPANY.COM ^(.*?)@(.*?)$ $1 JohnDoe ^EMAILADDRESS=none@none.com, CN=nifi-user2, OU=SME, O=mycp, L=Fulton, ST=MD, C=US ^EMAILADDRESS=(.*?), CN=(.*?), OU=(.*?), O=(.*?), L=(.*?), ST=(.*?), C=(.*?)$ $2 nifi-user2 As you can see from the above examples, using NiFi's pattern mapping ability with simplify authorizing new users via either NiFi's default file based authorizer or using Ranger.
... View more
Labels:
02-23-2017
02:25 PM
9 Kudos
NiFi works with FlowFiles. Every FlowFile that exists consists of two parts, FlowFile content and FlowFile Attributes. While the FlowFile's content lives on disk in the content repository, NiFi holds the "majority" of the FlowFile attribute data in the configured JVM heap memory space. I say "majority" because NiFi does swapping of Attributes to disk on any queue that contains over 20,000 FlowFiles (default, but can be changed in the nifi.properties). Once your NiFi is reporting OutOfMemory (OOM) Errors, there is no corrective action other then restarting NiFi. If changes are not made to your NiFi or dataflow, you are surely going to encounter this issue again and again. The default configuration for JVM heap in NiFi is only 512 MB. This value is set in the nifi-bootstrap.conf file. # JVM memory settings
java.arg.2=-Xms512m
java.arg.3=-Xmx512m While the default may work for some dataflow, they are going to be undersized for others.
Simply increasing these values till you stop seeing (OOM) error should not be your immediate go to solution. Very large heap sizes could also have adverse impacts on your dataflow as well. Garbage collection will take much longer to run with very large heap sizes. While garbage collections occurs, it is essentially a stop the world event. This amount to dataflow stoppage for the length time it takes for that to complete. I am not saying that you should never set large heap sizes because sometimes that is really necessary; however, you should evaluate all other options first.... NiFi and FlowFile attribute swapping: NiFi already has a built in mechanism to help reduce the overall heap footprint. The mechanism swaps FlowFiles attributes to disk when a given connection's queue exceeds the configured threshold. These setting are found in the nifi.properties file: nifi.swap.manager.implementation=org.apache.nifi.controller.FileSystemSwapManager
nifi.queue.swap.threshold=20000
nifi.swap.in.period=5 sec
nifi.swap.in.threads=1
nifi.swap.out.period=5 sec
nifi.swap.out.threads=4 Swapping however will not help if your dataflow is so large that queues are how everywhere, but still have not exceeded the threshold for swapping. Anytime you decrease the swap threshold, more swapping can occur which may result in some throughput performance. So here are some other things to check for... So some common reason for running out of heap memory include: 1. High volume dataflow with lots of FlowFiles active any any given time across your dataflow. (Increase configured nifi heap size in bootstrap.conf to resolve)
2. Creating a large number of Attributes on every FlowFile. More Attributes equals more heap usage per FlowFile. Avoid creating unused/unnecessary Attributes on FlowFiles. (Increase configured nifi heap size in bootstrap.conf to resolve and/or reduce the configured swap threshold) 3. Writing large values to FlowFile Attributes. Extracting large amounts of content and writing it to an attribute on a FlowFile will result in high heap usage. Try to avoid creating large attributes when possible. (Increase configured nifi heap size in bootstrap.conf to resolve and/or reduce the configured swap threshold) 4. Using the MergeContent processor to merge a very large number of FlowFiles. NiFi can not merge FlowFiles that are swapped, so all these FlowFile's attributes must be in heap when the merge occurs. If merging a very large number of FlowFiles is needed, try using two MergeContent processors in series with one another. Have first merge a max of 20,000 FlowFiles and the second then merge those 10,000 FlowFile files in to even larger bundles. (Increase configured nifi heap size in bootstrap.conf also help) 5. Using the SplitText processor to split one File in to a very large number of FlowFiles. Swapping of a large connection queue will not occur until after the queue has exceeded swapping threshold. The SplitTEXT processor will create all the split FiLowFiles before committing them to the success relationship. Most commonly seen when SpitText is used to split a large incoming FlowFile by every line. It is possible to run out of heap memory before all the splits can be created. Try using two SplitText processors in series. Have the first split the incoming FlowFiles in to large chunks and the second split them down even further. (Increase configured nifi heap size in bootstrap.conf also help) Note: There are additional processors that can be used for splitting and joining large numbers of FlowFiles, so the same approach as above should be followed for those as well. I only specifically commented on the above since they are more commonly seen being used to deal with very large numbers of FlowFiles.
... View more
Labels:
02-23-2017
01:37 PM
1 Kudo
@mayki wogno Every FlowFile that exists consists of two parts, FlowFile content and FlowFile Attributes. While the FlowFile's content lives on disk in the content repository, NiFi holds the "majority" of the FlowFile attribute data in the configured JVM heap memory space. I say "majority" because NiFi does swapping of Attributes to disk on any queue that contains over 20,000 FlowFiles (default, but can be changed in the nifi.properties). So some common reason for running out of heap memory include: 1. High volume dataflow with lots of FlowFiles active any any given time across your dataflow. (Increase configured nifi heap size in bootstrap.conf to resolve) 2. Creating a large number of Attributes on every FlowFile. More Attributes equals more heap usage per FlowFile. (Increase configured nifi heap size in bootstrap.conf to resolve and/or reduce the configured swap threshold) 3. Writing large values to FlowFile Attributes. Extracting large amounts of content and writing it to an attribute on a FlowFile will result in high heap usage. Try to avoid creating large attributes when possible. (Increase configured nifi heap size in bootstrap.conf to resolve and/or reduce the configured swap threshold) 4. Using the MergeContent processor to merge a very large number of FlowFiles. NiFi can not merge FlowFiles that are swapped, so all these FlowFile's attributes must be in heap when the merge occurs. If merging a very large number of FlowFiles is needed, try using two MergeContent processors in series with one another. Have first merge a max of 10,000 FlowFiles and the second then merge those 20,000 FlowFile files in to even larger bundles. (Increase configured nifi heap size in bootstrap.conf also help) 5. Using the SplitText processor to split one File in to a very large number of FlowFiles. Swapping of a large connection queue will not occur until after the queue has exceeded swapping threshold. The SplitTEXT processor will create all the split FiLowFiles before committing them to the success relationship. Most commonly seen when SpitText is used to split a large incoming FlowFile by every line. It is possible to run out of heap memory before all the splits can be created. Try using two SplitText processors in series. Have the first split the incoming FlowFiles in to large chunks and the second split them down even further. (Increase configured nifi heap size in bootstrap.conf also help) Thanks, Matt
... View more
02-23-2017
01:12 PM
1 Kudo
@Ramakrishnan V You will need to use the following curl command to obtain a token for your LDAP user:
curl 'https://<hostname>:<port>/nifi-api/access/token' -H 'Content-Type: application/x-www-form-urlencoded; charset=UTF-8' --data 'username=admin&password=admin' --compressed --insecure Once you have your token you will need to pass that token as the bearer of all subsequent curl command you execute against the NiFi api by adding teh following to your curl commads: -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJjbj1hZG1pbixkYz1leGFtcGxlLGRjPW9yZyIsImlzcyI6IkxkYXBQcm92aWRlciIsIm
F1ZCI6IkxkYXBQcm92aWRlciIsInByZWZlcnJlZF91c2VybmFtZSI6ImFkbWluIiwia2lkIjoxLCJleHAiOjE0ODcxNDM2OTEs
ImlhdCI6MTQ4NzEwMDQ5MX0.GwwJ0Yz4_KXUAMNIH500jw8YcIk3e6ZdcT3LCrrkHjc' The odd string above is an example of the token you will get back from the first command. Thanks, Matt
... View more
02-22-2017
10:08 PM
1 Kudo
@Joe Petro Yes this is very doable... NiFi automatically creates a FlowFile Attribute called "filename" on every FlowFile that is created. You can use this existing attribute to specify teh target HDFS directory: Of course you will want to modify the above for the complete target path. Thanks, Matt
... View more
02-22-2017
01:54 PM
@mayki wogno No problem... If one of the answers help drive you to a solution to your question, please accept that answer to help drive this community forward.
... View more
02-22-2017
01:36 PM
2 Kudos
@Pradhuman Gupta Which "Protocol" are you using in your PutSplunk processor.
There is no assurance or delivery if you are using UDP. With TCP protocol there is confirmed delivery. You could use NiFi's provenance to track the FlowFile's processed by the PutSplunk processor. This will allow you to get the details on FlowFiles that have "SEND" provenance events associated to them. Thanks, Matt
... View more
02-21-2017
06:49 PM
2 Kudos
@Raj B 1. The main intent of NiFi Provenance is for data governance. The ability to look back at the life of a FlowFile. It can tell you where a FlowFile originated, from what parent FlowFile it was part of, how many parents FlowFiles where used to create it, What changes were made to it, where it was sent, when it was terminated from NiFi, etc... NiFi Provenance also provides a means to view or replay FlowFile's that are no longer anywhere in your dataflow (Provided the FlowFiles content still exists in the content repositories archive) at any point in your dataflow. Examples: - Some downstream system expected to receive file "ABC" over the weekend from NiFi. You can use NiFi's data provenance to see exactly when file "ABC" was received by NiFi and exactly what NiFi did to file "ABC" as it traversed your dataflows. - A FlowFile "XYZ" was expected to route through your dataflow to some destination "G". Upon searching Provenance it was discovered "XYZ" was routed down the wrong path. You could correct you dataflow routing issues and use data provenance to replay "XYZ" just prior to the dataflow correction. 2. NiFi's Provenance repository retains all Provenance events generated via your dataflow up until either retention time or max disk usage properties are met. When either of those conditions are met, the oldest provenance events are deleted first. There is no way to selectively decide which provenance events are retained in the repository. Using the 3. The Provenance API provides a means for running queries directly against the Provenance data stored local to a particular NiF instance. The SiteToSiteProvenanceReportingTask provides a way of sending provenance events to another system for perhaps longer term storage. Since provenance events do not contain any FlowFile content, only provenance events stored locally within a NiFi instance can be used to view or replay any content. Thanks, Matt
... View more
02-21-2017
04:37 PM
@mayki wogno One thing you could do is set "FlowFile Expiration" on the connection containing the "merged" relationship. And set the "Available Prioritizers" to " Newest FlowFileFirstPrioritizer". FlowFile expiration is measured against the age of the FlowFile (from creation time to now) and not how long it has been in a particular connection. If the FlowFile age exceeds this configured value, it is purged from the queue.
... View more
02-21-2017
03:44 PM
1 Kudo
@Andy Liang The ConsumeJMS and PublishJMS processors can be used with IBM MQ. They require you to setup an "JMSConnectionFactoryProvider" controller service to facilitate that IBM MQ connection. You will need to download the IBM MQ Client library on to the server where your NiFi is running. Matt
... View more