Member since
07-30-2019
3131
Posts
1564
Kudos Received
909
Solutions
05-30-2016
11:07 PM
Thank you for your feedback. I will look into creating an article to cover this topic, perhaps titled "How to set up my first non-secured NiFi cluster."
05-30-2016
11:04 PM
PJ, Roles only work with a secured NiFi. The intent of step 5 was to give a backdoor method for properly populating the first needed entry in the authorized-users.xml file. You can either follow step 5 (backdoor method for creating first needed Admin user for https setup) or you can get your DN from your cert or ldap to manually populate that authorized-users.xml.
Bottom line: if you are being told you cannot secure your NiFi and must leave it running unsecured over http, you cannot create user roles. User roles are only used by a secured, https-configured NiFi. If someone is telling you that you can set up user roles within an http-configured NiFi, they are unfortunately misinformed.
An example entry is below should you decide to secure your NiFi so you can make use of this feature in the future:
    <users>
        <user dn="CN=John Doe, OU=MyBusiness, O=MyOrg, L=Baltimore, ST=MD, C=US">
            <role name="ROLE_ADMIN"/>
        </user>
    </users>

Keep in mind that in order for this to work, the user dn above has to match exactly how it is recorded in the user's certificate or within ldap for user John Doe in this example. What this does is authorize the authenticated user "John Doe" with the "Admin" role.
A user can be assigned multiple roles as well. An example of how that would look is as follows:

    <users>
        <user dn="CN=John Doe, OU=MyBusiness, O=MyOrg, L=Baltimore, ST=MD, C=US">
            <role name="ROLE_ADMIN"/>
            <role name="ROLE_DFM"/>
            <role name="ROLE_PROVENANCE"/>
        </user>
    </users>

In that example, the authenticated user "John Doe" has been authorized with the "Admin", "Dataflow Manager", and "Provenance" user roles.
Hope this adds clarity to my previous response.
Thanks, Matt
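For anyone who would rather generate the entry than type it by hand, here is a minimal sketch using Python's standard library. The function name build_authorized_users is made up for illustration and is not part of NiFi; the element and attribute names simply mirror the example entries above.

```python
import xml.etree.ElementTree as ET


def build_authorized_users(dn, roles):
    """Return an authorized-users.xml style document for a single user.

    Hypothetical helper: it only mirrors the <users>/<user dn>/<role name>
    layout shown in the post and does not validate the DN in any way.
    """
    users = ET.Element("users")
    user = ET.SubElement(users, "user", {"dn": dn})
    for role in roles:
        # One <role name="..."/> child per role granted to this user.
        ET.SubElement(user, "role", {"name": role})
    return ET.tostring(users, encoding="unicode")


if __name__ == "__main__":
    print(build_authorized_users(
        "CN=John Doe, OU=MyBusiness, O=MyOrg, L=Baltimore, ST=MD, C=US",
        ["ROLE_ADMIN", "ROLE_DFM", "ROLE_PROVENANCE"],
    ))
```

The DN string still has to match the certificate or ldap entry character for character; the helper just saves you from malformed XML.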
05-30-2016
02:22 PM
1 Kudo
Hello, Pierre is correct about what the swap threshold is used for. For speed and efficiency, NiFi holds all the FlowFile attributes associated with each FlowFile in JVM memory. In cases where queues develop, this can result in considerable memory usage, so NiFi has established a default swapping threshold of 20,000 FlowFiles per connection. What this means is that once a queue reaches 30,000 FlowFiles, 10,000 will be swapped out to disk. The 20,000 that are next to be worked on, based on the connection prioritization scheme, are left in memory. NiFi will continue to swap 10,000 FlowFiles at a time to disk as the queue continues to grow. Keep in mind that files swapped out must be swapped back in before they can be worked on by the destination processor.
NiFi does not throw away any data unless expiration has been set on connections. As long as you have sufficient disk space to hold the data, it will continue to queue. I suggest reading through this article if you have not already: ( https://community.hortonworks.com/content/kbentry/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html )
That being said, there are dangers to allowing your disk to fill to 100% capacity, so as Pierre mentioned you should be setting backpressure throughout your dataflow to trigger upstream processors to stop and eventually stop pulling in new data.
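To make the arithmetic concrete, here is a rough sketch in Python. It is a simplified model of the behavior described above (a 20,000 threshold and swapping in batches of 10,000), not NiFi's actual implementation; the function and constant names are made up for illustration.

```python
SWAP_THRESHOLD = 20_000  # default swap threshold per connection, as described above
SWAP_BATCH = 10_000      # FlowFiles swapped to disk at a time, as described above


def split_queue(queued, threshold=SWAP_THRESHOLD, batch=SWAP_BATCH):
    """Return (in_memory, swapped) FlowFile counts for one connection.

    Simplified model: nothing is swapped until the queue exceeds the
    threshold by a full batch, then whole batches move to disk.
    """
    if queued < 0:
        raise ValueError("queued must be non-negative")
    excess = max(0, queued - threshold)
    swapped = (excess // batch) * batch
    return queued - swapped, swapped


if __name__ == "__main__":
    for size in (20_000, 29_999, 30_000, 45_000):
        print(size, split_queue(size))
```

Under this model a connection holds anywhere from 20,000 up to just under 30,000 FlowFiles' worth of attributes in heap at any time once it starts swapping.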
Thanks, Matt
05-30-2016
02:10 PM
1 Kudo
Hello, In order to secure access to your NiFi instance/cluster, NiFi must be configured to run securely via https. Section 6 of the guide linked above correctly states that your NiFi must be configured to run securely (HTTPS) and have an authentication mechanism (user certificates, ldap, or kerberos) in place. Without a secure setup, all users who access the NiFi UI get in with anonymous access, which gives all of them full access to all aspects of the NiFi UI.
The DN is what uniquely identifies each user and/or server that accesses your NiFi. If using ssl certificates as your authentication mechanism, the DN will be found inside the certificate and would have been generated during the certificates creation phase. There is an article here ( https://community.hortonworks.com/articles/17293/how-to-create-user-generated-keys-for-securing-nif.html ) that walks you through creating your own keystores and truststores for securing your NiFi (https). It also covers creating user certificates if that is the authentication mechanism you choose to use.
You can also use ldap or kerberos as your authentication mechanism, but you will need to set up or use an existing ldap or kerberos infrastructure.
Users with the "admin" role can authenticate in to the secured NiFi UI. From there they can access the user management interface via the user management icon. This interface allows users that have the "admin" role to approve the access of other users who have requested it through the secured NiFi UI. The reason you need to manually add the first "admin" user is that otherwise you have no users who can access the UI to approve requests.
If you are unsure how to extract your user DN from your configured authentication mechanism, you can do the following:
1. You still need to set up your NiFi securely. You can use the procedure linked above to create the needed keystores and truststores to do so.
2. Configure your nifi.properties file for secure (https) and non-secure (http) access. You will need to use unique ports for each (8080 for http and 8443 for https, for example).
3. Navigate to the https address for your NiFi instance. If you are using user certificates, you will need to have followed that section of the above linked article to create your user key and load it in to your browser. If set up for ldap, provide your ldap username and password when the NiFi UI prompts you.
4. If you successfully authenticated, you will then be prompted to request access. This is the authorization request portion. After requesting access, the screen will say pending approval.
5. You can now navigate to the non-secure (http) address for your NiFi, which lets everyone in as anonymous with full access. Go to the user management UI via the user management icon and grant your user the "admin" role. You can now go back to the secured NiFi UI address and gain controlled access. Don't forget to go back in to your nifi.properties file and remove the http configuration to prevent uncontrolled anonymous access at this point.
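Because it is easy to forget that last cleanup, here is a small sketch that reads nifi.properties text and reports which web access modes are enabled. It assumes the nifi.web.http.port and nifi.web.https.port properties and the example ports from step 2; the helper names and the sample excerpt are made up for illustration.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def web_access_modes(props):
    """Return the access modes that have a port configured."""
    modes = []
    if props.get("nifi.web.http.port"):
        modes.append("http")
    if props.get("nifi.web.https.port"):
        modes.append("https")
    return modes


# Hypothetical excerpt matching step 2 (both ports set while bootstrapping the admin).
SAMPLE = """
# web properties
nifi.web.http.port=8080
nifi.web.https.port=8443
"""

if __name__ == "__main__":
    modes = web_access_modes(parse_properties(SAMPLE))
    if "http" in modes:
        print("http is still enabled: anonymous users have full access")
```

Once the first admin has been approved, blanking the http port value and restarting should leave only "https" in the reported list.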
There are other roles that authorize authenticated users to do different things within the UI:
1. Administrator: Can add or remove authorized roles for other users. Can purge flow configuration change history.
2. Data Flow Manager: Can build, manipulate, modify, start, stop, and/or delete dataflows on the NiFi canvas.
3. Read Only: Can access the UI and view the configuration of items on the canvas, but cannot build, manipulate, modify, start, stop, and/or delete any of it.
4. Provenance: Users with this added role can search any stored provenance data.
I am not sure where the confusion came from with regards to setting up controlled access via http, but if you can point me in the right direction I will do my best to get the documentation updated so it is clearer.
Thank you, Matt
05-30-2016
01:18 PM
Are there any WARN/ERROR messages being produced in the nifi-app.log or nifi-bootstrap.log?
05-27-2016
12:53 PM
Keep in mind that FlowFile Attributes live in memory. Loading a FlowFile Attribute with the entire content of the file is going to have an impact on heap usage in your flow. That being said, there are two things to consider when building dataflows like this:
1. Increasing the size of the available heap for the NiFi application. Heap space thresholds for NiFi are configured in the bootstrap.conf file and by default are very small (512 MB).

    # JVM memory settings
    java.arg.2=-Xms512m
    java.arg.3=-Xmx512m

2. You must take in to consideration the data volumes you will be working with in the particular dataflow. To help prevent out of memory errors in NiFi, we have established a threshold on how many FlowFiles can queue on a connection before FlowFile attributes are swapped out of heap to disk. The default configuration in the nifi.properties file is 20,000 ( nifi.queue.swap.threshold=20000 ). This is per connection, not per flow. So if the FlowFiles whose content you extracted into attributes begin to queue on numerous connections, you run the risk of hitting the out of memory condition more quickly. You can decrease this value so swapping happens sooner, but that will in turn have an impact on performance.
I would start with increasing the heap memory for your NiFi and then go from there.
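A back-of-the-envelope estimate can show how quickly large attributes add up. The sketch below is only rough arithmetic: the helper names, the 10 KB average attribute size, and the five connections are assumptions for illustration, and it ignores JVM object overhead and everything else NiFi keeps on the heap.

```python
DEFAULT_SWAP_THRESHOLD = 20_000         # nifi.queue.swap.threshold default quoted above
DEFAULT_HEAP_BYTES = 512 * 1024 * 1024  # matches -Xmx512m


def estimate_attribute_heap_bytes(connections, avg_attr_bytes,
                                  in_memory_per_connection=DEFAULT_SWAP_THRESHOLD):
    """Rough bytes of attribute data held in heap across queued connections."""
    return connections * in_memory_per_connection * avg_attr_bytes


def fits_in_heap(estimate_bytes, heap_bytes=DEFAULT_HEAP_BYTES):
    """True if the estimate alone is smaller than the configured heap."""
    return estimate_bytes < heap_bytes


if __name__ == "__main__":
    # Hypothetical flow: 5 backed-up connections, about 10 KB of attributes per FlowFile.
    estimate = estimate_attribute_heap_bytes(connections=5, avg_attr_bytes=10 * 1024)
    print(estimate, fits_in_heap(estimate))
```

With those assumed numbers the attributes alone come to roughly 1 GB, well past the default 512 MB heap, which is why raising the heap is the first thing to try.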
05-27-2016
12:35 PM
While it is still possible to use multicast, it is very uncommon. Its original intent was so you could set up your NiFi cluster so it could auto-discover the NCM. The idea behind this was that if the NCM died, a new one could quickly be stood up and the nodes would auto-discover and join that new NCM without needing to be restarted. This multicast setup has been around since clustering in NiFi was first added, long before site-to-site capability was added. With Site-to-Site, using multicast for the intent described above is not possible without some unique setups within DNS. The RemoteProcessGroup is very dependent on a specific NCM URL, so having the URL change would break Site-to-Site.
05-25-2016
10:51 PM
The fact that it was started without any configuration modification will have only one impact. With the default configuration, the NiFi instance would have started as a standalone http instance. As a result it would have generated a flow.xml.gz file and a templates directory inside the NiFi conf directory. If the cluster NCM you are joining this node to already has an existing flow or templates, this node will fail to join because they will not match. There is no need to reinstall to fix this if that is the case. Simply delete the flow.xml.gz file and the templates directory before starting it again. When it joins the cluster it will get the current flow and templates from the NCM.
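If you want to script that cleanup, a minimal sketch could look like the following. The function name clear_local_flow is made up, and the conf directory path is whatever your install uses; run it only while NiFi is stopped, since it permanently deletes the node's local flow and templates.

```python
import shutil
from pathlib import Path


def clear_local_flow(conf_dir):
    """Delete flow.xml.gz and the templates directory under conf_dir.

    Hypothetical helper: returns the names of the items it removed so the
    caller can see what, if anything, was there.
    """
    conf = Path(conf_dir)
    removed = []
    flow = conf / "flow.xml.gz"
    if flow.is_file():
        flow.unlink()
        removed.append(flow.name)
    templates = conf / "templates"
    if templates.is_dir():
        shutil.rmtree(templates)
        removed.append(templates.name)
    return removed
```

Calling it a second time returns an empty list, so it is safe to rerun before each attempt to join the cluster.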
05-25-2016
10:17 PM
That state directory you found only exists because at some point you started your NiFi instance and it was generated by the application. Had this been a fresh install, it would not have existed and you would have needed to create it yourself to complete the zookeeper setup.