About MattWho

MattWho · ‎06-02-2016

There are a few things you can do here if i am understanding correctly what you are trying to accomplish. 1. The logback.xml can be modified so specific processor component logs could be redirected to a specific new log file. You can specify where that new log is written. You could also specify the specific log level of those components (WARN level would get you just WARN and ERROR messages). 2. In your dataflow you could use the TailFile processor to monitor that new log and route any generated FlowFiles to a putEmail processor to send them to your Admin. In addition to email you can route those FlowFiles to a processor of your choice to put a copy to a specific location as well either locally or remotely. Thanks, Matt

MattWho · ‎05-31-2016

Ahmad, The line you are seeing in the nifi-bootstrap.log indicates the JVM started successfully. You need to check the nifi-app.log to make sure the application loaded successfully. In the nifi-app.log you will find the following lines if the application successfully loaded: 2016-05-31 10:46:44,347 INFO [main] org.apache.nifi.web.server.JettyServer NiFi has started. The UI is available at the following URLs: 2016-05-31 10:46:44,347 INFO [main] org.apache.nifi.web.server.JettyServer http://<someaddress or FQDN>:8088/nifi Verify that the hostname or IP displayed on this line is reachable/resolvable on the system you are running your web browser from. Thanks, Matt

MattWho · ‎05-30-2016

Thank you for your feedback. I will look into creating an article to cover this topic. Perhaps "How to setup my first non-secured NiFi cluster."

MattWho · ‎05-30-2016

PJ, Roles only work with a secured NiFi. The intent of step 5 was to give a backdoor method for properly populating the first needed entry in the authorized-users.xml file. You can either follow step 5 (backdoor method for creating first needed Admin user for https setup) or you can get your DN from your cert or ldap to manually populate that authorized-users.xml. Bottom line is if you are being told you cannot secure your NiFi and to leave it running unsecure with http, you cannot create user roles. User roles are only used by a secured https configured NiFi. If someone is telling you that you can setup user roles within a http configured NiFi, they are unfortunately misinformed. An example entry is below should you decide to secure your NiFi so you can make use of this feature in the future: <users> <userdn="CN=John Doe, OU=MyBusiness, O=MyOrg, L=Baltimore, ST=MD, C=US"> <rolename="ROLE_ADMIN"/> </user> </users> Keep in mind that in order for this to work, the userdn above has to match exactly how it is recorded in the users certificate or within ldap for user John Doe in this example. What this does is authorize the authenticated user "John Doe" with the "Admin" role. A user can be assigned multiple roles as well. An example of how that would look is as follows: <users> <userdn="CN=John Doe, OU=MyBusiness, O=MyOrg, L=Baltimore, ST=MD, C=US"> <rolename="ROLE_ADMIN"/> <rolename="ROLE_DFM"/> <rolename="ROLE_PROVENANCE"/> </user> </users> In that example, the authenticated user "John Doe" has been authorized with "Admin", "Dataflow Manager", and "Provenance" user roles. Hope this adds clarity to my previous response. Thanks, Matt

MattWho · ‎05-30-2016

Hello, Pierre is correct about what the swap threshold is used for. For speed and efficiency, NiFi hold all the FlowFile attributes associated with each FlowFile in JVM memory. In case where queue develop this can result in considerable memory usage. So NiFi has established a default swapping threshold of 20,000 FlowFiles per connection. What this means is that once a queue reaches 30,000 FlowFiles, 10,000 will be swapped out to disk. The 20,000 that are the next to be worked on based on the connection prioritization scheme are left in memory. NiFi will continue to swap 10,000 FlowFiles at a time to disk as the queue continues to grow. Keep in mind that files swapped out must be swapped back in before they can be worked on by the destination processor. NiFi does not throw away any data unless expiration has been set on connections. As long as you have sufficient disk space to hold the data, it will continue to queue. I suggest reading through this article if you have not already: ( https://community.hortonworks.com/content/kbentry/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html ) That being said, there are dangers to allowing your disk to fill to 100% capacity, so as Pierre mentioned you should be setting backpressure throughout your dataflow to trigger upstream processor to stop and eventually stopping pulling in new data. Thanks, Matt

MattWho · ‎05-30-2016

Hello, In order to secure access to your NiFi instance/cluster, NiFi must be configured to run securely via https. Section 6 of the guide linked above correctly states that your NiFi must be configured to run securely (HTTPS) and have an authentication mechanism (user certificates. ldap, or kerberos) in place. Without a secure setup, all users who access the NiFi UI get in with anonymous access which gives all of them full access so all aspects of the NiFi UI. The DN is what uniquely identifies each user and/or server that accesses your NiFi. If using ssl certificates as your authentication mechanism, the DN will be found inside the certificate and would have been generated during the certificates creation phase. There is an article here ( https://community.hortonworks.com/articles/17293/how-to-create-user-generated-keys-for-securing-nif.html ) that walks you through creating your own keystores and truststores for securing your NiFi (https). It also covers creating user certificates if that is the authentication mechanism you choose to use. You can also use ldap or kerberos as your authentication mechanism, but you will need to setup or use an existing ldap or kerberos infastructure. Users with the "admin" role have the ability to authenticate in to the secured NiFi UI. From there they can access the user management interface via this icon . This interface will allow users that have the "admin" role to approve the access of other users who have requested through the secured NiFi UI. The reason you need to manually add the first "admin" user is because otherwise you have no users who can access the UI to approve requests. If you are unsure how to extract your user DN from your configure authentication mechanism, you can do the following: 1. You still need to setup your NiFi securely. You can use the procedure linked above to create the needed keystores and truststores to do so. 2. Configure your nifi.properties file for secure (https) and non-secure (http) access. You will need to use unique ports for each. (8080 for http and 8443 for https for example). 3. Navigate to the https address for your NiFi instance. If you are using user certificates, you will need to have followed that section of the above linked article to create your user key and load it in to your browser. If setup for ldap, provide your ldap username and password when the NiFI UI prompts you. 4. You will then be prompted to request access if you successfully authenticated. This is the authorization request portion. after requesting access the screen will say pending approval. 5. You can now navigate to the non-secure (http) address for your NiFi which lets everyone in as anonymous with full access. Go to the user management UI via the icon shown above and grant your user the "admin" role. You can now go back to the secured NiFi UI address and gain controlled access. Don't forget to go back in your nifi.properties file and remove the http configure to prevent uncontrolled annoymous access at this point. There are other roles that authorize authenticated users to do different things within the UI: 1. adminstrator: Can add or remove authorized roles for other users. Can purge flow configuration change history. 2. Data Flow Manager: Can build, manipulate, modify, start, stop, and/or delete dataflows on the NiFi canvas. 3. Read Only: Can access UI and view the configuration of items on the canvas, but cannot build, manipulate, modify, start, stop, and/or delete any of it. 4. Provenance: Users with this added role can search any stored provenance data. I am not sure where the confusion came from with regards to setting up controlled access via http, but if you can point me in the right direction I will do my best to get the documentation updated so it is more clear. Thank you, Matt

MattWho · ‎05-30-2016

Is your NiFi configured to run http or https?

MattWho · ‎05-30-2016

Are there any WARN/ERROR messages being produced in the nifi-app.log or nifi-bootstrap.log?

MattWho · ‎05-27-2016

Keep in mind that FlowFile Attributes live in memory. Loading a FlowFile Attribute with the entire content of the file is going to have an impact on heap usage in your flow. That being said, there are two things to consider when building dataflows like this: 1. Increasing the the size of the available heap for the NiFi application. Heap space thresholds for NiFi are configured in the bootstrap.conf file and by default are very small (512 MB). # JVM memory settings java.arg.2=-Xms512m java.arg.3=-Xmx512m 2. You must take in to consideration the data volumes you will be working with in the particular dataflow. To help prevent out of memory error in NiFi, we have established a threshold on how much data can queue on a connection before FlowFile's attributes are swapped out of heap to disk. The default configuration in the nifi.properties file is 20,000. ( nifi.queue.swap.threshold=20000 ) this is per connection not per flow. So if the FlowFiles you extracted content in begin to queue on numerous connections, you run the risk of hitting the out of memory condition quicker. You can decrease this value so swapping happens sooner, but that will in turn have an impact on performance. I would start with increasing the heap memory for your NiFi and the go from there.

MattWho · ‎05-27-2016

While still possible to use multicast, it is very uncommon. Its original intend was so you could setup your NiFi cluster so it could auto-discover the NCM. The idea behind this was that if the NCM died, a new one could quickly be stood-up and the nodes would auto-discover and join that new NCM without needing to be restarted. This multicast setup has been around since clustering in NiFi was first added. This was long before site-to-site capability was added. With Site-to-Site, the ability use multicast for the intend described above is not possible without some unique setups within DNS. The RemoteProcessGroup is very dependent on a specific NCM URL, so having the URL change would break Site-To-Site.

Member Since	‎07-30-2019 10:41 AM
Last Visited
Posts	3,133
Kudos received	1560

Cloudera Community

Re: Flowfile stuck in Wait in EnforceOrder process...

Re: Untrusted proxy error Authentication Failed o....

Re: REST API Configuration for NiFi 2.0

Re: Fileflow penalized for certain time before all...

Re: Nifi : Implement Sleep Mechanism in nifi witho...

Re: how to store failure as logs in a particular d...

Re: Nifi UI not launching

Re: Controlling level of access in NiFi question

Re: Controlling level of access in NiFi question

Re: NiFi queue how much time will hold the data, i...

Re: Controlling level of access in NiFi question

Re: Not able to access NiFi web UI

Re: Not able to access NiFi web UI

Re: How Extract text from a multiline flow and cre...

Re: In 'Cluster Node Properties' section of Nifi.p...