Member since: 07-30-2019
Posts: 3136
Kudos Received: 1565
Solutions: 910

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 52 | 01-16-2025 03:22 PM
 | 183 | 01-09-2025 11:14 AM
 | 1046 | 01-03-2025 05:59 AM
 | 459 | 12-13-2024 10:58 AM
 | 532 | 12-05-2024 06:38 AM
10-12-2016
01:32 PM
3 Kudos
@Ankit Jain When a secure NiFi is started for the first time, a users.xml and an authorizations.xml file are generated. The users.xml that is created will have your users added to it using the DNs provided in your authorizers.xml file (an example snippet follows below): Initial Admin Identity, Node Identity 1, Node Identity 2, Node Identity 3, Node Identity 4, etc. Each of those "users" is assigned a UUID, which is then used to set some required policies in the authorizations.xml file in order to be able to access the NiFi UI. At a minimum, every "Node Identity" DN's UUID needs to be assigned to the /proxy resource (policy) and the /flow resource (read/R) inside that file. Your "Initial Admin" DN should have /flow (read/R and write/W) and /policies (R and W). If NiFi was secured and started before some or all of the above DNs were set in the authorizers.xml, the users.xml and authorizations.xml files will have been created without any entries. Updating these DN properties in the authorizers.xml file later will not cause updates to occur to these files. If you find this is what occurred in your case, you can stop your NiFi nodes, delete both the users.xml and authorizations.xml files from all nodes, and restart. On restart, since these files no longer exist, NiFi will regenerate them using the DNs in the authorizers.xml file on each node. Thanks, Matt
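For reference, the relevant provider section of authorizers.xml looks roughly like the sketch below (the class and property names match NiFi 1.x's default file-based provider; the DNs are placeholders and must match your certificates exactly):

<authorizer>
    <identifier>file-provider</identifier>
    <class>org.apache.nifi.authorization.FileAuthorizer</class>
    <property name="Authorizations File">./conf/authorizations.xml</property>
    <property name="Users File">./conf/users.xml</property>
    <!-- placeholder DNs; substitute the exact DNs from your certificates -->
    <property name="Initial Admin Identity">CN=admin, OU=NiFi</property>
    <property name="Node Identity 1">CN=node1.example.com, OU=NiFi</property>
    <property name="Node Identity 2">CN=node2.example.com, OU=NiFi</property>
</authorizer>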
10-12-2016
12:30 PM
4 Kudos
@Jobin George When you add new components (process groups or processors), they inherit the policies of the parent component by default. This means your process group (Group1) has inherited some policies from its parent process group, and your processor (getSFTP) has inherited policies from the process group it is inside. My guess is that those inherited policies are allowing user "john" "view" and "modify" access to process group 'Group1'. When you select a component (process group or processor) and click on the key icon to modify/set its policies, you may notice a line in the "Access Policies" UI indicating that the effective policy is inherited. That line is telling you that the policies you are currently looking at are coming from a parent process group. If you modify any of these policies, what you are really doing is modifying the policies on that parent process group rather than on the actual selected component. In order to set specific policies for the selected component, you must first click on "Override". You will then see the effective policy line go away, and the specific policy you are currently looking at will be cleared of all entries. Now you can add specific users to this policy that apply to only this component. If the component is a process group, any processor or additional process group within it will inherit this new policy. Keep in mind that every policy inherits from its parent by default, so clicking on "Override" only creates new policy access for that one policy. You will need to select each available policy for a component and click "Override" for each one where you want to set component-specific policy access. Thanks, Matt
10-11-2016
05:04 PM
@Saikrishna Tarapareddy Almost... NiFi stores FlowFile content in claims. A claim can contain the content of one to many FlowFiles. Claims allow NiFi to use large disks more efficiently when dealing with small content files. A claim will only be moved into the archive directory once every FlowFile associated with that claim has been auto-terminated in the dataflow(s). Also keep in mind that you can have multiple FlowFiles pointing at the same content (this happens, for example, when you draw the same relationship multiple times off of a processor). Let's say you routed a success relationship twice off of an UpdateAttribute processor. NiFi does not replicate the content, but rather creates another FlowFile that points at that same content. So both of those FlowFiles now need to reach an auto-termination point before that content claim can be moved to archive. The content claims are configured in the nifi.properties file:
nifi.content.claim.max.appendable.size=10 MB
nifi.content.claim.max.flow.files=100
The above are the defaults. If a file comes in at less than 10 MB in size, NiFi will try to append the next file(s) to the same claim, unless the combination of those files would exceed the 10 MB max or the claim has already reached 100 files. If a file comes in that is larger than 10 MB, it ends up in a claim all by itself. Thanks, Matt
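To make the appending rules concrete, here are the same two defaults again with the behavior spelled out in comments (the comments are mine, not part of nifi.properties):

# A new 2 MB file is appended to the open claim unless doing so would push
# the claim past 10 MB total or past 100 FlowFiles; a 15 MB file always
# gets a claim of its own.
nifi.content.claim.max.appendable.size=10 MB
nifi.content.claim.max.flow.files=100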
10-11-2016
03:09 PM
1 Kudo
@Saikrishna Tarapareddy The retention settings in the nifi.properties file are for the NiFi data archive only. They do not apply to files that are active (queued or still being processed) in any of your dataflows. NiFi will allow you to continue to queue data in your dataflow all the way up to the point where your content repository disk is 100% utilized. That is why backpressure on connections throughout your dataflow is important, to control the amount of FlowFiles that can be queued. It is also important to isolate the content repository from the other NiFi repositories, so that if it fills the disk, it does not corrupt those other repositories. If content repository archiving is enabled (nifi.content.repository.archive.enabled=true), then the retention and usage percentage settings in the nifi.properties file take effect. NiFi will archive FlowFiles once they are auto-terminated at the end of a dataflow. Data active in your dataflow will always take priority over archived data; if your dataflow queues to the point that your content repository disk is full, the archive will be empty. The purpose of archiving data is to allow users to replay data from any point in the dataflow, or to download and examine the content of a FlowFile after it has passed through a dataflow, via the NiFi provenance UI. For many this is a valuable feature; for others it is not so important. If it is not important for your org to archive any data, you can simply set archive enabled to false. FlowFiles that are not processed successfully within your dataflow are routed to failure relationships. As long as you do not auto-terminate any of your failure relationships, the FlowFiles remain active/queued in your dataflow. You can then build some failure handling into your dataflow if you like, to make sure you do not lose that data. Matt
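For reference, the archive-related properties mentioned above live in nifi.properties and look like this (the retention period and percentage shown are the usual defaults, not recommendations):

nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%

Archived claims are purged once they age past the retention period, or sooner whenever content repository disk usage climbs above the percentage, whichever comes first.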
10-10-2016
07:45 PM
1 Kudo
@Saikrishna Tarapareddy RAID 1 requires a minimum of 2 disks and RAID 10 requires a minimum of 4 disks, so with 8 disks you can build either:
a. (2) RAID 10
b. (2) RAID 1 and (1) RAID 10
c. (4) RAID 1
My recommendation would be to provision your (8) 600 GB disks into (4) RAID 1 configurations (each: 600 GB + 600 GB mirrored, 600 GB total capacity), mounted as follows:
--------------
(1) RAID 1 (~600 GB capacity) with the following mounted logical volumes:
100 - 150 GB --> /var/log/nifi
100 GB --> /opt/nifi/flowfile_repo
50 GB --> /opt/nifi/database_repo
remainder --> /
(1) RAID 1 (~600 GB capacity):
entire RAID as a single logical volume --> /opt/nifi/provenance_repo
(1) RAID 1 (~600 GB capacity):
entire RAID as a single logical volume --> /opt/nifi/content_repo1
(1) RAID 1 (~600 GB capacity):
entire RAID as a single logical volume --> /opt/nifi/content_repo2
---------------
The above will give you ~1.2 TB of content_repo storage and ~600 GB of provenance history storage. If provenance history is not as important to you, you could carve off another logical volume on the first RAID 1 for your provenance_repo and allocate all (3) remaining RAID 1 configurations to content repositories.
*** Note: NiFi can be configured to use multiple content repositories in the nifi.properties file:
nifi.content.repository.directory.default=/opt/nifi/content_repo1/content_repository <-- This line exists already
nifi.content.repository.directory.repo2=/opt/nifi/content_repo2/content_repository <-- This line would be manually added.
nifi.content.repository.directory.repo3=/opt/nifi/content_repo3/content_repository <-- This line would be manually added.
*** NiFi will do file-based striping across all content repositories. A sketch of the resulting mounts follows below.
Thanks, Matt
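If it helps to visualize the layout, the mounts could end up looking roughly like this in /etc/fstab (the volume group and logical volume names here are invented for illustration; use whatever names your provisioning produces, and your filesystem of choice):

# hypothetical device names; one RAID 1 pair per volume group
/dev/vg_raid1a/lv_nifilogs    /var/log/nifi              xfs  defaults  0 0
/dev/vg_raid1a/lv_flowfile    /opt/nifi/flowfile_repo    xfs  defaults  0 0
/dev/vg_raid1a/lv_database    /opt/nifi/database_repo    xfs  defaults  0 0
/dev/vg_raid1b/lv_provenance  /opt/nifi/provenance_repo  xfs  defaults  0 0
/dev/vg_raid1c/lv_content1    /opt/nifi/content_repo1    xfs  defaults  0 0
/dev/vg_raid1d/lv_content2    /opt/nifi/content_repo2    xfs  defaults  0 0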
10-07-2016
04:50 PM
For other users/readers who do not know, HDF 2.0 includes the following Kafka processors as part of the release:
GetKafka and PutKafka --> support Kafka 0.8
ConsumeKafka and PublishKafka --> support Kafka 0.9
ConsumeKafka_0_10 and PublishKafka_0_10 --> support Kafka 0.10
Thanks, Matt
10-07-2016
02:59 PM
2 Kudos
@Ramil Akhmadeev HDP 2.5 comes with Apache Kafka 0.10.0.1. The NiFi GetKafka processor uses the Kafka 0.8 client library. For communicating with Kafka 0.10, you should be using the ConsumeKafka_0_10 NiFi processor.
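If it is useful, the handful of ConsumeKafka_0_10 properties you would typically set look like this (the broker, topic, and group values are placeholders):

Kafka Brokers: kafka-host1:6667,kafka-host2:6667   (placeholder hosts; HDP Kafka brokers commonly listen on 6667)
Topic Name(s): my-topic                            (placeholder)
Group ID: nifi-consumers                           (placeholder; shared by all nodes in the cluster)
Offset Reset: latest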
10-04-2016
04:53 PM
1 Kudo
@Ankit Jain Let me make sure I understand your flow completely. - You have 4 ConsumeKafka processors all reading from the same topic? If this is your intent, you should have a single ConsumeKafka processor with the success relationship drawn off of it 4 times (one to each unique PutHDFS processor). This cuts down on disk I/O, since the consumed data is only written to the NiFi content repository once. - Then you are trying to write that same data to 4 different HDFS endpoints? With only 3 partitions on your Kafka topic, you can only have three consumers at a time. With 4 nodes in your cluster, one of the nodes at any given time will not be consuming any data. Optimally, the number of partitions would be equal to, or a multiple of, the number of nodes in your NiFi cluster. (For example, with 4 partitions you would have 4 nodes running the ConsumeKafka processor with 1 concurrent task; with 8 partitions you would have 4 nodes running the ConsumeKafka processor with 2 concurrent tasks. The arithmetic is spelled out below.) It would be interesting to know more about your custom Kafka processor and how it differs from the "Max Poll Records" property on the existing ConsumeKafka processor. Redistributing data across your cluster is only necessary when dealing with ingest-type processors that are not cluster friendly, such as GetSFTP, ListSFTP, GetFTP, etc. With ConsumeKafka, the most optimized approach is as I described above. On your question about how to know whether all files were consumed from the topic: a Kafka topic is typically a living thing, with files continually being written to it and removed from it, so it is not clear how NiFi could know when all files are consumed. NiFi will just continue to poll the topic for new files; if there is nothing new, NiFi gets nothing. It is the Kafka server that keeps track of which files were served up to a consumer; NiFi does not keep a listing itself. Data is not passed to the success relationship until it is consumed completely successfully. NiFi provenance could be used to track particular files or list all FlowFiles created by a ConsumeKafka processor, but you would need to know how many files were on the topic, and NiFi will not know that. Matt
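For quick reference, the partition arithmetic from the examples above works out as follows (numbers taken from this post):

partitions = (NiFi nodes) x (concurrent tasks on ConsumeKafka per node)
4 partitions = 4 nodes x 1 concurrent task
8 partitions = 4 nodes x 2 concurrent tasks
3 partitions with 4 nodes --> one node is always idle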
10-03-2016
12:18 PM
2 Kudos
@vnandigam There are two parts to successfully accessing the NiFi UI: authentication and authorization. Since you are getting the insufficient permissions screen, you have successfully authenticated. First you should confirm the DN pattern of the user that has successfully authenticated. If you tail the nifi-user.log while you access your NiFi UI, you will see a line similar to the following:
2016-10-03 11:47:15,134 INFO [NiFi Web Server-65795] o.a.n.w.s.NiFiAuthenticationFilter Authentication success for CN=nifiadmin,OU=hortonworks
Examine the DN presented. Does it match exactly what you had in your "Initial Admin Identity" property? Next you will want to confirm that this user was properly added to the users.xml file:
<user identifier="9d7b4fe2-8e8b-30a5-8e2a-f6a6a18addfa" identity="CN=nifiadmin,OU=hortonworks"/>
The user, if it exists, will have been assigned a UUID (the above UUID is just an example; yours will be different). Next, verify this user was given the ability to "view the user interface" by examining the authorizations.xml file. Within this file you would expect to see the user's UUID assigned to one or more policies. In order to even see the UI, a user must have "R" on the "/flow" policy:
<policy identifier="6a57bf03-2a93-39d0-87dd-e3aa30f0cd4d" resource="/flow" action="R">
<user identifier="9d7b4fe2-8e8b-30a5-8e2a-f6a6a18addfa"/>
</policy>
In order to be able to add users to additional access policies, the user would also need "R" and "W" on the "/policies" policy (you can think of this as the global admin policy):
<policy identifier="9a3a1c92-fa10-3f9d-b2f7-5cd56cd2ca00" resource="/policies" action="R">
<user identifier="9d7b4fe2-8e8b-30a5-8e2a-f6a6a18addfa"/>
</policy>
<policy identifier="1ff611dd-1536-31f5-a610-64e192e4c43c" resource="/policies" action="W">
<user identifier="9d7b4fe2-8e8b-30a5-8e2a-f6a6a18addfa"/>
</policy>
If your user has both of the above, you should be able to access the UI and use the interface to grant additional users access and add additional levels of access for yourself and/or any user you added. The following policies are what give a user the ability to create, modify, and delete users and/or groups:
<policy identifier="dee16f9e-1f09-37ee-806b-e372f1051816" resource="/tenants" action="R">
<user identifier="9d7b4fe2-8e8b-30a5-8e2a-f6a6a18addfa"/>
</policy>
<policy identifier="69839728-eaf3-345d-849f-e2790cf236ab" resource="/tenants" action="W">
<user identifier="9d7b4fe2-8e8b-30a5-8e2a-f6a6a18addfa"/>
</policy>
If you find that your authorizations.xml file was empty (had no policies set in it), it is likely your NiFi had been started prior to you setting the "Initial Admin Identity" property. This property ONLY works the first time NiFi is started; if the authorizations.xml file was already generated, it will not be re-generated or updated on later starts of NiFi. To correct this, you can delete the authorizations.xml file and restart your NiFi. Since the file does not exist this time, the "Initial Admin Identity" user will be created. ***Note: if other users already have granted authorizations in this file, those will be lost and will need to be re-created. Only delete the authorizations.xml file if you wish to start over from scratch. Thanks, Matt
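A quick way to watch for the authentication line described above in real time while you attempt to log in (the log path is relative to the NiFi installation; adjust to your layout):

tail -f logs/nifi-user.log | grep NiFiAuthenticationFilter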
09-30-2016
02:39 PM
2 Kudos
@Timothy Spann It looks like you do not have enough file handles. The following command will show your current open file limits:
# ulimit -a
The open files limit should be a minimum of 10000, but may need to be even higher depending on the dataflow. Matt
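To raise the limit persistently on most Linux systems, you could add entries like these to /etc/security/limits.conf (the "nifi" account name and the 50000 value are placeholders; use the user NiFi actually runs as and a value sized for your dataflow):

# placeholder account and values; a re-login or service restart is needed to take effect
nifi  soft  nofile  50000
nifi  hard  nofile  50000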