About MattWho

MattWho · 09-06-2016

@Bojan Kostic It is not currently possible to add new jars/NARs to a running NiFi. A restart is always required to get newly added items loaded. Upon startup, NiFi unpacks all the jars/NARs into its work directory. To maintain high availability, it is recommended that you use a NiFi cluster. This allows you to do rolling restarts so that your entire cluster is never down at the same time. If you add new components as part of this rolling update, you will not be able to use those new components until all nodes have been updated. Thanks, Matt
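A toy Python model of the rolling-restart idea, assuming a three-node cluster and a hypothetical NAR name; it only tracks which nodes are up and which have loaded the NAR at startup, and does not touch a real NiFi installation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    nars: set = field(default_factory=set)  # NARs loaded at the node's last startup
    up: bool = True


def rolling_restart(nodes, new_nar):
    """Restart nodes one at a time; each restarted node picks up new_nar.

    Returns (node restarted, nodes up during its restart, new component usable).
    """
    history = []
    for node in nodes:
        node.up = False                       # only this node is down
        nodes_up = sum(n.up for n in nodes)   # the rest of the cluster keeps running
        node.nars.add(new_nar)                # the NAR is only loaded on startup
        node.up = True
        # The new component is usable only once every node has loaded it.
        usable = all(new_nar in n.nars for n in nodes)
        history.append((node.name, nodes_up, usable))
    return history
```

With `rolling_restart([Node("node1"), Node("node2"), Node("node3")], "my-custom-nar")`, two of three nodes stay up at every step, and the new component only becomes usable after the last node restarts.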

MattWho · 09-06-2016

@David DN Before Site-to-Site (S2S) can be used, the following properties must be set in the nifi.properties file on all the Nodes in your NiFi cluster:

    # Site to Site properties
    nifi.remote.input.host=<FQDN of Host>                <-- Set to an FQDN resolvable by all Nodes
    nifi.remote.input.secure=false                       <-- Set to true if NiFi is running HTTPS
    nifi.remote.input.socket.port=<Port used for S2S>    <-- Needs to be set to support Raw transport/enable S2S
    nifi.remote.input.http.enabled=true                  <-- Set if you want to support HTTP transport
    nifi.remote.input.http.transaction.ttl=30 sec

A restart of your NiFi instances will be necessary for this change to take effect. Matt
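A minimal Python sketch for sanity-checking these S2S settings before restarting. It assumes a simplified `key=value` format (real Java properties files also allow `:` separators, escapes, and line continuations), and the checks only encode the rules stated above: a host every Node can resolve, a true/false secure flag, and at least one of the Raw (socket port) or HTTP transports enabled.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def s2s_problems(props):
    """Return human-readable problems with the Site-to-Site settings."""
    problems = []
    if not props.get("nifi.remote.input.host"):
        problems.append("nifi.remote.input.host is empty (use an FQDN all Nodes can resolve)")
    if props.get("nifi.remote.input.secure", "").lower() not in ("true", "false"):
        problems.append("nifi.remote.input.secure must be true or false")
    port = props.get("nifi.remote.input.socket.port", "")
    http = props.get("nifi.remote.input.http.enabled", "").lower() == "true"
    if port and not port.isdigit():
        problems.append("nifi.remote.input.socket.port must be a number")
    if not port and not http:
        problems.append("neither Raw (socket.port) nor HTTP transport is enabled")
    return problems
```

Running `s2s_problems(parse_properties(open("conf/nifi.properties").read()))` on each Node (path assumed relative to the NiFi home directory) should return an empty list once the properties are set.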

MattWho · 09-02-2016

@INDRANIL ROY Please share how you have configured your SplitText and RouteText processors. If I understand your end goal, you want to take this single file with 10,000,000 entries/lines and route only the lines meeting criteria 1 to one PutHDFS, while routing all other lines to another PutHDFS? Thanks, Matt
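The routing being described can be pictured with a short Python sketch: every line goes to exactly one of two destinations depending on whether it meets a criterion. This only illustrates the idea and is not how RouteText is implemented; `criteria_1` (first comma-separated field equals "A") is a hypothetical stand-in for the real condition.

```python
from typing import Callable, Iterable, List, Tuple


def route_lines(lines: Iterable[str],
                matches: Callable[[str], bool]) -> Tuple[List[str], List[str]]:
    """Split lines into (matched, unmatched), preserving their original order."""
    matched, unmatched = [], []
    for line in lines:
        (matched if matches(line) else unmatched).append(line)
    return matched, unmatched


def criteria_1(line: str) -> bool:
    # Hypothetical criterion: the first comma-separated field is "A".
    return line.split(",", 1)[0] == "A"
```

Here `matched` plays the role of the data headed to the first PutHDFS and `unmatched` the data headed to the second.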

MattWho · 08-31-2016

You can also save portions or all of your dataflow as NiFi templates that can be exported for use on other NiFi installations. To create a template, simply highlight all the components you want in your template (if you highlight a process group, all components within that process group will be added to the template). Then click the "Create Template" icon in the upper middle of the UI to create your template. The Templates manager UI can be used to export and import these templates from your NiFi. It can be accessed via an icon in the upper right corner of the NiFi UI. *** Note: NiFi templates are sanitized of any sensitive property values (a sensitive property value is any value that would be encrypted; in NiFi, that means any passwords). Matt
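The sanitizing step in the note can be illustrated with a small Python sketch. It assumes a hypothetical in-memory representation where each component lists its property values and the names of its sensitive properties; NiFi's actual template export works on its own flow model, so treat this only as a picture of the behavior.

```python
from copy import deepcopy


def sanitize_for_template(components):
    """Return a copy of the components with every sensitive property value removed.

    Each component is a dict with "properties" (name -> value) and
    "sensitive" (the set of property names flagged as sensitive).
    """
    cleaned = deepcopy(components)  # leave the live flow untouched
    for comp in cleaned:
        for prop in comp["sensitive"]:
            if prop in comp["properties"]:
                comp["properties"][prop] = None  # the value is not exported
    return cleaned
```

After importing such a template elsewhere, the sensitive values (for example, a password on a hypothetical PutSFTP component) have to be re-entered.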

MattWho · 08-31-2016

@Sami Ahmad Every change you make to the NiFi canvas is immediately saved to the flow.xml.gz file, so there is no need to manually initiate a save. Each installation of NiFi provides you with a single UI for building dataflows. You can build as many different dataflows as you want on this canvas, and they do not need to be connected in any way. The most common approach to what you are doing is to create a separate process group for each of your unique dataflows. To add a new process group to the canvas, drag the process group icon onto the canvas and give it a unique name that identifies the dataflow it contains. If you double-click that process group, you will enter it, giving you a blank canvas to work with. For example, I have two process groups that are not connected in any way: one contains a dataflow that consists of 6 processors, while the other has 89. I can right-click either of these process groups and select start or stop from the context menu. That start or stop action is applied to every processor within that process group, so this gives you an easy way to stop one dataflow and start another. You could even have both running at the same time. Matt
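A minimal Python model of the start/stop behavior described above, assuming each process group is just a mapping of processor names to a running flag (no nested groups); the group and processor names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessGroup:
    name: str
    processors: dict = field(default_factory=dict)  # processor name -> running?

    def start(self):
        # "Start" on the group is applied to every processor inside it.
        self.processors = dict.fromkeys(self.processors, True)

    def stop(self):
        # "Stop" likewise applies to every processor inside it.
        self.processors = dict.fromkeys(self.processors, False)

    def running_count(self):
        return sum(self.processors.values())
```

Because the two groups share no state, starting one (say, a 6-processor group) leaves the other (an 89-processor group) untouched, and both can run at once.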

MattWho · 08-31-2016

@Sami Ahmad An easy way to return NiFi to a blank canvas is to stop NiFi and remove the flow.xml.gz file from NiFi's conf directory. When you restart NiFi, a new blank flow.xml.gz file will be generated. Any FlowFiles that existed in the deleted flow will be purged from NiFi when it is started. Alternatively: the error you are seeing occurs because you are inside a NiFi process group and trying to delete all the components; however, NiFi has detected that connections from the process group one level up are attached to that process group. NiFi will not allow those components to be removed until their feeding connections are removed first. If you return to the root/top level of your NiFi dataflow, you can select the connections entering and exiting the process group and delete them. Once they have been deleted, you can select the process group itself and delete it. This will in turn delete all components inside that process group. Deleting a connection is only allowed if there are no queued FlowFiles in it; if there are queued FlowFiles, they must be purged before the connection can be deleted. Matt
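The two deletion rules above (empty the queue before deleting a connection; remove connections before deleting the process group) can be sketched in Python. This is a simplified model with hypothetical component names, not NiFi's own validation code.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    source: str
    destination: str
    queued: int = 0  # FlowFiles currently waiting in this connection


def can_delete_connection(conn):
    # A connection can only be deleted once its queue is empty.
    return conn.queued == 0


def purge(conn):
    # Stand-in for purging (emptying) the queued FlowFiles.
    conn.queued = 0


def can_delete_group(group, connections):
    # A process group cannot be deleted while any connection enters or exits it.
    return not any(group in (c.source, c.destination) for c in connections)
```

The order of operations is: purge any connection with queued FlowFiles, delete the connections, then delete the group.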

MattWho · 08-31-2016

@Sami Ahmad I am not saying it will not work as is; however, without a defined path forward from those two output ports, the data will just queue. Looking at your screenshot, the dataflow does appear to be producing FlowFiles. If you look at the stats on the process groups "log Generator" and "Data Enrichment", you will see data being produced and queued. The problem is that none of the components inside "Data Enrichment" are running. If you double-click the "Data Enrichment" process group, you will be taken inside it. There you will see the stopped components, the invalid components, and the ~20,000 queued FlowFiles. You will need to start all the valid stopped components in this process group to get your data flowing all the way to the two PutHDFS processors outside this process group. There are two output ports in this "Data Enrichment" process group that are invalid. They are not necessary for this tutorial to work. I suggest you stop the "Filter WARN" and "Filter INFO" processors and delete the connections feeding these invalid output ports. If you have already run this flow and data is queued on the connections you wish to delete, you will need to right-click the connection and select "Empty queue" before you will be able to delete it. These examples were not published by Apache. I will try to find the person who wrote this tutorial and see if I can get them to update it. Thanks, Matt
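A deliberately simplified Python sketch of why the data piles up: in a linear chain of stages, FlowFiles stop in front of the first stage that is not running. It ignores back pressure and per-connection queues, and the stage names are only borrowed from this thread for illustration.

```python
def simulate(stages, running, incoming):
    """Push `incoming` FlowFiles through a linear chain of downstream stages.

    stages:  ordered stage names downstream of the generator
    running: set of stage names that are currently running
    Returns stage name -> FlowFiles queued in front of it, plus "delivered".
    """
    queues = {name: 0 for name in stages}
    for name in stages:
        if name not in running:
            # The first stopped stage is where everything backs up.
            queues[name] = incoming
            return {**queues, "delivered": 0}
    # Every stage is running, so all FlowFiles reach the end of the chain.
    return {**queues, "delivered": incoming}
```

With "Data Enrichment" stopped, all FlowFiles wait in front of it and nothing reaches PutHDFS; once every stage is running, they are delivered.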

MattWho · 08-31-2016

@Sami Ahmad For starters, I have to agree that the "generate_logs.py" script is not used anywhere in that NiFi template. The NiFi flow itself has been built to generate some fake log data. Invalid components: components such as NiFi processors, input ports, output ports, controller services, and reporting tasks all have minimum defined requirements that must be met before they are in a "valid" state. Only components in a valid state can be started. Hovering your cursor over the invalid icon on a component will show why it is not valid. The Data Enrichment process group in this template has two output ports with no defined connections, making them invalid. Despite the warning you were presented with, all valid components should have been started. You can fix the issue by creating the two missing output connections: I added a new processor (UpdateAttribute, with "success" checked for auto-terminate) and dragged a connection from the Data Enrichment process group to it twice, once for each invalid output port it contained (Warn logs and Info logs). Now the process group no longer reflects any invalid components. I am unable to see the screenshot you attached. I did start the "Log Generate" process group without making any changes to it and do see data being produced. I see data being queued in several places in the dataflow. If you are not seeing any data queued, check your NiFi's nifi-app.log for errors. Also check the various GenerateFlowFile processors to see if they are producing any bulletins (a bulletin icon is displayed on a processor when it has produced one). Hovering over the bulletin will display a log message that may indicate the issue. Thanks, Matt
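The "only valid components can be started" rule can be sketched in Python for the output-port case discussed here. The single validation rule (a port needs at least one outgoing connection) is a simplification of NiFi's checks, and the port named "Enriched" is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OutputPort:
    name: str
    outgoing: int = 0      # connections leaving this port
    running: bool = False

    def validation_errors(self):
        # Simplified rule: a port with no outgoing connection is invalid.
        return [] if self.outgoing > 0 else [f"'{self.name}' has no outgoing connection"]


def start_all(ports):
    """Start every valid port; return the reasons the others were skipped."""
    skipped = {}
    for port in ports:
        errors = port.validation_errors()
        if errors:
            skipped[port.name] = errors   # left stopped, like an invalid component
        else:
            port.running = True
    return skipped
```

Connecting the two ports (here, by bumping `outgoing`) clears their validation errors, after which a second `start_all` starts them too.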

MattWho · 08-31-2016

@Sami Ahmad Instead of left-clicking the link in step three, right-click it and select the "Save Link As..." option to save the XML template so it can be imported into your NiFi. The dataflow template will show you all the components needed for this workflow. I believe the intent of this tutorial was not to teach users how to use the NiFi UI, but rather how to use a combination of specific NiFi components to accomplish a particular workflow. Using the NiFi UI dataflow tools, you can recreate the workflow as a dataflow-building exercise. Thanks, Matt
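If a browser keeps rendering the link instead of saving it, a small Python sketch can do the "Save Link As..." step. The template URL is whatever the tutorial links to (not reproduced here), and the check that the root element is `<template>` is an assumption about NiFi's exported template format, used only as a quick sanity test before importing.

```python
import urllib.request
import xml.etree.ElementTree as ET


def looks_like_nifi_template(xml_data) -> bool:
    """Heuristic: assume an exported NiFi template has <template> as its root element."""
    try:
        root = ET.fromstring(xml_data)  # accepts str or bytes
    except ET.ParseError:
        return False
    return root.tag == "template"


def save_template(url: str, path: str) -> bool:
    """Download url to path (like "Save Link As...") and report whether it looks like a template."""
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    with open(path, "wb") as fh:
        fh.write(data)
    return looks_like_nifi_template(data)
```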

MattWho · 08-31-2016

@INDRANIL ROY That is the exact approach I suggested in response to the thread we had going above. Each Node will only work on the FlowFiles it has in its possession. By splitting this large, TB-sized file into many smaller files, you can distribute the processing load across your downstream cluster. The distribution of FlowFiles via the Remote Process Group (RPG) works as follows: the RPG communicates with the NiFi Cluster Manager (NCM) of your NiFi cluster. The NCM returns to the source RPG a list of the available Nodes in its cluster and their S2S ports, along with the current load on each. It is then the responsibility of the RPG to smartly load-balance the data in its incoming queue across these Nodes. Nodes with higher load will get fewer FlowFiles. The load balancing is done in batches for efficiency, so under light load you may not see an exactly balanced delivery, but under higher FlowFile volumes you will see a balanced delivery over the 5-minute delivery statistics. Thanks, Matt
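A Python sketch of load-aware batch distribution in the spirit described above. The weighting `1 / (1 + load)` and the largest-remainder rounding are illustrative choices made here, not the formula the RPG actually uses; the only property carried over from the description is that more heavily loaded Nodes receive fewer FlowFiles per batch.

```python
def allocate_batch(batch_size, node_load):
    """Split batch_size FlowFiles across Nodes, favoring lightly loaded ones.

    node_load: Node name -> current load (e.g. queued FlowFiles).
    Weight = 1 / (1 + load) is a hypothetical choice for illustration.
    """
    weights = {node: 1.0 / (1 + load) for node, load in node_load.items()}
    total = sum(weights.values())
    shares = {node: batch_size * w / total for node, w in weights.items()}
    counts = {node: int(share) for node, share in shares.items()}
    # Hand out FlowFiles lost to rounding, largest fractional part first.
    leftover = batch_size - sum(counts.values())
    by_remainder = sorted(shares, key=lambda n: shares[n] - counts[n], reverse=True)
    for node in by_remainder[:leftover]:
        counts[node] += 1
    return counts
```

For example, `allocate_batch(100, {"idle": 0, "busy": 9})` sends 91 FlowFiles to the idle Node and 9 to the busy one. With small batches, rounding makes any single batch look uneven, which matches the note that balance shows up over higher volumes.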

Member Since 07-30-2019 10:41 AM
Posts 3,131
Kudos received 1560

Cloudera Community
