Member since: 07-30-2019
Posts: 3123
Kudos Received: 1563
Solutions: 907
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 246 | 12-13-2024 10:58 AM |
|  | 341 | 12-05-2024 06:38 AM |
|  | 299 | 11-22-2024 05:50 AM |
|  | 241 | 11-19-2024 10:30 AM |
|  | 225 | 11-14-2024 01:03 PM |
09-06-2016
02:03 PM
1 Kudo
@INDRANIL ROY
You have a couple of things going on here that are affecting your performance. Based on previous HCC discussions, you have a single 50,000,000 line file that you are splitting into 10 files (5,000,000 lines each) and then distributing those splits to your NiFi cluster via an RPG (Site-to-Site). You are then using the RouteText processor to read every line of these 5,000,000 line files and route the lines based on two conditions.
1. Most NiFi processors (including RouteText) are multi-thread capable by adding additional concurrent tasks. A single concurrent task can work on a single file or batch of files; multiple threads will not work on the same file. So by setting concurrent tasks to 10 on the RouteText, you may not actually be using all 10. The NiFi controller also has a max number of threads configuration that limits the number of threads available across all components. The max thread setting can be found by clicking on this icon in the upper right corner of the UI. Most components by default use timer driven threads, so this is the number you will want to increase in most cases. Keep in mind that your hardware also limits how much "work" you can do concurrently. With only 4 cores, you are fairly limited. You may want to raise this value from the default 10 to perhaps 20, but going much higher will just leave a lot of threads in CPU wait. Avoid getting carried away with your thread allocations (both at the controller level and the processor level).
2. In order to get better multi-threaded throughput on your RouteText processor, try splitting your incoming file into many smaller files. Try splitting your 50,000,000 line file into files with no more than 10,000 lines each. The resulting 5,000 files will be better distributed across your NiFi cluster nodes and allow the multiple threads to be utilized (a rough example of such a split is sketched below).
Thanks, Matt
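For illustration only, here is a minimal sketch of how that two-stage split and the routing might be configured. The property names come from the standard SplitText and RouteText processors, but the specific values and the two-stage layout are assumptions to be tuned to your own data and hardware:
SplitText (first stage)
    Line Split Count = 100000    <-- 50,000,000 lines -> 500 FlowFiles
SplitText (second stage)
    Line Split Count = 10000     <-- each 100,000 line FlowFile -> 10 FlowFiles of 10,000 lines
RouteText
    Concurrent Tasks = 4         <-- roughly one per available core; each thread works on a separate FlowFile
The two-stage split is just one way to get there; a single SplitText with Line Split Count = 10000 would also produce the 5,000 small files, the staged approach simply keeps the number of splits generated from any one FlowFile smaller.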
09-06-2016
12:29 PM
1 Kudo
@Bojan Kostic
It is not currently possible to add new jars/nars to a running NiFi. A restart is always required to load these newly added items. Upon NiFi startup, all of the jars/nars are unpacked into the NiFi work directory. To maintain high availability, it is recommended that you use a NiFi cluster. This allows you to do rolling restarts so that your entire cluster is not down at the same time. If you are adding new components as part of this rolling update, you will not be able to use those new components until all nodes have been updated. Thanks, Matt
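Purely as an illustration of that rolling update, these are the kind of steps you would run on each node in turn (the nar name and install path here are hypothetical; adjust them to your environment):
# On each node, one at a time, waiting for it to rejoin the cluster before moving on:
cp my-custom-processor.nar /opt/nifi/lib/     # hypothetical nar and NiFi install path
/opt/nifi/bin/nifi.sh restart                 # the nar is unpacked into the work directory on startup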
09-06-2016
12:18 PM
2 Kudos
@David DN Before Site-to-Site (S2S) can be used, the following properties must be set in the nifi.properties file on all the Nodes in your NiFi cluster:
# Site to Site properties
nifi.remote.input.host=<FQDN of host>               <-- Set to a FQDN resolvable by all Nodes
nifi.remote.input.secure=false                      <-- Set to true if NiFi is running HTTPS
nifi.remote.input.socket.port=<port used for S2S>   <-- Must be set to enable RAW socket S2S
nifi.remote.input.http.enabled=true                 <-- Set if you want to support the HTTP transport
nifi.remote.input.http.transaction.ttl=30 sec
A restart of your NiFi instances will be necessary for this change to take effect.
Matt
09-02-2016
02:02 PM
@INDRANIL ROY Please share how you have your SplitText and RouteText processors configured. If I understand your end goal, you want to take this single file with 10,000,000 entries/lines and route only the lines meeting criteria 1 to one putHDFS while routing all other lines to another putHDFS? Thanks, Matt
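In case it helps while you pull that together, here is a rough sketch of the sort of RouteText configuration that would do this (the dynamic property name and expression below are placeholders, not your actual criteria):
RouteText
    Routing Strategy  = Route to each matching Property Name
    Matching Strategy = Matches Regular Expression
    criteria1         = <regular expression for criteria 1>   <-- matching lines go to a "criteria1" relationship -> first putHDFS
Lines that do not match any added property are sent to RouteText's built-in "unmatched" relationship, which you would connect to the second putHDFS.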
08-31-2016
08:48 PM
You can also save portions or all of your dataflow to NiFi templates, which can be exported for use on other NiFi installations. To create a template, simply highlight all the components you want in your template (if you highlight a process group, all components within that process group will be added to the template), then click on the "create template" icon in the upper middle of the UI to create your template. The Templates manager UI can be used to export and import these templates from your NiFi; it can be accessed via this icon in the upper right corner of the NiFi UI.
*** Note: NiFi templates are sanitized of any sensitive property values (a sensitive property value is any value that would be encrypted; in NiFi, that means any passwords).
Matt
08-31-2016
08:41 PM
1 Kudo
@Sami Ahmad Every change you make to the NiFi canvas is immediately saved to the flow.xml.gz file; there is no need to manually initiate a save. Each installation of NiFi provides you with a single UI for building a dataflow. You can build as many different dataflows as you want on this canvas, and these different dataflows do not need to be connected in any way. The most common approach to what you are doing is to create a different process group for each of your unique dataflows. To add a new process group to the canvas, drag the process group icon onto the canvas and give it a unique name that identifies the dataflow it contains. If you double click on that process group, you will enter it, giving you a blank canvas to work with.
So here you can see I have two process groups that are not connected in any way. One contains a dataflow that consists of 6 processors while the other has 89. I can right click on either of these process groups and select either start or stop from the context menu. That start or stop action is applied against every processor within that process group, so this gives you an easy way to stop one dataflow and start another. You could even have both running at the same time. Matt
08-31-2016
06:24 PM
1 Kudo
@Sami Ahmad An easy way to return NiFi to a blank canvas is to simply stop NiFi and remove the flow.xml.gz file from NiFi's conf directory. When you restart your NiFi, a new blank flow.xml.gz file will be generated. Any FlowFiles that had existed in the deleted flow will be purged from NiFi when it is started. (The commands for this are sketched at the end of this reply.)
Alternatively: The error you are seeing is occurring because you are inside a NiFi process group and trying to delete all the components; however, NiFi has detected that there are connections attached to that process group from the process group one level up. NiFi will not allow those components to be removed until the connections feeding them are removed first. If you return to the root/top level of your NiFi dataflow, you can select the connections entering and exiting a process group and delete them. Once they have been deleted, you can select the process group itself and delete it, which will in turn delete all components inside that process group. The deletion of a connection will only be allowed if there are no queued FlowFiles in that connection. If there are queued FlowFiles, they must be purged before the connection can be deleted. Matt
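A minimal sketch of those stop / remove / restart steps, assuming a default tar/zip install run via bin/nifi.sh (adjust the paths to your environment):
./bin/nifi.sh stop
rm ./conf/flow.xml.gz      # a new, empty flow.xml.gz is generated on the next start
./bin/nifi.sh start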
08-31-2016
05:51 PM
1 Kudo
@Sami Ahmad I am not saying it will not work as is; however, without a defined path forward from those two output ports the data will just queue.
Looking at your screenshot, it does look like the dataflow is producing FlowFiles. If you look at the stats on the process groups "log Generator" and "Data Enrichment" you will see data being produced and queued. The problem is that none of the components inside "Data Enrichment" are running. If you double click on the "Data Enrichment" process group, you will be taken inside of it. There you will see the stopped components, the invalid components, and the ~20,000 queued FlowFiles.
You will need to start all the valid stopped components in this process group to get your data flowing all the way to your two putHDFS processors outside this process group.
There are two output ports in this "Data Enrichment" process group that are invalid. They are not necessary for this tutorial to work. I suggest you stop the "Filter WARN" and "Filter INFO" processors and delete the connections feeding these invalid output ports. If you have already run this flow and data is queued on connections you wish to delete, you will need to right click on each connection and select "Empty queue" before you will be able to delete it. These examples were not put out by Apache. I will try to find the correct person who wrote this tutorial and see if I can get them to update it. Thanks, Matt
08-31-2016
04:56 PM
1 Kudo
@Sami Ahmad For starters, I have to agree that the "generate_logs.py" script is not being used anywhere in that NiFi template. The NiFi flow itself has been built to generate some fake log data.
Invalid components: Components like NiFi processors, input ports, output ports, controller services, and reporting tasks all have minimum defined requirements that must be met before they are in a "valid" state. Only components that are in a valid state can be started. Floating your cursor over the invalid icon on a component will show why it is not valid. The Data Enrichment process group in this template has two output ports that have no defined connections, making them invalid. Despite the warning you were presented with, all valid components should have been started. You can fix the issue by creating the two missing output connections:
So here you can see I added a new processor (UpdateAttribute with success checked for auto-terminate) and dragged a connection from the Data Enrichment process group to it twice, once for each invalid output port it contained (Warn logs and Info logs). Now the process group no longer reflects any invalid components in it. I am unable to see the screenshot you attached. I did start the "Log Generate" process group without making any changes to it and do see data being produced. I see data being queued in several places in the dataflow. If you are not seeing any data queued, check your NiFi's nifi-app.log for errors (a quick way to do that is sketched below). Also check the various GenerateFlowFile processors to see if they are producing any bulletins (a bulletin icon will be displayed on the processor if they are). Floating over the bulletin will display a log message that may indicate the issue. Thanks, Matt
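For reference, one quick way to scan that log for recent errors, assuming a default install where the logs live in NiFi's logs directory:
grep -i " ERROR " logs/nifi-app.log | tail -20      # show the 20 most recent ERROR lines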