10-27-2016
12:45 PM
2 Kudos
@Joshua Adeleke First, let's make sure we are on the same page terminology-wise. NiFi FlowFiles are made up of two parts: FlowFile content and FlowFile attributes. FlowFile content is written to NiFi's content repository, while FlowFile attributes live mostly in JVM heap memory and in the NiFi FlowFile repository. It is the FlowFile attributes that move from processor to processor in your dataflow.

Apache NiFi does not have a version 1.0.0.2.0.0.0-579; that is an HDF build of Apache NiFi 1.0.0. If you are migrating to HDF 2.0, I suggest instead migrating to HDF 2.0.1. It has many bug fixes you will want: https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.1/index.html

When you say you want to move the "dataflow process" to another server, are you talking about your entire canvas and all configured processors only? Or are you also talking about moving any active FlowFiles from the old flow to the new?

My suggestion would be to stand up your new NiFi instance/cluster. You can copy your flow.xml.gz file from your old NiFi to the new NiFi. ***NOTE: You need to make sure that the same sensitive props key is used (the config setting for this is found in the nifi.properties file), or your new NiFi will not start because it will not be able to decrypt any of your sensitive properties in that file.

If your old NiFi was secured, you can use its authorized-users.xml file to establish the initial admin authorities in the new NiFi (configure this in the authorizers.xml file).

Once started, you will need to access this new NiFi's UI and address any "invalid" processors/controller services, and add any controller services you may have had running only on the NCM in the old version (there is no NCM in HDF 2.x versions). Some of these may be invalid because of changes to processor/controller-service properties. Once all of that has been addressed, you are ready to move to the next step.

Shut down all ingest processors on your old NiFi and allow it to finish processing out any data it is already working on. At the same time, you can start your new NiFi so it starts ingesting any new data and begins processing it.
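As a minimal sketch of the sensitive props key note above (nifi.sensitive.props.key is the standard property name; the value shown is a placeholder):

```
# nifi.properties -- must hold the SAME value on the old and new instance.
# This key encrypts sensitive component properties inside flow.xml.gz;
# if it differs, the new NiFi cannot decrypt the flow and will not start.
nifi.sensitive.props.key=<same key as the old instance>
```

Thanks, Matt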
10-26-2016
07:33 PM
@Saikrishna Tarapareddy You can actually cut the ExtractText processor out of this flow. I forgot that the RouteText processor generates a "RouteText.Group" FlowFile attribute. You can just use that attribute as the "Correlation Attribute Name" in the MergeContent processor.
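A minimal sketch of the simplified MergeContent configuration (only the relevant property is shown; everything else is assumed to keep its defaults):

```
# MergeContent
Correlation Attribute Name = RouteText.Group   # lines from the same RouteText group merge together
```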
10-26-2016
07:10 PM
1 Kudo
@Saikrishna Tarapareddy I agree that you may still need to split your very large incoming FlowFile into smaller FlowFiles to better manage heap memory usage, but you should be able to use RouteText and ExtractText as follows to accomplish what you want:
RouteText configured as follows: [configuration screenshot not preserved] All grouped lines will be routed to the relationship "TagName" as a new FlowFile. They feed into an ExtractText configured as follows: [configuration screenshot not preserved] This will extract the TagName as an attribute on the FlowFile, which you can then use as the "Correlation Attribute Name" in the MergeContent processor that follows.
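Since the original screenshots were not preserved, here is a hypothetical configuration in the same spirit (the regexes assume the tag name is the first comma-separated field of each line; adjust them to your actual data):

```
# RouteText -- group lines by tag name
Routing Strategy            = Route to each matching Property Name
Matching Strategy           = Satisfies Expression
Grouping Regular Expression = ^([^,]+),.*
TagName (dynamic property)  = true            # every line matches, so all lines route to "TagName"

# ExtractText -- pull the tag name into a FlowFile attribute
tagname (dynamic property)  = ^([^,]+),.*
```

ExtractText writes the first capture group to the attribute "tagname" (plus indexed variants such as "tagname.1"), so "tagname" would be the correlation attribute in this sketch.

Thanks, Matt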
10-26-2016
05:46 PM
@Zack Riesland There are seven fields; however, the seventh field is optional. So you are correct: both "0 0 18 * * ?" and "0 0 18 * * ? *" are valid. The below is from http://www.quartz-scheduler.org/documentation/quartz-2.x/tutorials/crontrigger.html

----------------

* ("all values") - used to select all values within a field. For example, "*" in the minute field means "every minute".

? ("no specific value") - useful when you need to specify something in one of the two fields in which the character is allowed, but not the other. For example, if I want my trigger to fire on a particular day of the month (say, the 10th), but don't care what day of the week that happens to be, I would put "10" in the day-of-month field, and "?" in the day-of-week field. See the examples below for clarification.

-----------------

So only fields 4 (day-of-month) and 6 (day-of-week) will accept "?".
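For reference, a field-by-field breakdown of the expression in question (standard Quartz field order):

```
0 0 18 * * ? *
| | |  | | | +-- Year (field 7, optional)
| | |  | | +---- Day-of-week ("?" = no specific value)
| | |  | +------ Month (every month)
| | |  +-------- Day-of-month (every day)
| | +----------- Hours (18 = 6 PM)
| +------------- Minutes (0)
+--------------- Seconds (0)
```

Thanks, Matt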
10-26-2016
05:31 PM
The same holds true for the MergeContent side of this flow. Have one MergeContent merge the first 10,000 FlowFiles, and a second MergeContent merge multiple 10,000-line FlowFiles into even larger merged FlowFiles. This again will help prevent running into OOM (out-of-memory) errors.
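A hypothetical two-stage arrangement ("Minimum/Maximum Number of Entries" are standard MergeContent properties; the counts simply mirror the 10,000 figure above):

```
# MergeContent #1 -- first-stage merge of individual lines
Merge Strategy            = Bin-Packing Algorithm
Minimum Number of Entries = 10000
Maximum Number of Entries = 10000

# MergeContent #2 -- merge the 10,000-line FlowFiles into larger ones
Merge Strategy            = Bin-Packing Algorithm
Minimum Number of Entries = 10     # e.g. 10 x 10,000 = 100,000 lines per final FlowFile
```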
10-26-2016
04:47 PM
1 Kudo
@Saikrishna Tarapareddy You may consider using the RouteText processor to route the individual lines from your source FlowFile to relationships based upon your various tag names, and then use MergeContent processors to merge those lines back into a single FlowFile, as sketched below.
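A hypothetical outline of that flow (a SplitText stage is optional, but helps limit heap usage on very large source files):

```
Source --> (SplitText) --> RouteText --> MergeContent --> downstream
                           routes lines    merges lines back together,
                           by tag name     one FlowFile per tag
```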
10-26-2016
02:17 PM
Is user2@domain.net part of your "Admin NiFi" user group?
Did you grant the "Admin Group" the "modify the data" policy? You can get more output in your nifi-user.log by setting the following logger in your logback.xml file from INFO to DEBUG: <logger name="org.apache.nifi.web.api.config" level="INFO" additivity="false"> No NiFi restart is needed for changes to the logback.xml file to take effect.
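For clarity, the edited logger would look like this (the appender ref shown matches a default NiFi logback.xml; only the level value changes):

```
<!-- logback.xml: raise authorization logging from INFO to DEBUG -->
<logger name="org.apache.nifi.web.api.config" level="DEBUG" additivity="false">
    <appender-ref ref="USER_FILE"/>
</logger>
```

Matt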
10-26-2016
12:59 PM
The Quartz scheduler has seven fields, so the cron would need to be "0 0 18 * * ? *". The seventh field, for year, is optional. Yes, the cron you have there will run at the 18th hour of every day.
10-26-2016
12:58 PM
@Paul Yang What you have here is a very light dataflow, based on the picture shown. The NiFi RPG will send data in batches of up to 100 FlowFiles for efficiency. So if the input queue has fewer than 100 FlowFiles in it when it runs, all of those FlowFiles will be routed to a single node. On the next run, the next batch would go to a different node. Over time, if the dataflow rate is constant, the data should be balanced across your nodes.

If I am understanding what you have here, you are feeding the RPG that feeds an input port, that input port feeds an output port, and then you use various RPGs anywhere in your flow to pull data from that output port. Correct?

The problem with this is that the RPG runs on every node, so when a node connects it will try to pull all the FlowFiles it sees on that connection. Nodes are not aware of how many nodes exist in their cluster and will not say, "I should only pull x amount so the other nodes can pull the same." Each node acts in a vacuum and pulls as much data as fast as it can from the output port.

I would suggest instead having your remote input port (root-level input port) feed its success relationship multiple times into the various sub-process groups owned by your various departments. Not only will this provide better load-balanced delivery of data in the cluster, but it will also improve performance.
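A hypothetical before/after sketch of the two topologies described (the department names are placeholders):

```
# Current: every node's RPG pulls from the output port in a vacuum
RPG --> [Input Port] --> [Output Port] <-- RPG (dept A)
                                       <-- RPG (dept B)

# Suggested: fan the root-level input port out to the department groups directly
RPG --> [root-level Input Port] --> Process Group (dept A)
                                +-> Process Group (dept B)
                                +-> Process Group (dept C)
```

Thanks, Matt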
10-26-2016
12:33 PM
If, after adding the "modify the data" policy, it still does not work, check the nifi-user.log to see which entity it is having permission problems with. Did you set processor-level policies on the processors on each side of this queued connection?