Member since
07-30-2019
3133
Posts
1564
Kudos Received
909
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 129 | 01-09-2025 11:14 AM
 | 800 | 01-03-2025 05:59 AM
 | 415 | 12-13-2024 10:58 AM
 | 453 | 12-05-2024 06:38 AM
 | 368 | 11-22-2024 05:50 AM
10-26-2016
05:31 PM
The same holds true for the MergeContent side of this flow. Have one MergeContent processor merge the first 10,000 FlowFiles, and a second MergeContent processor merge multiple 10,000-line FlowFiles into even larger merged FlowFiles. This again will help prevent running into OOM errors.
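The two-stage merge can be pictured with a small Python sketch. This is only an illustration of the binning idea, not how MergeContent is implemented: the function names are made up, and the second-stage bin size of 3 is an arbitrary assumption. The point is that no single step ever has to hold more than one bin's worth of items at a time.

```python
def merge_in_bins(items, bin_size):
    """Join consecutive groups of at most bin_size items into one string each."""
    return ["\n".join(items[i:i + bin_size])
            for i in range(0, len(items), bin_size)]


def two_stage_merge(lines, first_bin=10_000, second_bin=3):
    """Hypothetical two-stage merge.

    Stage 1 bundles single lines into groups of up to first_bin lines.
    Stage 2 bundles those groups into even larger outputs.
    second_bin=3 is an assumption chosen only for this example.
    """
    bundles = merge_in_bins(lines, first_bin)    # many small -> fewer medium
    return merge_in_bins(bundles, second_bin)    # fewer medium -> few large


if __name__ == "__main__":
    lines = [f"line {n}" for n in range(45_000)]
    merged = two_stage_merge(lines)
    print(len(merged), "merged outputs")
```

Each stage only groups a bounded number of inputs, which is the property the two chained MergeContent processors are meant to give you.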
10-26-2016
04:47 PM
1 Kudo
@Saikrishna Tarapareddy You may consider using the RouteText processor to route the individual lines from your source FlowFile to relationships based upon your various Tagnames, and then use MergeContent processors to merge those lines back into a single FlowFile.
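As a rough picture of what line-level routing does, here is a minimal Python sketch. It is not RouteText itself: the tag names, the comma-separated line layout, and the "starts with" matching rule are all assumptions made up for the example.

```python
def route_lines(text, tag_names):
    """Group each line under the first tag name it starts with.

    Lines that match no tag name go to a hypothetical 'unmatched' group.
    """
    routed = {name: [] for name in tag_names}
    routed["unmatched"] = []
    for line in text.splitlines():
        for name in tag_names:
            if line.startswith(name):
                routed[name].append(line)
                break
        else:
            # No tag name matched this line.
            routed["unmatched"].append(line)
    return routed


if __name__ == "__main__":
    sample = "TagA,1.0\nTagB,2.0\nTagA,3.0\nother,4.0"
    for group, lines in route_lines(sample, ["TagA", "TagB"]).items():
        print(group, lines)
```

Each group in the result corresponds to one relationship, and the lines in a group are what a downstream merge step would combine into one FlowFile.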
10-26-2016
02:17 PM
Is user2@domain.net part of your "Admin NiFi" user group?
Did you grant "Admin Group" the "modify the data" policy? To get more output in your nifi-user.log, change level="INFO" to level="DEBUG" on the following line in your logback.xml file: <logger name="org.apache.nifi.web.api.config" level="INFO" additivity="false"> No NiFi restart is needed for changes to the logback.xml file to take effect. Matt
10-26-2016
12:59 PM
The Quartz scheduler uses 7 fields, so the cron would need to be 0 0 18 * * ? *. The seventh field (year) is optional. Yes, the cron you have there will run at the 18th hour of every day.
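To make the field positions easier to see, here is a small Python sketch that just labels each field of a Quartz-style expression. It is an illustration only: it does not validate the values, and the function and field-name spellings are my own, not anything from NiFi or Quartz.

```python
# Field order for a Quartz-style cron expression; the seventh (year) is optional.
FIELD_NAMES = ["seconds", "minutes", "hours", "day_of_month",
               "month", "day_of_week", "year"]


def label_cron_fields(expression):
    """Return a dict mapping each field name to its raw text."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    # zip stops at the shorter list, so 'year' is simply absent for 6 fields.
    return dict(zip(FIELD_NAMES, parts))


if __name__ == "__main__":
    for name, value in label_cron_fields("0 0 18 * * ? *").items():
        print(f"{name:>12}: {value}")
```

A 5-field Linux-style crontab entry is rejected by this sketch, which mirrors why such an entry does not mean the same thing in a Quartz scheduler.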
10-26-2016
12:58 PM
@Paul Yang What you have here is a very light dataflow, based on the picture shown. The NiFi Remote Process Group (RPG) sends data in batches of up to 100 FlowFiles for efficiency. So if the input queue has fewer than 100 FlowFiles in it when the RPG runs, all of those FlowFiles will be routed to a single node. On the next run, the next batch would go to a different node. Over time, if the dataflow rate is constant, the data should be balanced across your nodes.
If I am understanding what you have here, you are feeding the RPG that feeds an input port. That input port feeds an output port. Then you use various RPGs anywhere in your flow to pull data from that output port. Correct? The problem with this is that the RPG runs on every node, so when a node connects it will try to pull all the files it sees on that connection. Nodes are not aware of how many nodes exist in the cluster, and a node will not limit itself to pulling only x amount so that the other nodes can pull the same. Each node acts in a vacuum and pulls as much data as fast as it can from the output port.
I would suggest instead having your remote input port (root-level input port) feed its success relationship multiple times into the various sub process groups owned by your various departments. Not only will this provide better load-balanced delivery of data in the cluster, but it will also improve performance.
Thanks, Matt
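The batching behavior described above can be pictured with a toy Python simulation. This is only a sketch under stated assumptions: the strict round-robin choice of target node is my simplification (the real RPG node selection is not shown here), and the node names and queue sizes are invented.

```python
from collections import Counter
from itertools import cycle

BATCH_SIZE = 100  # "batches of up to 100", as described above


def distribute(queue_sizes, nodes):
    """Simulate sending queued FlowFiles to nodes in batches.

    queue_sizes: number of FlowFiles waiting at each run.
    nodes: node names; each batch goes to the next node in turn
           (round-robin is an assumption made for this example).
    """
    totals = Counter({node: 0 for node in nodes})
    targets = cycle(nodes)
    for queued in queue_sizes:
        while queued > 0:
            batch = min(queued, BATCH_SIZE)
            totals[next(targets)] += batch
            queued -= batch
    return dict(totals)


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    print("single light run:", distribute([60], nodes))
    print("thirty light runs:", distribute([60] * 30, nodes))
```

With a single light run everything lands on one node, while many runs at a steady rate even out, which is the "balanced over time" effect.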
10-26-2016
12:33 PM
If it still does not work after adding the "modify the data" policy, check the nifi-user.log to see which entity is having permissions problems. Did you set processor-level policies on the processors on each side of this queued connection?
10-26-2016
12:31 PM
@mayki wogno The "view the data" policy will give you the ability to list the queue, but will not give you the ability to empty the queue. You need to give your nodes and the user making the request the "modify the data" policy as well.
10-26-2016
12:22 PM
NiFi does not use a Linux cron. It uses a Quartz cron/scheduler.
10-26-2016
12:18 PM
6 Kudos
@Zack Riesland The cron you have there should run 18 minutes and 1 second into every hour. What you are really looking for as a cron for 6:01 PM is 0 1 18 * * ? * (or 0 1 18 * * ? without the optional year field).
CRON driven: When using the CRON driven scheduling mode, the Processor is scheduled to run periodically, similar to the Timer driven scheduling mode. However, the CRON driven mode provides significantly more flexibility at the expense of increasing the complexity of the configuration. This value is made up of seven fields (where the seventh field is optional), each separated by a space. These fields are, in order:
Seconds, Minutes, Hours, Day of Month, Month, Day of Week, Year
The value for each of these fields should be a number, range, or increment. Range here refers to a syntax of <number>-<number>. For example, the Seconds field could be set to 0-30, meaning that the Processor should only be scheduled if the time is 0 to 30 seconds after the minute. Additionally, a value of * indicates that all values are valid for this field. Multiple values can also be entered using a , as a separator: 0,5,10,15,30.
An increment is written as <start value>/<increment>. For example, setting a value of 0/10 for the Seconds field means that valid values are 0, 10, 20, 30, 40, and 50. However, if we change this to 5/10, valid values become 5, 15, 25, 35, 45, and 55.
For the Month field, valid values are 1 (January) through 12 (December). For the Day of Week field, valid values are 1 (Sunday) through 7 (Saturday). Additionally, a value of L may be appended to one of these values to indicate the last occurrence of this day in the month. For example, 1L can be used to indicate the last Sunday of the month.
Thanks, Matt
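If it helps to check the number, range, list, and increment syntax by hand, here is a minimal Python sketch that expands one numeric field into the values it matches. It is not the Quartz parser: the function name is made up, and it deliberately ignores special characters such as ? and L.

```python
def expand_field(spec, low, high):
    """Expand one numeric cron field into a sorted list of matching values.

    Supports only '*', a single number, '<number>-<number>',
    '<start value>/<increment>', and comma-separated combinations.
    low and high are the bounds of the field (for example 0 and 59 for seconds).
    """
    values = set()
    for part in spec.split(","):
        part = part.strip()
        if part == "*":
            values.update(range(low, high + 1))
        elif "/" in part:
            start_text, step_text = part.split("/")
            start = low if start_text == "*" else int(start_text)
            values.update(range(start, high + 1, int(step_text)))
        elif "-" in part:
            first, last = (int(x) for x in part.split("-"))
            values.update(range(first, last + 1))
        else:
            values.add(int(part))
    out_of_range = sorted(v for v in values if not low <= v <= high)
    if out_of_range:
        raise ValueError(f"values outside {low}-{high}: {out_of_range}")
    return sorted(values)


if __name__ == "__main__":
    for spec in ("0/10", "5/10", "0-30", "0,5,10,15,30"):
        print(spec, "->", expand_field(spec, 0, 59))
```

Running it against the Seconds examples above is a quick way to confirm what an increment or range will actually match before putting it in a processor's schedule.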
10-25-2016
03:04 PM
3 Kudos
@mayki wogno In order to list a queue you need the "view the data" policy. In order to empty a queue you need the "modify the data" policy. If you are working with a NiFi cluster, all the nodes in the cluster will need to be granted these policies as well. Click on the key in the "Operate" window to the left of the canvas, then select the two policies listed above (click Override if you want to create a new policy and not edit the parent policy that is inherited). Add the cluster node users and any other users you want to have those abilities. Thanks, Matt