Member since: 07-30-2019
Posts: 3382
Kudos Received: 1616
Solutions: 998
11-01-2016
12:58 PM
6 Kudos
@bala krishnan When the MergeContent processor runs (based on its run schedule configuration), it looks at the FlowFiles on the incoming queue and groups them into bins (think of these as logical containers) based on the processor's property configuration. You have two Merge Strategies to choose from:

1. Defragmentation: This strategy requires that FlowFiles have fragment attributes (fragment.identifier, fragment.index, and fragment.count) set on them. Each "bundle" of fragments goes into its own bin.

2. Bin-Packing Algorithm (default): FlowFiles are simply placed in bins based on the criteria discussed below.

Here are the properties that affect how the bins that FlowFiles are placed into are handled:

- FlowFile content is never truncated. For example, if a FlowFile on the incoming queue has content larger than the configured "Maximum Group Size", it simply becomes a merged file of one and is passed directly to the merged relationship.

- "No maximum" simply means that there is no ceiling on the "Maximum Number of Entries" or "Maximum Group Size". Does this mean a bin will never get merged? No. The currently queued FlowFiles on an incoming connection are placed in bin(s); once they have been placed, if any of the bins has reached or exceeded the "Minimum Number of Entries" or "Minimum Group Size", that bin is merged.

- Don't forget that "Max Bin Age" acts as your trump card: a bin will be merged once it has been around this long without being merged, regardless of the other settings.

Thanks, Matt
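A rough Python sketch of the bin-merge decision described above (the names, helper, and structure are my own for illustration, not NiFi's implementation; the sketch treats both minimums as needing to be satisfied before a bin is merged early):

```python
import time
from dataclasses import dataclass, field

@dataclass
class Bin:
    """A logical container of queued FlowFiles (illustrative only, not NiFi's API)."""
    created: float = field(default_factory=time.time)
    entries: int = 0        # number of FlowFiles placed in the bin
    content_bytes: int = 0  # total size of FlowFile content in the bin

def ready_to_merge(b: Bin, min_entries: int, min_size_bytes: int,
                   max_bin_age_seconds: float | None) -> bool:
    # "Max Bin Age" is the trump card: once the bin is this old, merge it
    # regardless of the other settings.
    if max_bin_age_seconds is not None and time.time() - b.created >= max_bin_age_seconds:
        return True
    # Otherwise the bin is merged once it satisfies the configured minimums.
    return b.entries >= min_entries and b.content_bytes >= min_size_bytes
```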
10-31-2016
01:03 PM
ListenHTTP requires 2-way SSL when SSL is enabled, so the client will also need a keystore and a truststore. The truststore on both your client and server will need to contain a trusted cert entry for the other side's certificate. If you used the same CA for both, you should be good. If not, you will need to add the CA cert or a trusted key entry (the public key from each private key entry) to each other's truststores.
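For example, a client posting to a secured ListenHTTP endpoint has to present its own certificate in addition to trusting the server's. A rough sketch using Python's requests library (host, port, base path, and file names are placeholders for your environment):

```python
import requests

response = requests.post(
    "https://nifi-host.example.com:8081/contentListener",  # placeholder host/port/base path
    data=b"payload for NiFi",
    cert=("client.crt", "client.key"),  # client certificate and private key (2-way SSL)
    verify="ca.crt",                    # CA bundle that signed the NiFi server certificate
)
response.raise_for_status()
```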
10-31-2016
12:49 PM
@Joshua Adeleke Since you are moving between two very different versions of HDF, I suggest following my procedure above. Stand up your new HDF 2.0.1 install using a copy of the authorized-users.xml file and the flow.xml.gz file from your old NiFi. Once it is up and running, access the UI and fix any invalid processors, controller services, and reporting tasks. There should not be many. ***NOTE: You cannot copy the entire conf dir from the older version to the new one, as there are many changes to the files in that directory (some no longer exist in the new version, and the new version has added additional config files). What you can do is use the contents/configurations from many of the old files (with the same names) to configure like properties in the new NiFi's config files. Matt
10-27-2016
02:19 PM
SplitText will always route the incoming FlowFile to the "original" relationship. What the SplitText processor is really doing is producing a bunch of new FlowFiles from a single "original" FlowFile. Once all the splits have been created, the original un-split FlowFile is routed to the "original" relationship. Most often that relationship is auto-terminated because users have no need for the original FlowFile after the splits are created.
10-27-2016
12:45 PM
2 Kudos
@Joshua Adeleke First, let's make sure we are on the same page terminology-wise. NiFi FlowFiles are made up of two parts: FlowFile content and FlowFile attributes. FlowFile content is written to NiFi's content repository, while FlowFile attributes mostly live in JVM heap memory and the NiFi FlowFile repository. It is the FlowFile attributes that move from processor to processor in your dataflow.

Apache NiFi does not have a version 1.0.0.2.0.0.0-579; that is an HDF build of Apache NiFi 1.0.0. If you are migrating to HDF 2.0, I suggest migrating to HDF 2.0.1 instead. It has many bug fixes you will want: https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.1/index.html

When you say you want to move the "dataflow process" to another server, are you talking about your entire canvas and all configured processors only, or are you also talking about moving any active FlowFiles within the old flow to the new one?

My suggestion would be to stand up your new NiFi instance/cluster. You can copy your flow.xml.gz file from your old NiFi to the new NiFi. ***NOTE: You need to make sure that the same sensitive props key is used (the nifi.sensitive.props.key setting in the nifi.properties file), or your new NiFi will not start because it will not be able to decrypt any of the sensitive properties in that file. If your old NiFi was secured, you can use its authorized-users.xml file to establish the initial admin authorities in the new NiFi (configure this in the authorizers.xml file).

Once started, you will need to access this new NiFi's UI and address any "invalid" processors/controller services, and add any controller services you may have had running on the NCM only in the old version (there is no NCM in HDF 2.x versions). Some of these may be invalid because of changes to the processor/controller service properties. Once everything has been addressed, you are ready to move to the next step.

Shut down all ingest processors on your old NiFi and allow it to finish processing out any data it is already working on. At the same time, you can start your new NiFi so it starts ingesting any new data and begins processing it.

Thanks, Matt
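As a quick sanity check before starting the new NiFi with the copied flow.xml.gz, you can compare the sensitive props key in both installs. A small illustrative Python snippet (the install paths are hypothetical; nifi.sensitive.props.key is the property to compare in nifi.properties):

```python
def read_props(path):
    """Parse a simple key=value properties file into a dict."""
    props = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                props[key.strip()] = value.strip()
    return props

old_props = read_props("/opt/old-nifi/conf/nifi.properties")        # hypothetical path
new_props = read_props("/opt/HDF-2.0.1/nifi/conf/nifi.properties")  # hypothetical path

if old_props.get("nifi.sensitive.props.key") != new_props.get("nifi.sensitive.props.key"):
    print("Sensitive props key differs; the new NiFi will fail to decrypt "
          "sensitive values in the copied flow.xml.gz")
```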
10-26-2016
07:33 PM
@Saikrishna Tarapareddy You can actually cut the ExtractText processor out of this flow. I forgot the RouteText processor generates a "RouteText.Group" FlowFile attribute. You can just use that attribute as the "Correlation Attribute Name" in the MergeContent processor.
10-26-2016
07:10 PM
1 Kudo
@Saikrishna Tarapareddy I agree that you may still need to split your very large incoming FlowFile into smaller FlowFiles to better manage heap memory usage, but you should be able to use the RouteText and ExtractText as follows to accomplish what you want:
With RouteText configured to group matching lines, all grouped lines will be routed to the "TagName" relationship as new FlowFiles. Those feed into an ExtractText processor, which will extract the TagName as an attribute on the FlowFile; you can then use that attribute as the "Correlation Attribute Name" in the MergeContent processor that follows. Thanks, Matt
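To make the ExtractText step concrete, here is a hypothetical illustration (the line format and regex are made up; the real expression depends on your data). In ExtractText you add a dynamic property, here named TagName, whose value is a Java regex with a capture group; the first captured group is placed into an attribute of that name:

```python
import re

# Hypothetical line format: "TagName,timestamp,value"
line = "Temperature01,2016-10-26T19:10:00,72.4"

# Equivalent of a dynamic property "TagName" with the regex ^([^,]+),
match = re.match(r"^([^,]+),", line)
if match:
    tag_name = match.group(1)   # -> "Temperature01", stored as the TagName attribute
```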
10-26-2016
05:46 PM
@Zack Riesland There are seven fields; however, the seventh field is optional. So you are correct: both "0 0 18 * * ?" and "0 0 18 * * ? *" are valid. The following is from http://www.quartz-scheduler.org/documentation/quartz-2.x/tutorials/crontrigger.html:

* (“all values”) - used to select all values within a field. For example, “*” in the minute field means “every minute”.

? (“no specific value”) - useful when you need to specify something in one of the two fields in which the character is allowed, but not the other. For example, if I want my trigger to fire on a particular day of the month (say, the 10th), but don’t care what day of the week that happens to be, I would put “10” in the day-of-month field, and “?” in the day-of-week field. See the examples below for clarification.

So only fields 4 (day-of-month) and 6 (day-of-week) will accept "?". Thanks, Matt
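To see which position is which, here is a small illustrative Python snippet that labels the fields of a Quartz expression (field order taken from the Quartz documentation linked above):

```python
QUARTZ_FIELDS = ["seconds", "minutes", "hours", "day-of-month",
                 "month", "day-of-week", "year (optional)"]

def label_fields(expression: str) -> dict:
    """Pair each value in a Quartz cron expression with its field name."""
    return dict(zip(QUARTZ_FIELDS, expression.split()))

print(label_fields("0 0 18 * * ?"))
# {'seconds': '0', 'minutes': '0', 'hours': '18',
#  'day-of-month': '*', 'month': '*', 'day-of-week': '?'}
```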
10-26-2016
05:31 PM
The same holds true for the MergeContent side of this flow. Have one MergeContent processor merge the first 10,000 FlowFiles, and a second merge multiple 10,000-line FlowFiles into even larger merged FlowFiles. This again will help prevent running into OOM errors.