Member since
07-30-2019
3404
Posts
1621
Kudos Received
1003
Solutions
01-18-2018
03:08 PM
@Alvin Jin As of NiFi 1.5, NiFi is more restrictive with regard to the headers it accepts from a client. The hostname in the request header is checked against the hostname configured in the nifi.properties file: nifi.web.http(s).host= If they do not match, you will encounter the error you are seeing. So you will need to access the NiFi 1.5 UI using the same hostname that is specified in that property. Thanks, Matt
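A quick way to sanity-check this before opening the browser is to compare the hostname in the URL you plan to use against the value in nifi.properties. The following is a minimal sketch, not NiFi's own validation logic: the function names, the sample hostname and the sample port are made up for illustration, and it only performs a simple case-insensitive comparison.

```python
from urllib.parse import urlsplit


def configured_web_host(properties_text):
    """Return nifi.web.https.host if set, otherwise nifi.web.http.host (may be empty)."""
    props = {}
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props.get("nifi.web.https.host") or props.get("nifi.web.http.host") or ""


def url_matches_configured_host(url, properties_text):
    """True when the hostname in the URL equals the configured web host."""
    requested = (urlsplit(url).hostname or "").lower()
    return requested == configured_web_host(properties_text).lower()


# Hypothetical nifi.properties fragment (hostname and port are placeholders).
SAMPLE = """
nifi.web.http.host=
nifi.web.https.host=nifi01.example.com
nifi.web.https.port=9443
"""

print(url_matches_configured_host("https://nifi01.example.com:9443/nifi", SAMPLE))
print(url_matches_configured_host("https://10.0.0.5:9443/nifi", SAMPLE))
```

If the second form (an IP address or an alias) is what you have been using, that mismatch is the likely cause of the error.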
01-18-2018
12:31 PM
@Andrew thomas Once you have a NiFi cluster installed and running, any change you make within the NiFi UI affects every node in the cluster. There is no way in a NiFi cluster to deploy a flow to only one node. There is no need to stop any existing running flow within the UI when you are adding/building new flows within the same UI. Any change to one of NiFi's configuration files (except logback.xml) requires a NiFi restart before it will take effect. Thank you, Matt
01-17-2018
03:39 PM
@Andrew thomas
When a NiFi node starts, it unpacks all of its NARs to the work directory. Before joining an existing cluster, it checks that the following three local files exactly match what is being used in the cluster:
flow.xml.gz <-- Contains all configuration done via the NiFi UI.
users.xml <-- Will only exist if NiFi is secured and the default NiFi file-based authorizer is being used.
authorizations.xml <-- Will only exist if NiFi is secured and the default NiFi file-based authorizer is being used.
At no time does NiFi check that all nodes are running the same set of NARs. Mismatched NARs only become an issue when a component from one of them is being used in the flow.xml.gz. In your scenario, where you are trying to deploy a new custom NAR that is not yet being used on your canvas, you can update each node one at a time.
I recommend creating a new lib directory on each of your NiFi nodes to hold all of your custom NARs separately from the default NiFi lib directory. Then modify the nifi.properties file on each of your nodes (this can be done while NiFi is still running, since this file is only read on startup). Add the following line to your nifi.properties files: nifi.nar.library.directory.custom=/<path to my custom nifi nars directory> Make sure this directory and the custom NARs placed in it have the proper ownership and permissions.
To minimize the impact on your operational cluster, restart one node at a time. Once a restarted node has rejoined the cluster, move on to the next. Once all nodes have been restarted and have reconnected to the cluster, you can start using your custom processor components on the canvas. Thank you, Matt If you found the answer addressed your question, please take a moment to click on "accept" found below the answer.
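If you have several nodes to update, the nifi.properties edit can be scripted. Below is a small sketch that adds (or updates) the custom NAR directory line in the text of a properties file. The helper name and the /opt/nifi/custom-nars path are placeholders of mine, not anything NiFi ships; substitute your own directory.

```python
def with_custom_nar_directory(properties_text, directory):
    """Return properties text with nifi.nar.library.directory.custom set to `directory`."""
    key = "nifi.nar.library.directory.custom"
    new_line = f"{key}={directory}"
    lines = properties_text.splitlines()
    for i, line in enumerate(lines):
        # Replace an existing entry so the function is safe to run twice.
        if line.strip().startswith(key + "="):
            lines[i] = new_line
            break
    else:
        # No existing entry was found, so append one.
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Hypothetical starting content; only the default lib directory is defined.
original = "nifi.nar.library.directory=./lib\nnifi.nar.working.directory=./work/nar/\n"
updated = with_custom_nar_directory(original, "/opt/nifi/custom-nars")
print(updated)
```

The change still only takes effect after each node is restarted, so the one-node-at-a-time restart described above applies.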
01-16-2018
09:54 PM
@Chris Lundeberg Are you entering "${schema.fingerprint}" or "schema.fingerprint" in the "Correlation Attribute Name" property of the MergeContent processor? If you want to bin FlowFiles whose "schema.fingerprint" attribute values match, enter only "schema.fingerprint" in that property. If you want your correlation attribute to be more unique, you can use an UpdateAttribute processor before MergeContent to create a new attribute based on both ${schema.fingerprint} and ${tablename}, and then configure MergeContent to use the name of that new attribute.
Your understanding of my original explanation was correct. That could also explain the missing attributes, since all FlowFiles without the attribute would end up in the same bin and only matching attributes would be retained on the merged FlowFiles.
As for using one flow versus many duplicate flows: it makes sense to have only a single flow here, but having many identical flows is also OK. The only thing to consider is that NiFi processors obtain threads from a thread pool configured in NiFi. The more processors you have, the more processors there are requesting a thread from that pool. Plus, more dataflows just means more processors to manage.
The "Max Timer Driven Thread Count" is set within "Controller Settings", found in the hamburger menu in the upper right corner of the NiFi UI. You will also find a setting for "Max Event Driven Thread Count" there, but do not change that value. Nothing you add to the canvas will use Event Driven threads unless you specifically configure the processor to use them. This is a deprecated feature that still exists only to avoid breaking backwards compatibility with older flows. Timer Driven threads are much more efficient and will outperform Event Driven threads anyway.
The default for Timer Driven threads is only 10. A good starting place is 2 to 4 times the number of cores you have on a single NiFi host. Assume a 4-node NiFi cluster with 32 cores per node: you would set the Max Timer Driven Thread Count to somewhere between 64 and 128. The setting is applied per node. If you set it to 32, for example, there would be 32 threads available per host for a cluster total of 128. Monitor top on your systems as you run your dataflow and adjust as you see fit from there. Thanks, Matt Please take a moment to click "Accept" if you feel I have addressed your questions.
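The thread count rule of thumb above is simple arithmetic, sketched here in Python. The function names are mine; the 2x and 4x factors and the 4-node, 32-core figures come from the example in this answer.

```python
def timer_driven_thread_range(cores_per_node, low_factor=2, high_factor=4):
    """Suggested starting range for Max Timer Driven Thread Count on one node."""
    return cores_per_node * low_factor, cores_per_node * high_factor


def cluster_thread_total(max_timer_driven_threads, node_count):
    """The setting applies per node, so the cluster total is setting x nodes."""
    return max_timer_driven_threads * node_count


low, high = timer_driven_thread_range(cores_per_node=32)
print(low, high)                                   # starting range for a 32-core host
print(cluster_thread_total(32, node_count=4))      # total threads if the setting is 32
```

Treat the result as a starting point only and adjust based on what top shows under real load.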
01-16-2018
06:10 PM
1 Kudo
@Chris Lundeberg It may be helpful to share your MergeContent processor's configuration here.
1. How many bins is the processor configured to use?
2. It sounds like each incoming FlowFile may have a considerable attribute map size. All the attributes of the FlowFiles being merged are held in heap memory until the merge is complete, so you may be having heap issues. Have you seen any Out of Memory errors in nifi-app.log?
3. What is the correlation attribute you are using to bin like FlowFiles?
4. How large is each FlowFile being merged? If they are very small (meaning it would take more than 20,000 of them to reach a 64 MB merged file), you may want to use multiple MergeContent processors in series to reduce the heap usage.
Useful links: https://community.hortonworks.com/questions/149047/nifi-how-to-handle-with-mergecontent-processor.html https://community.hortonworks.com/questions/87178/merge-fileflow-files-based-on-time-rather-than-siz.html
I have not personally seen FlowFiles routed to Failure lose their attributes. That seems very odd to me. The merged FlowFile, depending on configuration, may have different attributes, however. I am assuming that your "avro_schema" attribute may be fairly large. It may be better to use something smaller for your correlation attribute value in the MergeContent processor. You could use the ExtractAvroMetadata processor before the MergeContent processor. It will give you a "schema.fingerprint" attribute you could use instead to accomplish the same thing.
Are you putting "${avro_schema}_${tablename}" in the MergeContent processor's Correlation Attribute Name property? What this property does is resolve the provided EL to its actual value and then check the incoming FlowFiles for an attribute with that resolved name. FlowFiles where the value of that attribute matches are placed in the same bin. I just want to make sure you are using this property correctly. All FlowFiles that do not have the attribute are allocated to a single bin. You also need to make sure your MergeContent processor is configured with enough bins (number of needed bins + 1) to accommodate all the possible unique correlation attribute values. If you do not have enough bins, MergeContent will force the merging of the oldest bin to free a bin so it can continue allocating additional FlowFiles. Thank you, Matt
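To make the binning behavior concrete, here is a simplified Python model of it. This is only an illustration of the rules described in this answer, not MergeContent's actual code: FlowFiles are represented as plain attribute dictionaries, and the names `assign_bins`, `bins_needed` and `NO_ATTRIBUTE_BIN` are invented for the sketch.

```python
from collections import defaultdict

# Label for the single shared bin that collects FlowFiles lacking the attribute.
NO_ATTRIBUTE_BIN = "<no correlation attribute>"


def assign_bins(flowfiles, correlation_attribute):
    """Group FlowFile attribute maps by the value of the named attribute."""
    bins = defaultdict(list)
    for attributes in flowfiles:
        key = attributes.get(correlation_attribute, NO_ATTRIBUTE_BIN)
        bins[key].append(attributes)
    return dict(bins)


def bins_needed(flowfiles, correlation_attribute):
    """Unique correlation values plus one extra bin, as recommended above."""
    values = {a[correlation_attribute] for a in flowfiles if correlation_attribute in a}
    return len(values) + 1


flowfiles = [
    {"schema.fingerprint": "aaa", "tablename": "orders"},
    {"schema.fingerprint": "aaa", "tablename": "orders"},
    {"schema.fingerprint": "bbb", "tablename": "customers"},
    {"tablename": "unknown"},  # no fingerprint attribute at all
]

bins = assign_bins(flowfiles, "schema.fingerprint")
print({key: len(members) for key, members in bins.items()})
print(bins_needed(flowfiles, "schema.fingerprint"))
```

Note that the property takes the attribute name ("schema.fingerprint"), which is why the sketch looks the name up in each FlowFile's attributes rather than comparing against a fixed value.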
01-16-2018
05:07 PM
@Eric Lloyd With the above configuration, it would only take 1 FlowFile being assigned to a bin before that bin was marked eligible for merging. There is nothing there that forces the processor to wait for other FlowFiles to be allocated to a bin before merging, since both minimums are set to 1 FlowFile and 0 bytes. To actually get 100,000 FlowFiles in one merge (this is high and may trigger OOM), there would need to be 100,000 FlowFiles, all with the same correlation attribute value, in the incoming connection queue at the time the processor runs. This is almost certainly not going to be the case. The Max Bin Age simply sets an exit strategy here. It will merge a bin, regardless of whether the minimums have been met, once the bin age has reached this value. You may want to set more reasonable values for your minimums and also consider using multiple MergeContent processors in series to step up to the final merged number you are looking for. Thanks, Matt
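The eligibility rule described above can be written out as a small Python check. This is a simplified model for reasoning about the settings, not the processor's implementation: it ignores the maximum entries and maximum size limits, and the function and parameter names are my own.

```python
def bin_ready_to_merge(entries, size_bytes, age_seconds,
                       min_entries, min_size_bytes, max_bin_age_seconds=None):
    """Return True when a bin would be considered eligible for merging."""
    # Max Bin Age acts as the exit strategy: an old enough, non-empty bin
    # is merged even if the minimums were never reached.
    if max_bin_age_seconds is not None and age_seconds >= max_bin_age_seconds:
        return entries > 0
    # Otherwise both minimums must be satisfied.
    return entries >= min_entries and size_bytes >= min_size_bytes


# Minimums of 1 FlowFile and 0 bytes: a single FlowFile makes the bin eligible.
print(bin_ready_to_merge(1, 2048, 0, min_entries=1, min_size_bytes=0))

# A higher minimum makes the processor wait for more FlowFiles...
print(bin_ready_to_merge(1, 2048, 0, min_entries=1000, min_size_bytes=0))

# ...until the bin reaches the maximum bin age.
print(bin_ready_to_merge(1, 2048, 300, min_entries=1000, min_size_bytes=0,
                         max_bin_age_seconds=300))
```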
01-16-2018
01:38 PM
1 Kudo
@Roger Young The Remote Process Group (RPG) is not designed for dynamic target URL assignment. It is designed to communicate with a target standalone NiFi instance or NiFi cluster. During that communication, it learns about all currently connected nodes in a target NiFi cluster and retains the URLs for all those nodes so it can perform load-balanced delivery of data. In the event the RPG cannot get an updated listing from the target, it will continue to try to deliver to the last known set of target nodes. Since the RPG was never intended to be used to deliver data to multiple independent target NiFi instances, the idea of a dynamic URL was never considered. There are other NiFi processors, such as PostHTTP and InvokeHTTP, that can take NiFi Expression Language (EL) as input for the target URL. Thank you, Matt
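To illustrate what a per-FlowFile URL means in practice, here is a rough Python sketch of substituting attribute values into a URL template. It is not NiFi's Expression Language engine (real EL also supports functions and more); it only handles plain `${attribute}` references, and the attribute names `target.host` and `target.port` are made up for the example.

```python
import re

# Matches simple ${attribute.name} references only.
_ATTRIBUTE_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def resolve_url(template, attributes):
    """Replace each ${name} with the FlowFile attribute value (empty if missing)."""
    return _ATTRIBUTE_REFERENCE.sub(
        lambda match: attributes.get(match.group(1).strip(), ""), template
    )


template = "https://${target.host}:${target.port}/ingest"
flowfile_a = {"target.host": "nifi-a.example.com", "target.port": "8443"}
flowfile_b = {"target.host": "nifi-b.example.com", "target.port": "9443"}

print(resolve_url(template, flowfile_a))
print(resolve_url(template, flowfile_b))
```

Each FlowFile can carry different attribute values, so the same processor can deliver to different targets, which is exactly what the RPG does not support.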
01-15-2018
09:34 PM
1 Kudo
@dhieru singh NiFi offers a "Summary" UI which has a Connections tab you can select. Once it is selected, you can sort on the "Queue / Size threshold" column by clicking on the word "Queue". This will move the connections with the largest threshold percentage to the top of the list. 100% for the Queue threshold indicates that object threshold back pressure is being applied on that connection. 100% for the Size threshold indicates that data size back pressure is being applied on that connection. Clicking on the arrow at the far right of the row will take you directly to that connection on your canvas, no matter which process group it exists within. Thank you, Matt If you found this answer to be helpful in addressing your question, please take a moment to click the accept link below.
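For reference, the two percentages in that column are just the queued amount divided by the configured threshold. The Python sketch below shows the arithmetic and the sort; the dictionary keys, connection names and sample numbers are invented for illustration, and treating a threshold of 0 as "disabled" is an assumption of this sketch.

```python
def threshold_percentages(queued_count, queued_bytes, object_threshold, size_threshold_bytes):
    """Return (queue %, size %) for one connection; a threshold of 0 counts as disabled."""
    queue_pct = 100.0 * queued_count / object_threshold if object_threshold else 0.0
    size_pct = 100.0 * queued_bytes / size_threshold_bytes if size_threshold_bytes else 0.0
    return queue_pct, size_pct


def sort_by_pressure(connections):
    """Order connections so the one closest to back pressure comes first."""
    def worst(conn):
        return max(threshold_percentages(
            conn["queued_count"], conn["queued_bytes"],
            conn["object_threshold"], conn["size_threshold_bytes"]))
    return sorted(connections, key=worst, reverse=True)


GB = 1024 ** 3
connections = [
    {"name": "fetch -> parse", "queued_count": 2500, "queued_bytes": GB // 10,
     "object_threshold": 10000, "size_threshold_bytes": GB},
    {"name": "parse -> merge", "queued_count": 10000, "queued_bytes": GB // 4,
     "object_threshold": 10000, "size_threshold_bytes": GB},
]

for conn in sort_by_pressure(connections):
    print(conn["name"], threshold_percentages(
        conn["queued_count"], conn["queued_bytes"],
        conn["object_threshold"], conn["size_threshold_bytes"]))
```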
11-22-2017
04:10 PM
@Pratik Ghatak The NiFi instance/cluster with the Remote Process Group (RPG) is acting as the client, and the target NiFi instance/cluster is acting as the server in this Site-To-Site (S2S) connection. The target NiFi configuration you shared shows that you have S2S set up to support both the RAW and HTTP transport protocols. It also appears from your screenshots that the initial connection between your two NiFis is working correctly. The error you are seeing indicates that you have not added any remote input ports on the target NiFi's canvas to receive FlowFiles from the source NiFi. On the target NiFi, you will need to add one or more "input ports". Input ports added at the root/top-level process group are considered "remote input ports" and can be used to receive data over S2S. After you add these remote input ports to your target NiFi's canvas, you can right-click on your source NiFi's RPG and select "Refresh" from the context menu that appears (or you can just wait for the next auto-refresh, which happens every 30 seconds). You should then be able to see those remote input ports when you drag a connection to the RPG. Thank you, Matt If you find this information has addressed your question/issue, please take a moment to click "Accept" beneath the Answer.
11-22-2017
03:58 PM
1 Kudo
@Mohamed Hossam You could use the ReplaceText processor instead of your script to accomplish what you are trying to do: The above ReplaceText processor will create 4 capture groups for the desired columns from your input FlowFiles. It will even work against incoming FlowFiles that have multiple entries (1 per line). Thank you, Matt If you find this answer addresses your question/issue, please take a moment to click "Accept" beneath the answer.
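If you want to try out a capture-group pattern before putting it into ReplaceText, a few lines of Python are enough. Everything in this sketch is hypothetical: the input format (space-separated fields), the pattern and the comma-joined output are stand-ins, since they depend on your actual data. Also note that Python writes back-references as `\1` where ReplaceText uses `$1`, and that NiFi uses Java regular expressions, so check any differences before copying a pattern over.

```python
import re

# Capture the first four space/tab separated fields on each line, drop the rest.
PATTERN = re.compile(r"^(\S+)[ \t]+(\S+)[ \t]+(\S+)[ \t]+(\S+).*$", re.MULTILINE)


def keep_first_four_columns(text):
    """Rewrite every matching line as the four captured fields joined by commas."""
    return PATTERN.sub(r"\1,\2,\3,\4", text)


sample = "2018-01-16 INFO web01 200 extra fields here\n2018-01-16 WARN web02 503 more extras"
print(keep_first_four_columns(sample))
```

Because the pattern is applied line by line, content with multiple entries (1 per line) is handled the same way as a single entry.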