Member since: 07-30-2019
Posts: 3406
Kudos Received: 1621
Solutions: 1006
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 84 | 12-17-2025 05:55 AM |
| | 145 | 12-15-2025 01:29 PM |
| | 99 | 12-15-2025 06:50 AM |
| | 224 | 12-05-2025 08:25 AM |
| | 380 | 12-03-2025 10:21 AM |
04-25-2018
05:35 PM
@Xavier COUDRE
There are multiple policies needed for S2S to work. The first policy is a "Global" policy (found under "Policies" via the hamburger menu in the upper right corner of the NiFi UI): you must authorize the source NiFi instance(s) (any NiFi instance running an RPG that is trying to connect to this NiFi). This allows the RPG to communicate with the target to retrieve details such as the number of connected nodes, the load on those nodes, the list of authorized input and output ports, and supported connection information (node hostnames, whether RAW transfer is enabled, what the RAW port is, etc.). In order for any ports to be returned, the source NiFi instance(s) must also be authorized on those ports. I believe this is the step you have already completed, which we were discussing initially in this answer. - Thanks, Matt - HCC Forum Tip: Try to avoid responding to an existing answer with a new answer. Comment on the answer instead. It makes the discussion easier to follow, especially when multiple answers are being discussed.
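As a quick reference, the policies involved typically look like the sketch below (policy names taken from NiFi's Policies UI; exact wording may vary slightly by version, so verify against your release):

```text
Global policy (hamburger menu -> Policies):
  "retrieve site-to-site details"   -> add the source NiFi node identities (certificates)

Per-port policies (select the root-level port -> access policies / key icon):
  Input port:   "receive data via site-to-site"  -> add the same source node identities
  Output port:  "send data via site-to-site"     -> add the same source node identities
```

Without the per-port policies, the RPG connects successfully but sees an empty list of available ports.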
04-24-2018
08:57 PM
1 Kudo
@Jose Gonzalez You can specify more than one host, but it is not required. Once the RPG establishes a connection to the target host, it retrieves the S2S details of the target cluster and stores them locally. If the host you provided becomes unavailable at any time after that initial connection, the RPG will try any of the other nodes it learned about previously to get S2S details. Having multiple nodes configured helps by giving the source NiFi more than one target node with which to establish the initial connection. - Your load-balancing issue is completely unrelated to how many node URLs you configured in your RPG. Here is an article that covers how load-balancing works with an RPG: https://community.hortonworks.com/content/kbentry/109629/how-to-achieve-better-load-balancing-using-nifis-s.html - Thanks, Matt
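For illustration, the RPG's "URLs" property accepts a comma-separated list of target node URLs (the hostnames below are placeholders, not values from this thread):

```text
http://nifi-node1.example.com:8080/nifi, http://nifi-node2.example.com:8080/nifi
```

Any one reachable URL from the list is enough for the initial connection; the RPG then learns the full cluster topology from the S2S details it retrieves.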
04-24-2018
12:55 PM
@Olivier Drouin and @Xavier COUDRE
- At this time, Remote Process Groups are only able to see "remote" input and output ports. NiFi only treats input and output ports added to the root canvas of the target NiFi as remote ports. So, as you can see from the above diagram, your MiNiFi RPG will only be able to see the input port at the root level. You will then need to use an input port in each sub-process group to move the received FlowFiles from the root canvas level down into process group level 3. I completely understand the complexity this adds in a multi-nested process group dataflow design. - There is a Jira out there requesting an enhancement to NiFi to separate input/output ports into local and remote components: https://issues.apache.org/jira/browse/NIFI-2933 - However, my understanding is that there are significant design hurdles for such a change. There have been other design discussions around being able to connect directly from a top-level process group to a deeply nested process group without needing to create ports at each level. - Hope this helps to clarify things, Matt
04-23-2018
08:13 PM
1 Kudo
@Joshua Adeleke - Just want to add some details here for others who may be seeing the same behavior. - There are two processes that are part of a running NiFi. When you start NiFi, you are really starting the NiFi bootstrap process (this is the process that Ambari in HDF monitors). The bootstrap process is responsible for kicking off a second Java process that runs the main application. This second process may take several minutes to completely start on every node in a NiFi cluster. - As each node starts, it will reach a point where the cluster can be formed. Each node will communicate with ZooKeeper to see if the cluster already has an elected flow, cluster coordinator, and primary node. If none exist, an election begins. The election is held for 5 minutes (default) or until all nodes have connected (based on the configured number of election candidates; HDF sets this for you in the nifi.properties file, but otherwise it is blank). - Once a flow, cluster coordinator, and primary node have been chosen, the election is over. The UI will then become available. At the same time, nodes will send heartbeats to the elected cluster coordinator to join the cluster. Once all nodes have joined, the UI will reflect all nodes as connected. - ***NOTE: Very large queue backlogs in your flow can extend the length of time it takes for a NiFi node to come up and join the cluster, since NiFi must load and parse all of that FlowFile information from the FlowFile repository and load it back into the designated queues in your dataflows. - Thanks, Matt
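The election behavior described above is controlled by two entries in nifi.properties (the wait time shown is the default; the candidate count below is only an example value, left blank out of the box on plain Apache NiFi):

```properties
# How long nodes wait for the flow election before proceeding (default: 5 minutes)
nifi.cluster.flow.election.max.wait.time=5 mins
# End the election early once this many nodes have connected.
# Blank means wait the full election time; HDF populates this for you.
nifi.cluster.flow.election.max.candidates=3
```

Setting max.candidates to your actual node count lets a fully started cluster skip the remainder of the 5-minute wait.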
04-23-2018
07:51 PM
2 Kudos
@dhieru singh 1. It is hard to say exactly how long an upgrade will take because there are more components than just NiFi included in the HDF stack. However, the typical upgrade goes fairly quickly. It is not strictly necessary to stop your flows, since they will be stopped when NiFi is stopped for the upgrade; that said, I would recommend stopping dataflows when upgrading between major release versions (for example, HDF 2.x to HDF 3.x). The new version of NiFi will point at the same repositories your old NiFi used, so dataflows will start up and continue where they left off. - 2. There is never a need to upgrade processors manually after an upgrade. *** The only time that would occur is if, post upgrade, NiFi found more than one version of a processor (this would only happen if a user had multiple versions of a processor in a custom lib directory). Otherwise, NiFi will automatically use the processor version available in the upgraded NiFi version. - You will still want to check your dataflows post upgrade to make sure you don't have any unexpected "invalid" processors. It is rare but possible that processor types you are using have been updated with new required properties. Those properties would render the processor invalid until they are populated. This happens very rarely, and as I mentioned, I can think of no such cases going from HDF 3.0.x to HDF 3.1.x. - Hope this addresses your upgrade concerns, Matt - If you found this post has addressed your question, please take a moment to login and click "accept" on the answer.
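For reference, the repository locations the upgraded NiFi must keep pointing at are defined in nifi.properties; the paths below are the stock defaults (your installation may use dedicated disks instead):

```properties
# Repositories the new NiFi version reuses so flows resume where they left off
nifi.flowfile.repository.directory=./flowfile_repository
nifi.content.repository.directory.default=./content_repository
nifi.provenance.repository.directory.default=./provenance_repository
nifi.database.directory=./database_repository
```

If these point at the same directories before and after the upgrade, queued FlowFiles survive the version change.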
04-23-2018
07:35 PM
@mayki wogno It is very possible the slow responses are the result of a very high number of HTTP requests coming in to your NiFi nodes. - The main contributors to a high number of requests are Remote Process Groups (RPGs). It is very common for users to design dataflows that use many RPGs throughout their canvas to redistribute FlowFiles across their cluster. Each RPG polls the target NiFi instance over HTTP for the current Site-To-Site (S2S) details. Assume, as an example, you have a 5 node cluster with 20 RPGs all pointing back at the same cluster. That means every RPG on every node is requesting S2S details every 30 seconds: 100 HTTP requests every 30 seconds from that alone. There is an improvement implemented through https://issues.apache.org/jira/browse/NIFI-4598 (fixed in NiFi 1.5) to improve how RPGs work in this scenario. - Additionally, NiFi 1.2 is hardcoded to allow only 100 concurrent HTTP requests, which can lead to temporary unavailability of the HTTP endpoint. This was resolved by an improvement covered in https://issues.apache.org/jira/browse/NIFI-4143 (fixed in NiFi 1.4), which allows users to increase the number of allowed concurrent HTTP requests. - Another suggestion to improve HTTP endpoint performance would be to make sure your RPGs are configured to use the "RAW" transport protocol instead of "HTTP". While S2S details are still retrieved over HTTP, the FlowFiles themselves would then be transferred over a dedicated socket port instead of the HTTP port. - Upgrading to Apache NiFi 1.5 will include both of the fixes above. HDF 3.1 is based off Apache NiFi 1.5. - Thank you, Matt - If you found this answer addressed your question, please take a moment to login and click "accept".
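Using RAW transport requires a dedicated socket port to be opened on every node in the target cluster via nifi.properties; a sketch is below (the hostname and port number are placeholders, not values from this thread):

```properties
# Hostname peers use to reach this node for RAW site-to-site
nifi.remote.input.host=nifi-node1.example.com
# Dedicated socket port for RAW FlowFile transfer (unset = RAW disabled)
nifi.remote.input.socket.port=10443
# Whether site-to-site communication is secured with TLS
nifi.remote.input.secure=true
```

With the socket port set, RPGs configured for RAW keep FlowFile transfers off the shared HTTP port.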
04-23-2018
02:47 PM
2 Kudos
@Davide Isoardi - The UpdateAttribute processor does not read the content of a FlowFile. In order for the above Expression Language statements to work, the incoming FlowFiles must already have the FlowFile attributes "user" and "datetime" set on them. - Stop the UpdateAttribute processor and allow a few FlowFiles to queue. Then list that queue and verify what attributes currently exist on those listed FlowFiles. - Thanks, Matt
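To illustrate the point above (the property name and value here are hypothetical, reusing the "user" and "datetime" attributes from this thread), an UpdateAttribute property such as:

```text
Property:  filename
Value:     ${user}-${datetime}.json
```

only resolves correctly if "user" and "datetime" already exist as attributes on the incoming FlowFile; the ${...} Expression Language syntax reads attributes, never the FlowFile's content.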
04-23-2018
01:39 PM
@laiju cbabu The most important thing to understand about NiFi's cluster architecture is that every node in the cluster runs with its own local copy of the flow.xml.gz (this file contains every configuration any user has made via the NiFi UI: building flows on the canvas, adding reporting tasks, adding controller services, etc.). - Because of NiFi's HA control layer, users can log in to any node in an active cluster and make changes within the canvas. The NiFi control layer takes care of making sure those changes are replicated to every node connected to that cluster. - Each node also runs with its own set of repositories (FlowFile, content, provenance, and database). Since NiFi does not currently have an HA data layer, should a NiFi node go down, the data currently being processed by that node will not be processed until that node is restarted. It is important that the FlowFile and content repositories (essential for data integrity) are protected by using RAID disk setups. It is actually easy to stand up an entirely new node that uses these same repos and pick up where the old dead node left off. There is, however, no way to merge the contents of two nodes' repositories together. - Thank you, Matt
04-20-2018
01:03 PM
@Olivier Drouin
Where is this input port "From Minifi" located on your canvas? At the top level of the canvas, or within a process group? Input and output ports facilitate the movement of FlowFiles between a parent process group and a child process group (where the input and output ports exist). So input/output ports created within a process group only allow for receiving or sending FlowFiles from the parent process group. The top level of your canvas is actually just another process group (the root process group). When you create an input or output port there, it becomes a remote input/output port capable of transferring FlowFiles over S2S. You will even see that these root-level input/output ports are rendered a little differently and have more configuration options. Thanks, Matt
04-18-2018
06:31 PM
2 Kudos
@Rahoul A
Here is a typical logback appender entry for the nifi-app.log: <appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
<!--
For daily rollover, use 'app_%d.log'.
For hourly rollover, use 'app_%d{yyyy-MM-dd_HH}.log'.
To GZIP rolled files, replace '.log' with '.log.gz'.
To ZIP rolled files, replace '.log' with '.log.zip'.
-->
<fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd_HH}.%i.log</fileNamePattern>
<maxFileSize>100MB</maxFileSize>
<!-- keep 30 log files worth of history -->
<maxHistory>30</maxHistory>
<!-- optional setting for keeping 10GB total of log files
<totalSizeCap>10GB</totalSizeCap>
-->
</rollingPolicy>
<immediateFlush>true</immediateFlush>
<encoder>
<pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
</encoder>
</appender>

There are a couple of lines you want to pay attention to here:

<rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
The above line defines what type of policy is being used by this appender. Here it will roll based on size and time.

<fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd_HH}.%i.log</fileNamePattern>
In the above line you can see the log is designed to roll on the hour, based on yyyy-MM-dd_HH.

<maxFileSize>100MB</maxFileSize>
The above line states that no single log file should be larger than 100 MB, so you must account for scenarios where logs for a single hour may exceed 100 MB. That is what the "%i" is for in the fileNamePattern line above.

<maxHistory>30</maxHistory>
The above line dictates how many "hours" of logs to retain. Each hour may contain one to many log files because of the max size constraint. - So in your case, using the above as an example, you would set maxHistory to "240". - You will also see an additional optional property (commented out in the above example):

<totalSizeCap>10GB</totalSizeCap>
This property is important if you have limited logging disk space. It will start rolling off the oldest logs once you exceed the configured max threshold. - Thank you, Matt