05-08-2017
03:08 PM
1 Kudo
@Gaurav Jain Each node in a cluster is responsible for working on its own FlowFiles, and each node is unaware of what FlowFiles the other nodes are working on. If a NiFi processor component is working on a FlowFile at the time its node goes down, the transformation work will start over once the node is running again. A node disconnecting from the cluster will not cause processing of FlowFiles to stop on that node. Processors that transform FlowFile content produce a new FlowFile only once the transformation is complete, so if a failure occurs mid-processing, the original FlowFile remains on the processor's incoming queue and only the intermediate work is lost. This is how NiFi ensures no data loss occurs in unexpected failures. That being said, data plane High Availability (HA) is one of NiFi's roadmap items. Thanks, Matt
05-08-2017
02:54 PM
@Gaurav Jain Please explain what you mean when you say "complete workflow not working". Screenshots may help if you can provide them.
05-08-2017
02:53 PM
@Gaurav Jain
When you find an answer that addresses your question, please accept that answer to benefit others who come to this forum for help. Thank you,
Matt
05-08-2017
02:32 PM
@Gaurav Jain The flow.xml.gz file contains everything (processors, connections, controller services, etc.) that makes up the dataflow(s) on your canvas. If you try to make a change to a dataflow while a node is disconnected, you will get a response from NiFi saying that changes are not allowed while a node is disconnected. You can take manual steps to delete the disconnected node from the cluster via the cluster UI. This will return control, but the node you deleted will not be able to rejoin the cluster later (because the flows will not match) without doing additional manual steps, sketched below.
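A minimal sketch of those extra steps, assuming a default NiFi install layout: removing the stale flow.xml.gz on the removed node lets it inherit the cluster's current flow when it restarts and rejoins.

    # On the removed node (paths assume a default NiFi installation)
    ./bin/nifi.sh stop
    rm ./conf/flow.xml.gz    # stale local flow; the node inherits the cluster's flow on restart
    ./bin/nifi.sh start

Matt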
05-08-2017
02:13 PM
@umair ahmed Just spoke with Dave and he cleaned up his template/response to you here:
https://community.hortonworks.com/questions/101496/nifi-invokehttp-retries.html#answer-101588 His solution, which triggers a sleep based on the retry count set in my template, is perfect for meeting your needs. It also scales very easily: simply add additional timer rules in the advanced UI of the UpdateAttribute processor. Thanks, Matt
05-08-2017
01:00 PM
@umair ahmed The retry loop template above allows you to configure the number of retry attempts before exiting the loop. I am not sure what you mean by "on time that is it retry at certain time". If the intent is to slow how fast the FlowFile is retried, you could add an additional RouteOnAttribute processor to the failure loop to keep looping until the file has aged x amount of time; see the sketch below.
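As an illustration only (not part of the template above), such a RouteOnAttribute property could use NiFi Expression Language to match FlowFiles older than a given age. The property name aged is hypothetical and the 60000 ms threshold is arbitrary:

    aged = ${now():toNumber():minus(${lineageStartDate:toNumber()}):gt(60000)}

FlowFiles for which the expression evaluates true would be routed out of the aging loop, while the rest continue to circulate. Thanks, Matt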
05-08-2017
12:46 PM
1 Kudo
@Gaurav Jain A NiFi cluster consists of the following core capabilities:

1. Cluster Coordinator - One node in a NiFi cluster is elected through ZooKeeper to be the cluster coordinator. Once an election is complete, all other nodes in the cluster will send health and status heartbeats directly to this cluster coordinator. If the currently elected cluster coordinator stops heartbeating to ZooKeeper, a new election is held to elect one of the other nodes as the new cluster coordinator.

2. Each node in a NiFi cluster runs independently of the others. Each node runs its own copy of the flow.xml.gz, has its own repositories, and works on its own FlowFiles. A node that becomes disconnected from the cluster (failed heartbeat, network issues between nodes, etc.) will continue to run its dataflow. If it disconnected due to a missed heartbeat, it will reconnect upon the next successful heartbeat.

3. Primary Node - Every cluster will elect one of its nodes as the primary node. The role of the primary node is to run any processor that has been scheduled to run on "primary node only". The intent of this scheduling strategy is to help with processor protocols that are not cluster friendly, for example GetSFTP, ListSFTP, GetFTP, etc. Since every node in a cluster runs the same dataflow, you don't want these competing protocols fighting for the same files on every node. If the node currently elected as your primary node becomes disconnected from your cluster, it will stop running any processors configured as "primary node only". The cluster will elect a new primary node, and that new node will start running the "primary node only" configured processors at that time.

4. When a cluster has a disconnected node, no changes to the dataflows are allowed. This prevents the flow.xml.gz from becoming mismatched across the cluster nodes. The disconnected node must rejoin the cluster or be dropped completely from the cluster before editing is restored.
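For context, the clustering and ZooKeeper election behavior described above is driven by a handful of nifi.properties entries. A minimal sketch, with hostnames and ports as placeholder values:

    nifi.cluster.is.node=true
    nifi.cluster.node.address=node1.example.com
    nifi.cluster.node.protocol.port=11443
    nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181

Thanks, Matt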
05-08-2017
12:22 PM
2 Kudos
@Gaurav Jain The URL provided when adding the Remote Process Group (RPG) to your canvas only needs to connect successfully when the RPG is initially added. Once a successful connection is established, the target instance returns a list of currently connected cluster nodes, and the source instance with the RPG records those hosts in peer files. From that point forward the RPG constantly updates its list of available nodes and will not only load-balance across those nodes but will also use any one of them to get an updated status. Should your source instance of NiFi have trouble getting a status update from any of the nodes, it will still attempt load-balanced, failover delivery of data to the last known set of nodes until communication succeeds in getting an updated list. In addition, NiFi allows you to specify multiple URLs in the RPG when you create it: simply provide a comma-separated list of URLs for nodes in the same target cluster (see the example below). This does not change how the RPG works; it will still constantly retrieve a new listing of available nodes. This allows the target cluster to scale up or down without affecting your Site-To-Site (S2S) functionality.
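For illustration, such a comma-separated URL list might look like the following, with hostnames and port as placeholder values:

    http://nifi-node1.example.com:8080/nifi,http://nifi-node2.example.com:8080/nifi,http://nifi-node3.example.com:8080/nifi

Thanks, Matt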
05-08-2017
12:07 PM
@ismail patel Backpressure thresholds are soft limits, and some processors do batch processing. The ListHDFS processor will produce a listing of files from HDFS and produce a single 0-byte FlowFile for each file in that list. It will then commit all of those FlowFiles to the success relationship at once. So if the backpressure threshold were set to 5, the ListHDFS processor would still dump all of its FlowFiles onto the connection (even if the listing consisted of thousands of files). At that point backpressure would be applied and prevent ListHDFS from running again until the queue dropped back below 5, but this is not the behavior you need here. The RouteOnAttribute processor works on one FlowFile at a time, which allows us to adhere much more strictly to the backpressure setting of 5 on its unmatched relationship (the connection settings involved are sketched below). The fact that I used a RouteOnAttribute processor is not important; any processor that works on FlowFiles one at a time would work. I picked RouteOnAttribute because it operates off of FlowFile attributes, which live in heap memory, which makes processing here very fast.
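For reference, backpressure is configured per connection in the queue's settings dialog; a sketch with the 5-object threshold from this example (the data size value shown is, to my knowledge, NiFi's default):

    Back Pressure Object Threshold:     5
    Back Pressure Data Size Threshold:  1 GB

Thanks, Matt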
05-05-2017
10:00 PM
@Pradhuman Gupta You cannot set up logging for a specific processor instance, but you can set up a new logger for a specific processor class. First you would create a new appender in the NiFi logback.xml file:

<appender name="PROCESSOR_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-processor.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<!--
For daily rollover, use 'user_%d.log'.
For hourly rollover, use 'user_%d{yyyy-MM-dd_HH}.log'.
To GZIP rolled files, replace '.log' with '.log.gz'.
To ZIP rolled files, replace '.log' with '.log.zip'.
-->
<fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-processor_%d.log</fileNamePattern>
<!-- keep 5 log files worth of history -->
<maxHistory>5</maxHistory>
</rollingPolicy>
<encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
<pattern>%date %level [%thread] %logger{120} %msg%n</pattern>
<immediateFlush>true</immediateFlush>
</encoder>
</appender>
Then you create a new logger that writes to that appender:

<logger name="org.apache.nifi.processors.attributes.UpdateAttribute" level="WARN" additivity="false">
    <appender-ref ref="PROCESSOR_FILE"/>
</logger>

In the above example I am creating a logger for the UpdateAttribute processor class. Now any WARN or ERROR log messages produced by this specific processor will be written to the new log. You can expand upon this by configuring loggers for each processor class you want to monitor and sending them all to the same appender (see the sketch below). Then use a SplitText processor to split the content of the FlowFile produced by the TailFile, and a RouteOnContent processor to route the specific log lines produced by each processor class to a different PutEmail processor, or simply create a different message-body attribute for each.
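As a sketch of that expansion, a second logger can point another processor class at the same appender; the RouteOnContent class path below is my assumption based on NiFi's standard processor bundle:

<logger name="org.apache.nifi.processors.standard.RouteOnContent" level="WARN" additivity="false">
    <appender-ref ref="PROCESSOR_FILE"/>   <!-- same appender as above; class path is an assumption -->
</logger>

Thanks, Matt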