
Query on the "Delivery Guarantee" property of the PublishKafka processor in NiFi

Explorer

Hi,

I have a query about the "Delivery Guarantee" property of the PublishKafka processor in NiFi.

Specifically, I have a question about two of the options supported for "Delivery Guarantee":

1. Best effort

2. Guarantee single node delivery

I know that "Best effort" does not wait for an ack/response from Kafka, but in the case of a publish failure, when either of the above two options is set, will NiFi try to publish to the other available nodes? In my setup there is a cluster of three Kafka nodes. If one of the nodes is down, or if we observe a connection timeout issue with one of the nodes, will NiFi try to publish to the other available nodes?

Thanks


3 REPLIES

Master Mentor

@srilakshmi 

The PublishKafka processor can be configured with a comma-separated list of Kafka brokers. If the processor, at the time of execution, is able to communicate with one of these configured brokers, it will receive a destination for publishing the content. If a failure occurs during the publish, the FlowFile is routed to the failure relationship. You have configurable options to retry on failure x number of times. You should avoid auto-terminating failure relationships in your dataflow designs unless data loss is acceptable. Each attempt is a new execution of the processor, which means connecting to a broker again. A failure is when PublishKafka was unable to send all the content bytes (for example, the connection gets closed).
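To illustrate what such a failure looks like at the Kafka client level, here is a minimal sketch using the plain Java Kafka producer (not NiFi code; the broker host names and topic below are hypothetical). A send that completes with an exception, for example because the connection was closed, is the kind of error that results in the FlowFile routing to failure:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PublishFailureDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Comma-separated broker list, analogous to the processor's broker list property.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka1:9092,kafka2:9092,kafka3:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"), (metadata, exception) -> {
                if (exception != null) {
                    // The send failed (e.g. connection closed, timeout). This is the
                    // kind of error NiFi reacts to by routing the FlowFile to failure.
                    System.err.println("Publish failed: " + exception.getMessage());
                } else {
                    System.out.printf("Published to %s-%d at offset %d%n",
                            metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
        } // close() flushes pending sends, so the callback fires before exit
    }
}
```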

The "Best Effort" and "Guarantee single node delivery" settings in the PublishKafka processor have nothing to do with the nodes in the NiFi cluster. They concern the nodes in the destination Kafka setup.
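For reference, these delivery-guarantee options correspond to the Kafka producer's acks setting. A minimal sketch, assuming the mapping follows standard Kafka producer semantics:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class DeliveryGuaranteeMapping {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed mapping (standard Kafka producer semantics):
        //   "Best Effort"                    -> acks=0 (do not wait for any broker acknowledgement)
        //   "Guarantee single node delivery" -> acks=1 (wait for the partition leader only)
        props.put(ProducerConfig.ACKS_CONFIG, "1"); // e.g. "Guarantee single node delivery"
        System.out.println("acks = " + props.get(ProducerConfig.ACKS_CONFIG));
    }
}
```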

In a NiFi cluster, each node executes its own copy of the dataflow(s), and each node has its own content and FlowFile repositories. Nodes are unaware of FlowFiles that exist on the other nodes in the cluster. So a FlowFile whose content fails to publish on, say, node 2 will route to the failure relationship on node 2, and if you use retry, it will be retried on node 2. When a node goes down, the FlowFiles queued in connections remain on that node until it is brought back online. When the node comes back up, it will continue processing FlowFiles from the last connection in which they were queued. So it is important that the content and FlowFile repositories are protected to avoid data loss (such as by using RAID storage). A node that is disconnected from the cluster will still execute its dataflow(s) as long as NiFi is still running on that node.

If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click "Accept as Solution" below each response that helped.


Thank you,

Matt

Explorer

Hi @MattWho 

Thanks for the reply.

 

I know that the "Best Effort" and "Guarantee single node delivery" settings in the PublishKafka processor have nothing to do with the nodes in the NiFi cluster, and that they concern the nodes in the destination Kafka setup.

My query is about when I set the option to "Guarantee single node delivery" and there are 3 Kafka nodes. When one of the Kafka nodes is down, the PublishKafka processor will try the other two Kafka nodes; is my understanding right?

Thanks

Master Collaborator (accepted solution)

PublishKafka writes messages only to the Kafka nodes that are the leaders for a given topic partition.

It is then Kafka's internal job to keep the In-Sync Replicas (ISR) in sync with their leader.

So with respect to your question:

When the producer client starts, it sends a metadata request to the bootstrap servers listed in the bootstrap.servers configuration to get metadata about the topic and its partitions. That is how the client learns which broker is the leader for each topic partition, and the producer then writes to those leaders.
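As a rough illustration of that metadata exchange, here is a minimal sketch with the plain Java Kafka client (broker host names and topic are hypothetical): asking the producer for partition metadata shows the leader it has discovered for each partition.

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.serialization.StringSerializer;

public class LeaderDiscoveryDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Any reachable broker in this list is enough to bootstrap the metadata fetch.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka1:9092,kafka2:9092,kafka3:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // partitionsFor() uses the same metadata the producer relies on when
            // deciding which broker (leader) each record must be sent to.
            List<PartitionInfo> partitions = producer.partitionsFor("my-topic");
            for (PartitionInfo p : partitions) {
                System.out.printf("partition %d -> leader %s%n", p.partition(), p.leader());
            }
        }
    }
}
```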

With "Guarantee single node" and if kafka broker node goes down which was happen to be a leader for topic: partition then Kafka will assign a new leader from ISR list  for topic: partition and through Kafka client setting metadata.max.age.ms producer refreshed its metadata information will get to know who is next leader to produce.

 

If you found this response helpful in resolving your issue, please take a moment to click "Accept as Solution" below this post.

Thank you