I have configured a NiFi flow to copy data from Kafka cluster 1 to Kafka cluster 2. It takes around 11 minutes to copy 1 million messages from the source cluster to the destination. Can anyone help me improve the performance so that it matches the time MirrorMaker takes to transfer a similar amount of data?
Source cluster: 3 brokers, source topic with 3 partitions
Destination cluster: 3 brokers, destination topic with 3 partitions
NiFi cluster: 3 nodes
Attached are the workflow, ConsumeKafka configs, and PublishKafka configs that I used.
Based on the configs you showed, it looks like you are creating one flow file per Kafka message, which will be much slower than batching many messages into a single flow file. Try setting a message demarcator on both the consume and publish sides: ConsumeKafka can then write many messages to a single flow file, separated by the demarcator, and PublishKafka streams the flow file back out, splitting it on that same demarcator.
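As a rough sketch, the relevant properties would look something like this (the property names come from the stock ConsumeKafka/PublishKafka processors; the values shown are illustrative assumptions, not your actual settings):

```
ConsumeKafka
  Message Demarcator : \n      (shift+enter in the UI to enter a newline)
  Max Poll Records   : 10000   (more messages batched per flow file)

PublishKafka
  Message Demarcator : \n      (must match the consume side exactly)
```

The key point is that the same demarcator is set on both processors, so one flow file carries thousands of messages instead of one.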
Depending on your type of data, it may also make more sense to use the "record" versions of the Kafka processors (ConsumeKafkaRecord / PublishKafkaRecord), which take care of the demarcation for you.
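A minimal sketch of that record-based setup, assuming JSON data (the topic names, group ID, and reader/writer choices below are placeholders, not values from your flow):

```
ConsumeKafkaRecord
  Topic Name(s)  : source-topic
  Group ID       : nifi-mirror
  Record Reader  : JsonTreeReader        (pick a reader matching your format)
  Record Writer  : JsonRecordSetWriter

PublishKafkaRecord
  Topic Name     : dest-topic
  Record Reader  : JsonTreeReader
  Record Writer  : JsonRecordSetWriter
```

Because the reader/writer pair understands the record boundaries, no demarcator character is needed and messages containing arbitrary bytes round-trip safely.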
Thanks Bryan. However, using a demarcator on the publisher causes ordering issues and single messages being broken into multiple messages. I have experienced this before.
That should only happen if the demarcator you are using also exists in the data. For example, if you use a newline demarcator, then the individual messages themselves cannot contain newlines.
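The join/split round trip is easy to see in isolation. This is a conceptual sketch, not NiFi code: joining messages with the demarcator stands in for ConsumeKafka batching them into one flow file, and splitting stands in for PublishKafka separating them again:

```python
# Why the demarcator must not appear inside the messages themselves.
demarcator = "\n"

# Clean case: no message contains the demarcator, so the batch round-trips.
clean = ["msg1", "msg2", "msg3"]
flowfile = demarcator.join(clean)          # what ConsumeKafka conceptually writes
assert flowfile.split(demarcator) == clean  # what PublishKafka conceptually reads

# Broken case: the second message contains an embedded newline,
# so the split produces 4 messages instead of 3.
dirty = ["msg1", "line1\nline2", "msg3"]
flowfile = demarcator.join(dirty)
print(flowfile.split(demarcator))  # ['msg1', 'line1', 'line2', 'msg3']
```

This is exactly the "single message broken into multiple messages" symptom: it only appears when the demarcator collides with bytes inside a message.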
Either way, I would suggest the record-based Kafka processors if possible; they avoid this problem entirely.