
Data Copy between Kafka clusters using NiFi gives poor performance


New Contributor

Hi,

I have configured a NiFi workflow to copy data from Kafka cluster 1 to Kafka cluster 2. It takes around 11 minutes to copy 1 million messages from the source cluster to the destination. Can anyone help me improve the performance so that it matches the time MirrorMaker takes to transfer a similar amount of data?

Source cluster: 3 brokers, source topic with 3 partitions

Destination cluster: 3 brokers, destination topic with 3 partitions

NiFi cluster: 3 nodes

Attached are the workflow, ConsumeKafka configs, and PublishKafka configs that I used.

60445-consumer-config.png

60444-workflow.png

60446-publishconfig.png

60447-publishconfig-contd.png

3 REPLIES

Re: Data Copy between Kafka clusters using NiFi gives poor performance

Based on the configs you showed, you are creating one flow file per Kafka message, which is much slower than batching many messages into a single flow file. Try setting a message demarcator in both ConsumeKafka and PublishKafka. With a demarcator, ConsumeKafka writes many messages to a single flow file separated by the demarcator, and PublishKafka streams the flow file back out, splitting it on that same demarcator.

Depending on your type of data it may also make more sense to use the "record" versions of the Kafka processors which would take care of the demarcation for you.
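To make the batching idea concrete, here is a minimal Python sketch (not NiFi code) of what the demarcator does conceptually: many Kafka messages are joined into one flow-file payload on the consume side and split back into individual messages on the publish side. The function names and message values are hypothetical examples.

```python
# Hypothetical sketch of demarcator-based batching, as NiFi's
# ConsumeKafka/PublishKafka processors do it conceptually.

DEMARCATOR = "\n"

def batch_messages(messages):
    """Mimic ConsumeKafka with a demarcator: join many Kafka
    messages into a single flow-file payload."""
    return DEMARCATOR.join(messages)

def split_payload(payload):
    """Mimic PublishKafka with the same demarcator: stream the
    payload back out as individual Kafka messages."""
    return payload.split(DEMARCATOR)

messages = ["msg-%d" % i for i in range(5)]
payload = batch_messages(messages)

# As long as no message contains the demarcator, the batch
# round-trips cleanly back into the original messages.
assert split_payload(payload) == messages
```

The performance win comes from moving one large flow file through the NiFi repositories instead of a million tiny ones, which is exactly the overhead the one-message-per-flow-file configuration pays.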

Re: Data Copy between Kafka clusters using NiFi gives poor performance

New Contributor

Thanks Bryan. However, using a demarcator on the publisher can break ordering and split a single message into multiple messages. I have experienced this before.

Re: Data Copy between Kafka clusters using NiFi gives poor performance

That should only happen if the demarcator you are using also appears inside the data. For example, if you use a newline demarcator, then the individual messages themselves cannot contain newlines.

In any case, I would suggest the record-based Kafka processors if possible; they avoid this problem entirely.
