Rate of Publishing - kafka processor

I would like to know whether the Run Schedule stands for the rate at which the processor publishes, or writes to another processor such as PutFile.

I am publishing to a Kafka topic, from which Kafka Streams is called, and so on. For performance testing, I would like to fix the rate at which log lines are written to the topic. Can anybody suggest how?
For example, 100 records (log lines) per second.

1 ACCEPTED SOLUTION

Re: Rate of Publishing - kafka processor

The Run Schedule determines when the NiFi framework executes a processor. The default, timer driven with a run schedule of 0 seconds, means execute as fast as possible whenever there is data in the incoming queue; if there is no data, the processor doesn't execute.

The rate of the data depends on what the processor does during one execution. For example, let's say a queue has 100 flow files in it and you set the processor to run every 5 minutes. Some processors may grab a batch of flow files during one execution, so even though the processor executes once, it may grab 50 of those flow files. It also depends on whether your flow files have multiple logical messages in the content. If you have 1 record per flow file, and if the processor only grabs 1 flow file at a time (most only take one at a time), then the run schedule does control the rate.
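To make the arithmetic concrete, here is a rough sketch in Python (not NiFi code). The function name and parameters are made up for illustration, and it assumes a single concurrent task and a queue that never runs empty, so the result is an upper bound rather than a guarantee.

```python
def records_per_second(flow_files_per_run: int,
                       records_per_flow_file: int,
                       run_schedule_seconds: float) -> float:
    """Upper bound on records/sec for a timer-driven processor.

    Hypothetical helper: assumes one concurrent task and that the
    incoming queue always holds enough flow files for each execution.
    """
    if run_schedule_seconds <= 0:
        # A run schedule of 0 sec means "as fast as possible",
        # so there is no fixed rate to compute.
        raise ValueError("run schedule must be > 0 to fix a rate")
    return flow_files_per_run * records_per_flow_file / run_schedule_seconds


# 1 flow file per execution, 100 records in it, executed once per second
print(records_per_second(1, 100, 1.0))   # 100.0

# The batch case from the answer: 50 flow files grabbed every 5 minutes
print(records_per_second(50, 1, 300.0))
```

So for 100 records per second with one record per flow file and one flow file per execution, the run schedule would need to be about 10 ms, and that only holds if the processor really does take a single flow file each time.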

You can also look at the ControlRate processor, which throttles the rate at which flow files pass through a point in the flow.
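As far as I know, ControlRate is configured with a maximum rate and a time duration, and when counting flow files it counts flow files rather than records, so you would still want one record per flow file upstream; please check the processor documentation for your NiFi version. The sketch below only illustrates the general idea of such a throttle in Python. It is not NiFi's implementation, and the `throttle` function and its parameters are invented for this example.

```python
def throttle(arrival_times, max_per_window, window_seconds):
    """Return a release time for each item, in arrival order, such that
    no more than max_per_window items are released within any span of
    window_seconds. Illustrative only, not how ControlRate is written.
    """
    releases = []
    for i, arrived in enumerate(arrival_times):
        earliest = arrived
        if releases:
            # Keep first-in, first-out ordering.
            earliest = max(earliest, releases[-1])
        if i >= max_per_window:
            # The item max_per_window places earlier must be at least
            # one full window old before this one may go out.
            earliest = max(earliest,
                           releases[i - max_per_window] + window_seconds)
        releases.append(earliest)
    return releases


# 250 flow files already queued at t=0, limit of 100 per second
releases = throttle([0.0] * 250, max_per_window=100, window_seconds=1.0)
print(releases[99], releases[100], releases[249])  # 0.0 1.0 2.0
```

The backlog drains at 100 items per second no matter how fast the upstream processor runs, which is the behaviour you want in front of the Kafka publisher for a fixed-rate performance test.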
