Support Questions


Fix slow NiFi processing, specifically MQTT

New Contributor

I have been working with NiFi on some basic testing of MQTT ingestion and publishing using protobufs. Everything seems to work fine, but each step takes about 30-60 seconds to process, which seems incredibly long for such a simple flow. I looked for configuration settings that might need to be adjusted, but didn't have any success there. I also set up a pipeline that consumes and then publishes what it consumed, and that still takes 30+ seconds.

Is this a known issue or a setting that I have missed?

I have used StreamSets in the past, so I decided to try the same setup there, and it executes sub-second.

I am using NiFi version 1.19.1

Appreciate any thoughts or assistance!

2 ACCEPTED SOLUTIONS

Master Mentor

@tcain 
Also, Apache NiFi recently changed the default Run Schedule on some processors from "0 sec" (run as often as possible) to "1 min". ConsumeMQTT is one of the processors that had its default changed. So when ConsumeMQTT is started (put into a running state), it will execute immediately and then will not be scheduled to execute again for 1 minute. It is possible that your delay in consumption is simply because the processor is not being scheduled often enough. Try changing the Run Schedule on ConsumeMQTT to "0 sec" and re-run your test (it should be "0 sec" on all 3 of your processors).

I will also be recommending that this particular processor be reverted to a default of "0 sec" in Apache NiFi.
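For repeatable testing, the Run Schedule can also be changed through NiFi's REST API rather than the UI. This sketch only builds the JSON body for a `PUT /nifi-api/processors/{id}` request; the processor id, client id, and revision version below are hypothetical placeholders, and in a live instance you would first `GET` the processor to obtain its current revision.

```python
import json

def run_schedule_update(processor_id, client_id, version, scheduling_period="0 sec"):
    """Build the JSON body for PUT /nifi-api/processors/{id} that changes
    only the Run Schedule (schedulingPeriod) of a processor."""
    return {
        "revision": {"clientId": client_id, "version": version},
        "component": {
            "id": processor_id,
            "config": {"schedulingPeriod": scheduling_period},
        },
    }

# Hypothetical id and revision version for illustration only.
body = run_schedule_update("abcd-1234", "my-client", 3)
print(json.dumps(body, indent=2))
```

The body can then be sent with curl or `requests.put(...)` against your NiFi instance; the revision object is what keeps two clients from silently overwriting each other's changes.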

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,

Matt


New Contributor

@MattWho That was it! Thank you! 


5 REPLIES

Master Mentor

@tcain 
Here are some general things to look at when encountering performance-related issues:

1. How large are the dataflow(s) on your NiFi canvas?
2. How many processors are running?
3. How large is your Max Timer Driven Thread Count resource pool? (This is the pool of threads used by all processors to execute code; the default is 10. It should be incremented in small amounts while you monitor your CPU load average to make sure your system is not CPU saturated.)
4. How is the health of your NiFi's JVM (is garbage collection happening very often, and how long are the GC pauses)?
5. How is your disk I/O for the disk(s) hosting your NiFi content_repository, flowfile_repository, and provenance_repositories?
6. How many concurrent tasks are set on the processor furthest downstream that has a backlog on its inbound connection? (Concurrent tasks should be increased carefully on strategic components; setting them too high can have an adverse effect on performance by depriving other processors of the ability to execute optimally.)
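On point 3, a quick way to sanity-check CPU saturation before raising the Max Timer Driven Thread Count is to compare the load average to the core count. This is a rough heuristic, not anything NiFi-specific, and `os.getloadavg()` is only available on Unix-like systems:

```python
import os

# Compare the 1-minute load average to the number of CPU cores. If the load
# average stays persistently above the core count, the system is likely CPU
# saturated and raising NiFi's thread pool will not help.
cores = os.cpu_count()
load_1m, load_5m, load_15m = os.getloadavg()  # Unix-like systems only

saturated = load_1m > cores
print(f"cores={cores} load(1m)={load_1m:.2f} saturated={saturated}")
```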


Thank you,

Matt

New Contributor

Thanks for the reply! 

1. My dataflow is really small. I have 2 pipelines that each consist of a ConsumeMQTT, an EncodeProtobuf, and a Publish processor. I am sending a single JSON message through the pipeline.
2. Is there a place to set that? I am running it as a Docker container on a 2019 6-core i7 and have not purposely limited the processors. I didn't see anything that would, but that may be something I am missing. Specifically I am using this image: https://hub.docker.com/r/apache/nifi
3. I have tried increasing it slowly and got it up to 100 without seeing any large change. I assume this is due to the single event being processed. What is considered a large value for this number?
4. I am not sure how to check this.
5. This does not seem outside of normal ranges, if I am looking at the correct task.
6. Each pipeline is linear with 3 processors and only takes one input per test.

This setup is just to verify that the tool will work for our needs, so it is very simplistic, which is why the processing time is confusing me 🙂

Thank you for your help!

Master Mentor

@tcain 

I see you are using a processor that is not part of the default Apache NiFi distribution (EncodeProtobuf), so I can't really comment on the performance or configuration specifics of that processor.

Can you share the configuration of your 3 processors, including the Settings, Scheduling, Properties, and Relationships tabs? That will help in understanding your current dataflow implementation. I am initially most interested in the Scheduling tab for each processor you are using.

 

2. The status bar just above the canvas in the NiFi UI gives a summary of all the component counts on your canvas and their current status (enabled, disabled, running, stopped, invalid, etc.). Based on your description, I expect to see a "3" next to:

[screenshot: MattWho_0-1675887479969.png]

3. The Max Timer Driven Thread Count setting can be found under the global menu ---> Controller Settings. It does not sound like this is the issue, though, since you have only 3 processors on your canvas and nothing else, correct? Simply changing this value does not translate into a processor being able to execute concurrently.
4. You can see your current JVM details (for standalone NiFi) via global menu --> Summary --> system diagnostics (lower right), or (for clustered NiFi) via global menu --> Cluster --> JVM tab.
5. ...
6. How large is the JSON you are sending through your pipeline?

7. What do you observe in the lineage of your processed FlowFile? Immediately after processing a FlowFile through your dataflow, you can run a data provenance query via global menu ---> Data Provenance. A screenshot of that result may also help to show the execution times of each processor. You can click the small "view details" icon to the far left of each event for that FlowFile; from there you can see things like event duration and lineage duration. This can help narrow down where specifically the slowdown is occurring.
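The provenance search in step 7 can also be scripted against the REST API. This sketch only constructs the request body for `POST /nifi-api/provenance` (NiFi's asynchronous provenance query endpoint); the FlowFile UUID is a placeholder you would copy from the queue listing or an earlier event, and the exact `searchTerms` keys can vary slightly between NiFi versions, so treat it as an illustration rather than a verified 1.19.1 payload.

```python
import json

# Placeholder UUID; copy the real one from the connection's "List queue"
# view or from an earlier provenance event.
flowfile_uuid = "11111111-2222-3333-4444-555555555555"

# Request body for POST /nifi-api/provenance. NiFi returns a query id that
# you then poll via GET /nifi-api/provenance/{id} until results are ready.
query = {
    "provenance": {
        "request": {
            "maxResults": 100,
            "searchTerms": {"FlowFileUUID": flowfile_uuid},
        }
    }
}
print(json.dumps(query, indent=2))
```

Each returned event carries its event time, so the gaps between consecutive events for the same FlowFile show which processor is holding it for 30+ seconds.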

I look forward to your detailed feedback and the additional information you can share here.


Thank you,

Matt
