Apache NiFi provides a great depth of information about each component on the canvas. Stats are collected for the number of FlowFiles in and out, how many bytes were read or written, how long processing took, and so on. However, these are all "stock metrics": they are the same for every processor. There are times when we may want to expose other metrics, for example from a custom processor.

One way to do this in NiFi is through counters. Counters are a lesser-known feature of NiFi and, until recently, have been fairly limited in their usefulness. This is because NiFi sets every counter to 0 on startup, and as the system runs, the counters only increase: each one counts how many times something has happened since NiFi started. To see these counters, we can open the drop-down menu in the top-right corner of the NiFi UI and click "Counters."

[Screenshot: the "Counters" entry in the drop-down menu at the top right of the NiFi UI]

We are then shown a list of the counters that have been registered:

[Screenshot: the Counters table listing the registered counters and their current values]

Each counter has a button on its right-hand side for resetting it, but a bare total such as 4,172,721 tells us nothing about how long it took to process that many records. Did it take 1 second or 1 month? Nor do we know whether the records arrived all at once or evenly over a long period of time.
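The counter behavior described above (every counter starts at 0 when NiFi starts, only grows while it runs, and can be reset by hand) can be captured in a small toy model. This is a sketch for illustration only, not NiFi's implementation; the class and method names are hypothetical. Note that the only thing the model can report is a running total, which is exactly why the total alone says nothing about rate.

```python
from collections import defaultdict


class CounterRegistry:
    """Hypothetical toy model of NiFi counters; not NiFi's real code."""

    def __init__(self):
        # A fresh registry stands in for a freshly started NiFi:
        # every counter reads 0 until something increments it.
        self._totals = defaultdict(int)

    def increment(self, name, delta=1):
        # In this model counters only ever grow, matching the article's
        # description; real NiFi behavior may differ in edge cases.
        if delta < 0:
            raise ValueError("this model only supports increasing counters")
        self._totals[name] += delta

    def reset(self, name):
        # Mirrors the per-counter reset button in the Counters table.
        self._totals[name] = 0

    def total(self, name):
        # Only the running total is available; no timing information is kept.
        return self._totals[name]


registry = CounterRegistry()
registry.increment("Records Processed", 1000)
registry.increment("Records Processed", 500)
print(registry.total("Records Processed"))  # 1500

# "Restarting NiFi" in this model means starting over with a new registry.
registry = CounterRegistry()
print(registry.total("Records Processed"))  # 0
```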

In NiFi 1.4.0, though, we introduced a feature that adds these counters to the Status History. Now, when we right-click on a processor and choose "View status history," the drop-down list of available metrics also contains any counters that the processor has registered.

For example, the UpdateRecord processor that I have running on my laptop shows the following metrics:

[Screenshot: Status History for the UpdateRecord processor, charting its Records Processed counter]

So we can see that after starting, the number of Records Processed per 5 minutes quickly ramped up to about 58 million, then leveled off at a little over 50 million records per 5 minutes (about 167,000 records per second). Note that the charted value is not the counter's running total, as displayed in the Counters menu. Rather, the chart shows how much the counter increased in each 5-minute window.
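The difference between the running total and what the chart plots can be sketched as follows, assuming we had the counter's cumulative value sampled at the end of each 5-minute window. The readings below are made up to resemble the chart described above, and the function names are illustrative only. Dividing a 5-minute increase by 300 seconds reproduces the arithmetic in the text: 50,000,000 / 300 ≈ 166,667 records per second.

```python
WINDOW_SECONDS = 5 * 60  # each charted point covers a 5-minute window


def window_deltas(totals):
    """Turn successive cumulative counter readings into per-window increases."""
    return [curr - prev for prev, curr in zip(totals, totals[1:])]


def per_second(delta, window_seconds=WINDOW_SECONDS):
    """Convert the increase over one window into an average per-second rate."""
    return delta / window_seconds


# Hypothetical cumulative readings of a "Records Processed" counter,
# taken at the end of consecutive 5-minute windows.
totals = [0, 58_000_000, 108_000_000, 158_000_000]

deltas = window_deltas(totals)
print(deltas)  # [58000000, 50000000, 50000000]
print(round(per_second(50_000_000)))  # 166667
```

Because the deltas telescope, summing consecutive window increases gives the increase over the longer span; for example, twelve 5-minute deltas add up to the increase over one hour.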

This feature gives us a nice way to see how our processor performs over time. All of the record-oriented processors (such as PartitionRecord, UpdateRecord, SplitRecord, QueryRecord, etc.) emit these counters. This matters for those of us who care more about the number of events per second we're processing than about the number of gigabytes per second. However, we can count anything we want. DetectDuplicate, for instance, counts the number of FlowFiles it has routed to "duplicate" and the number it has routed to "non-duplicate," which gives us a feel not only for the processor's performance but also for how the data is changing over time. We can answer questions such as "Are we seeing more duplicates now, or fewer?"
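The DetectDuplicate idea of counting each routing outcome can be sketched in a few lines. This is a hypothetical stand-in, not DetectDuplicate's code: the function and counter names are made up, and a plain in-memory set replaces whatever cache the real processor uses. In an actual Java processor, each increment would instead be a call to the ProcessSession's adjustCounter method. Comparing the two counts per window is what answers "more duplicates now, or fewer?"

```python
from collections import Counter


def route_and_count(keys):
    """Route each item to "duplicate" or "non-duplicate" and count both outcomes.

    Hypothetical sketch; counter names are illustrative and need not match
    the names NiFi's DetectDuplicate actually registers.
    """
    seen = set()
    counters = Counter()
    routes = []
    for key in keys:
        if key in seen:
            routes.append("duplicate")
            counters["Duplicate FlowFiles"] += 1
        else:
            seen.add(key)
            routes.append("non-duplicate")
            counters["Non-Duplicate FlowFiles"] += 1
    return routes, counters


def duplicate_share(counters):
    """Fraction of items routed to "duplicate" (0.0 if nothing was seen)."""
    total = counters["Duplicate FlowFiles"] + counters["Non-Duplicate FlowFiles"]
    return counters["Duplicate FlowFiles"] / total if total else 0.0


routes, counters = route_and_count(["a", "b", "a", "c", "b", "a"])
print(routes)
print(counters["Duplicate FlowFiles"], counters["Non-Duplicate FlowFiles"])  # 3 3
print(duplicate_share(counters))  # 0.5
```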

Hopefully this will give you more insight into how your flow is working and how to emit your own custom metrics.

Cheers!

Comments

Is there any way to increase this window from 5 minutes to, let's say, an hour or more? (A common question would be how many records were processed during a 24-hour window, etc.)