SiteToSiteStatusReportingTask aggregate statuses from multiple nodes


I would like to monitor status data from many clusters. My approach is to add a SiteToSiteStatusReportingTask to each cluster and send all the statuses to a single remote input port in a dedicated cluster. The issue I've run into is that I can't find a reasonable way to aggregate the statuses from different nodes to get a clearer picture of a component's status.


For example, in a 4-node cluster I've got a process group with 250 files queued on each node, 1000 in total. The reporting task on this cluster sends 4 separate flowfiles to my dedicated cluster, each reporting a queued count of 250. All of these flowfiles have attributes describing their origin reporting task; however (perhaps for valid reasons), there is no unique identifier indicating that all 4 of these flowfiles originated from the same run of the reporting task. As such, I cannot sum these 4 files to get a number matching what I see in the UI when looking at that process group. Since my clusters have varying numbers of nodes, I cannot simply merge exactly 4 records, and I do not want to create duplicate flows for differing node counts.
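To make the aggregation I'm after concrete, here is a minimal sketch in Python rather than an actual NiFi flow. It assumes each status record carries `componentId`, `timestampMillis` and `queuedCount` fields, and it groups records into fixed time windows as a stand-in for the per-run identifier that doesn't exist. The window size and the sample records are made up for illustration.

```python
from collections import defaultdict

# Assumed to match the reporting task's run schedule (hypothetical value).
WINDOW_MILLIS = 60_000


def aggregate_queued(records, window_millis=WINDOW_MILLIS):
    """Sum queuedCount across nodes for each (componentId, time window)."""
    totals = defaultdict(int)
    for rec in records:
        # Records from the same run should land in the same window,
        # regardless of how many nodes the source cluster has.
        window = rec["timestampMillis"] // window_millis
        totals[(rec["componentId"], window)] += rec["queuedCount"]
    return dict(totals)


# Illustrative records: one per node of a 4-node cluster, 250 queued each.
records = [
    {
        "componentId": "pg-1",
        "actorHostname": f"node{i}",
        "timestampMillis": 120_000 + i * 50,
        "queuedCount": 250,
    }
    for i in range(1, 5)
]

print(aggregate_queued(records))  # {('pg-1', 2): 1000}
```

The weakness of bucketing by time is that nodes reporting on either side of a window boundary get split into two partial sums, which is why a real per-run identifier would be preferable.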


In comparison, when using the REST API to GET a process group, the response contains a field named "aggregateSnapshot" which holds the values summed across all nodes. This is what I am interested in.
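For reference, this is roughly how that value would be read from the REST response. It is a sketch only: the payload below is a hand-written, heavily trimmed stand-in for the real response, and the exact nesting (`status` → `aggregateSnapshot` → `flowFilesQueued`) is my assumption about the process group entity and should be checked against the API documentation for your NiFi version.

```python
def queued_from_aggregate(entity):
    """Return the cluster-wide queued flowfile count from a process group entity."""
    # Assumed layout: entity["status"]["aggregateSnapshot"]["flowFilesQueued"]
    snapshot = entity["status"]["aggregateSnapshot"]
    return snapshot["flowFilesQueued"]


# Hand-written sample, not captured from a real cluster.
sample_entity = {
    "id": "pg-1",
    "status": {
        "aggregateSnapshot": {"flowFilesQueued": 1000},
    },
}

print(queued_from_aggregate(sample_entity))  # 1000
```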


Overall, I would like to use the reporting tasks so that I stay with a data-push model rather than a data-pull one that polls every cluster's REST API.


If anyone has any recommendations for how to do this they would be very much appreciated.


Thanks in advance,