Created on 11-27-201908:58 AM - edited 09-16-202201:45 AM
Traditional Prometheus Architecture
(Image Courtesy: https://prometheus.io/docs/introduction/overview/) Prometheus is great. It has a huge number of integrations and provides a great metric monitoring platform, especially when working with Kubernetes. However, it does have a few shortcomings. The aim of this alternative architecture is to preserve the best parts of Prometheus while augmenting its weaker points with more powerful technologies.
The service discovery and metric scraping framework in Prometheus is its greatest strength, but it is greatly limited by its tight coupling to the TSDB system inside Prometheus. While it is possible to replace the TSDB inside Prometheus with an external database, the data retrieval process only supports writing into this one database.
The greatest strength of Prometheus, the service discovery and metric scraping framework, can now be used within Apache NiFi with the introduction of the GetPrometheusMetrics processor. This processor uses cgo and JNI to leverage the actual Prometheus libraries for service discovery and metric scraping. The standard Prometheus YML configurations are provided to the processor and JSON data is output as it scrapes metrics from configured and/or discovered endpoints. When combined with NiFi’s HTTP listener processors, the entire data ingestion portion of Prometheus can be embedded within NiFi.
The advantage of NiFi for data ingestion is that it comes with a rich set of processors for transforming, filtering, routing, and publishing data, potentially to many different places. The ability to load data into the data store (or data stores) of choice increases extensibility and enables more advanced analytics.
One good option for the datastore is Apache Druid. Druid was built for both real-time and historical analytics at scale (ingest of millions of events per second plus petabytes of history). It is supported by many dashboarding tools natively (such as Grafana or Superset), and it supports SQL through JDBC, making it accessible from a wide array of tools (such as Tableau). Druid addresses the scalability issues of the built-in TSDB while still providing a similar user experience and increasing extensibility to more user interfaces.
The option of sending scraped data to many locations provides an easy way to integrate with other monitoring frameworks, or to perform advanced analytics and machine learning. For example, loading metrics into Kafka makes it accessible in real-time to stream processing engines (like Apache Flink), function as a service engines (like OpenWhisk), and custom microservices.
With this architecture it is now possible to apply ML to Prometheus-scaped metrics in real-time and to activate functions when anomalies are found.