9.60 Prometheus Agent

This section describes the Prometheus forwarding agent, which is available for both real-time and batch workflow configurations. The agent publishes metrics in a format that a Prometheus instance can scrape. The metrics can then be visualized using Grafana; see the official Grafana documentation at https://grafana.com/docs/grafana/latest/. To configure Prometheus, see https://prometheus.io/docs/introduction/overview/.

Use the Prometheus filter to configure which metrics are exposed for scraping. See 9.60.3 The Prometheus Filter.

The Prometheus forwarding agent stores metrics in a cache. The cached metrics are published at an endpoint that Prometheus scrapes.

You can configure the maximum number of metrics that the cache stores until Prometheus scrapes them. A metric is automatically deleted from the cache when it is scraped. You can also set an expiration time for metrics.

Note

The cache is currently under development. There is a known issue where metrics are dropped even when the cache is not full. For more information, see the Endpoint section.

Endpoint

Each Execution Context exposes an endpoint that can be scraped by a Prometheus instance at:

<ec_host>:<ec_webport>/metrics

You can configure how many metrics the cache stores and how long it holds them. This is done with the following Execution Context properties:

Property                     Description

mz.metrics.cache.size.max    Maximum number of records in the metrics cache. The default value is 10000.
mz.metrics.cache.expire      Maximum time in seconds before a metric is removed from the cache. The default value is 300.

The cache is shared by all workflows running in the Execution Context, so its size must be set according to the expected metric throughput. Each Prometheus scrape empties the cache, so the cache size should be set to at least the following value, with the scrape interval expressed in seconds:

<number_of_metrics_expected_per_second> * <prometheus_scrape_interval>
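
The sizing rule above can be checked with a few lines of arithmetic. The following is a minimal sketch: the function name min_cache_size and the example rates are illustrative assumptions, and only the default value of 10000 comes from the property table.

```python
import math

# Default of mz.metrics.cache.size.max, as stated in the property table.
DEFAULT_CACHE_SIZE = 10000


def min_cache_size(metrics_per_second: float, scrape_interval_seconds: float) -> int:
    """Return the smallest cache size that holds one scrape interval of metrics.

    Hypothetical helper: it only applies the rule
    <number_of_metrics_expected_per_second> * <prometheus_scrape_interval>.
    """
    if metrics_per_second < 0 or scrape_interval_seconds <= 0:
        raise ValueError("rate must be >= 0 and scrape interval must be > 0")
    return math.ceil(metrics_per_second * scrape_interval_seconds)


if __name__ == "__main__":
    # Example rates are made up for illustration.
    for rate in (500, 1000):
        needed = min_cache_size(rate, 15)
        if needed <= DEFAULT_CACHE_SIZE:
            verdict = "fits the default"
        else:
            verdict = "raise mz.metrics.cache.size.max"
        print(f"{rate} metrics/s at a 15 s interval: at least {needed} ({verdict})")
```

Because the cache is shared, the rate passed in should be the combined rate of all workflows in the Execution Context, not the rate of a single workflow.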

Note

Metrics stored in the cache can be read only once. Manually querying the endpoint removes the returned metrics from the cache, so that data will be missing in Prometheus.
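
To see why a manual query causes gaps, it can help to model the cache behavior described in this section. The sketch below is a simplified illustration, not the agent's implementation: the class ReadOnceCache and its methods are hypothetical, and dropping new metrics when the cache is full is an assumption, since this section does not state what happens on overflow.

```python
import time
from collections import deque


class ReadOnceCache:
    """Toy model of a bounded, expiring cache that is emptied on every read."""

    def __init__(self, max_size=10000, expire_seconds=300, clock=time.monotonic):
        self.max_size = max_size
        self.expire_seconds = expire_seconds
        self._clock = clock
        self._entries = deque()  # (insert_time, metric), oldest first

    def _evict_expired(self):
        # Remove entries older than the expiration time.
        now = self._clock()
        while self._entries and now - self._entries[0][0] > self.expire_seconds:
            self._entries.popleft()

    def put(self, metric):
        """Store a metric; return False if it was dropped because the cache is full."""
        self._evict_expired()
        if len(self._entries) >= self.max_size:
            return False  # assumption: new metrics are dropped on overflow
        self._entries.append((self._clock(), metric))
        return True

    def scrape(self):
        """Return all live metrics and empty the cache (read-once semantics)."""
        self._evict_expired()
        metrics = [metric for _, metric in self._entries]
        self._entries.clear()
        return metrics


if __name__ == "__main__":
    cache = ReadOnceCache(max_size=3)
    for name in ("m1", "m2", "m3"):
        cache.put(name)
    manual_query = cache.scrape()       # a person querying the endpoint by hand
    prometheus_scrape = cache.scrape()  # the next real scrape finds nothing
    print(manual_query, prometheus_scrape)
```

In this model, whichever client reads first receives the metrics, which is why the endpoint should be queried only by the Prometheus instance.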

Compatibility with System Insight

You can replace the System Insight agent with the Prometheus agent without making any other changes to the existing workflows. To do this, replace the System Insight forwarding agent with the Prometheus forwarding agent.


A few restrictions apply:

  • All metrics created with the use of a Measurement UDR are exposed as a GAUGE type.

  • Each value stored in the fields field of the Measurement UDR is reported as a separate metric.

  • The CATEGORY field is not used. If it is still needed, it must be assigned to the NAME fields of the Prometheus UDR.

  • If the System Insight UDR field fields contains many measurements, each is reported as a separate metric.
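
The restrictions above can be illustrated with a small sketch that turns each entry of a fields map into its own gauge in the Prometheus text exposition format. This is an illustration under assumptions, not the agent's actual output: the function fields_to_gauges, the example field names, and the direct use of field names as metric names are all hypothetical.

```python
from typing import Mapping


def fields_to_gauges(fields: Mapping[str, float]) -> str:
    """Render each field as a separate GAUGE metric in Prometheus text format.

    Hypothetical mapping: the field name is used directly as the metric name.
    """
    lines = []
    for name, value in fields.items():
        lines.append(f"# TYPE {name} gauge")  # every measurement becomes a gauge
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up measurements standing in for the fields of a Measurement UDR.
    print(fields_to_gauges({"cpu_load": 0.75, "queue_depth": 12}), end="")
```

A fields map with two measurements therefore yields two independent gauge metrics, and nothing in the output carries the CATEGORY value.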