Prometheus (4.2)

With the Prometheus agent, which is available for both real-time and batch workflow configurations, you can publish metrics in a format that can be scraped by a Prometheus instance. The metrics can then be visualized in a dashboard using Grafana.

For information about how to configure Grafana and Prometheus, see the official Grafana documentation: https://grafana.com/docs/grafana/latest/ and the official Prometheus documentation: https://prometheus.io/docs/introduction/overview/

Use the to configure the metrics that you want to expose for scraping.

The Prometheus forwarding agent stores metrics in a cache. The cached metrics are published at an endpoint that Prometheus scrapes.

You can configure the maximum number of metrics that the cache holds between scrapes. When a metric is scraped, it is automatically deleted from the cache. You can also set an expiration time after which an unscraped metric is removed.

Note!

The cache is currently under development, and there is a known issue where metrics are dropped even when the cache is not full. For more information, see the Endpoint section.

Endpoint

Each Execution Context exposes an endpoint that can be scraped by a Prometheus instance at:

<ec_host>:<ec_webport>/metrics/

You can configure the number of metrics to store and how long metrics are kept in the cache with the following Execution Context properties:

Property

Description

mz.metrics.cache.size.max

Maximum number of records in the metrics cache. The default value is 10000.

mz.metrics.cache.expire

Maximum time in seconds before a metric is removed from the cache. The default value is 300.

The cache is shared by all workflows running in the Execution Context, so its size must be large enough for the expected metric throughput. Because each Prometheus scrape empties the cache, set the cache size to a minimum of
<number_of_metrics_expected_per_second> * <prometheus_scrape_interval>.
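
The sizing rule above can be worked through with a short calculation. The following Python sketch is only an illustration: the function name min_cache_size and the load figures (500 metrics per second, a 15-second scrape interval) are hypothetical and not taken from the product.

```python
import math


def min_cache_size(metrics_per_second: float, scrape_interval_seconds: float) -> int:
    """Return the smallest cache size that holds one scrape interval of metrics."""
    if metrics_per_second < 0 or scrape_interval_seconds <= 0:
        raise ValueError("rate must be >= 0 and interval must be > 0")
    # Round up so that a fractional result never undersizes the cache.
    return math.ceil(metrics_per_second * scrape_interval_seconds)


# Hypothetical load: 500 metrics per second in total from all workflows
# in the Execution Context, scraped by Prometheus every 15 seconds.
required = min_cache_size(500, 15)
print(f"mz.metrics.cache.size.max should be at least {required}")
```

With these example figures the result is below the default of 10000, so the default would be sufficient. Note also that if the scrape interval is longer than mz.metrics.cache.expire, metrics would presumably expire before Prometheus collects them, so it is likely worth checking both values together.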

Note!

Metrics stored in the cache can be read only once. If you query the endpoint manually, the metrics that are returned are removed from the cache and will be missing in Prometheus.
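
To illustrate why a manual query causes missing data, the following Python sketch models a bounded cache that is emptied by every read. It is a toy model and not the product's implementation: the class ReadOnceCache, its methods, and the choice to drop the oldest entry when the cache is full are all assumptions made for the example.

```python
from collections import deque
from typing import List


class ReadOnceCache:
    """Toy model of a bounded metrics cache that is emptied by every read."""

    def __init__(self, max_size: int) -> None:
        # Assumption: when the cache is full, the oldest entry is dropped.
        self._entries = deque(maxlen=max_size)

    def put(self, metric: str) -> None:
        self._entries.append(metric)

    def scrape(self) -> List[str]:
        """Return all cached metrics and clear the cache."""
        drained = list(self._entries)
        self._entries.clear()
        return drained


cache = ReadOnceCache(max_size=10000)
cache.put("metric_a 1")
cache.put("metric_b 2")

manual_query = cache.scrape()       # for example, someone opens the endpoint in a browser
prometheus_scrape = cache.scrape()  # the next Prometheus scrape finds nothing

print(manual_query)
print(prometheus_scrape)
```

In this model the first read returns both metrics and the second read returns an empty list, which corresponds to the gap that would appear in Prometheus after a manual query.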