System Insight (SI)

System Insight provides metrics and visualizations for both MediationZone as a system and the data flowing through or collected by MediationZone.

Purpose

System Insight has been designed with multiple purposes in mind:

  • Data flow visualization - understanding and analyzing the data flowing through MediationZone
  • System operation visualization - understanding how MediationZone is doing
  • Troubleshooting - facilitating correlation of events, load levels, sudden changes, etc.
  • Trend analysis - easily viewing how usage changes over time and planning ahead with confidence
  • Alerting - quickly becoming aware of values crossing a threshold or leaving a safe interval

System Insight is not built to store every single UDR processed by MediationZone. It provides aggregated or sampled metrics, not exact values.

Dashboard Builder

System Insight comes with a fully customizable web dashboard builder with configurable access rights.

Metrics (time-series data) can be visualized using graphs, tables, pie charts, single-stat panels, maps and more.

Figure: System Insight dashboard

Default template dashboards for the MediationZone systems are included at installation. These can be extended and customized.

Dashboards can update automatically, and can also be used to view historical metrics.

Architecture

A "Probe" is a point in MediationZone or a workflow where metrics are created by sampling or aggregating events.

These metrics are then sent to the System Insight service, which stores them in a time-series database based on the open-source InfluxDB project.

A web-based dashboard builder, based on the open-source Grafana project, can be used to design customized dashboards for any use case from the stored data.

System Insight can collect metrics from external probes in order to visualize them, and can also forward metrics to an external analytics or visualization tool, using MediationZone workflows.

Note that System Insight is available on Linux only.

Probe Types

System Insight has three basic types of probes:

  • Internal probes: These are system probes that are built into MediationZone and always available, e.g. CPU, JVM usage, network and storage metrics.
  • Data Flow probes:
    • MIM metrics: These are available in a separate package, and enable creation of metrics by sampling or aggregating MIM values, both built-in MIMs and user-defined MIMs. Examples are queue lengths, throughput, batch count, thread pool information, etc.
    • Custom metrics: These are created using the System Insight Forwarding agent, and enable creation of metrics from any workflow by sending System Insight UDRs to the agent. Examples are usage per product category, SLA compliance, error counts per error code, etc.
  • External probes:
    • These represent metrics collected by MediationZone, which are then handled either as MIM metrics or Custom metrics. Examples are network inventory metrics from probes, cloud infrastructure usage, number of alarms triggered by monitored systems, etc.

Figure: System Insight probes

Metrics

A metric sample consists of:

  • Metric name
  • Timestamp
  • A map of tags=values
  • A map of fields=values

Samples are either aggregated (the average or sum over a time period) or gauges (the current value at the end of a time period).
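The structure of a metric sample, and the difference between an aggregated sample and a gauge, can be sketched in Python. This is an illustration only: the class name MetricSample, the helper functions, the metric name "queue_length", the tag "workflow" and the use of epoch seconds for the timestamp are all assumptions made for the example, not part of the System Insight API.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class MetricSample:
    """Hypothetical model of one metric sample (not the real SI type)."""
    name: str                                               # metric name
    timestamp: int                                          # assumed: seconds since epoch
    tags: Dict[str, str] = field(default_factory=dict)      # tags=values
    fields: Dict[str, float] = field(default_factory=dict)  # fields=values


def aggregated_sample(name: str, tags: Dict[str, str],
                      readings: List[float], period_end: int) -> MetricSample:
    """Aggregated: the average of all readings taken during the period."""
    return MetricSample(name, period_end, dict(tags), {"value": mean(readings)})


def gauge_sample(name: str, tags: Dict[str, str],
                 readings: List[float], period_end: int) -> MetricSample:
    """Gauge: the current (last) reading at the end of the period."""
    return MetricSample(name, period_end, dict(tags), {"value": readings[-1]})


# Three readings observed during one sampling period.
readings = [4.0, 8.0, 9.0]
tags = {"workflow": "wf1"}

avg = aggregated_sample("queue_length", tags, readings, period_end=1700000010)
last = gauge_sample("queue_length", tags, readings, period_end=1700000010)
```

In both cases only one value per period is kept, which is why the individual readings cannot be recovered from a stored sample.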

System Insight does not provide drill-down capabilities into individual UDRs or values that samples have been generated from.

Metrics Filters

By default, only internal (system) probes generate metrics. Pass-through filters can be defined to enable other probe types, by metric name and metric-specific tags. Using this filtering mechanism, only relevant metrics are stored and visualized.
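The pass-through idea can be illustrated with a small Python sketch. The actual filter syntax used by the System Insight Filter tool is not shown here, so the PassThroughFilter class, the is_stored function and the example metric and tag names are hypothetical and only model the behaviour described above: system metrics always pass, while other metrics pass only if a filter matches their name and tags.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PassThroughFilter:
    """Hypothetical filter: a metric name plus the tag values it requires."""
    metric_name: str
    required_tags: Dict[str, str] = field(default_factory=dict)

    def matches(self, name: str, tags: Dict[str, str]) -> bool:
        # The name must be equal and every required tag must have the given value.
        return name == self.metric_name and all(
            tags.get(key) == value for key, value in self.required_tags.items()
        )


def is_stored(name: str, tags: Dict[str, str], from_system_probe: bool,
              filters: List[PassThroughFilter]) -> bool:
    """Return True if the metric would be passed on for storage."""
    if from_system_probe:
        return True  # system probes generate metrics by default
    return any(f.matches(name, tags) for f in filters)


# Assumed example: enable a MIM metric for one workflow only.
filters = [PassThroughFilter("throughput", {"workflow": "billing"})]

kept = is_stored("throughput", {"workflow": "billing"}, False, filters)
dropped = is_stored("throughput", {"workflow": "rating"}, False, filters)
system = is_stored("cpu", {}, True, filters)
```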

Filters are managed using the System Insight Filter tool and/or the mzsh system insight command.

Metrics Repository

MediationZone can generate a large number of different metrics. To see which metrics and tags are available, use the metrics repository, which lists all probes that can emit metrics, whether or not they are currently filtered out. This is helpful when you want to create new filters.

Agents

System Insight contains two agents:

  • System Insight Forwarding: Used to generate metrics from any data available in a workflow, and send them to System Insight.
  • System Insight Collection: Used to stream System Insight metrics into a workflow, from where they can be forwarded to an external system or otherwise processed.

Metrics Retention and Downsampling

By default, metrics are sampled every 10 seconds and stored at this resolution for 1 week. After that, they are downsampled to one sample per 10 minutes and stored for 6 months. The retention policy and downsampling settings are configurable.
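To get a feel for what the default settings mean per metric series, the arithmetic can be worked through in Python. This is a rough sketch: 6 months is approximated as 182 days, and averaging is assumed as the downsampling function, since the function actually used is not stated here. The downsample helper is hypothetical.

```python
from typing import Dict, List, Tuple

SAMPLE_INTERVAL_S = 10                      # default sampling interval
RAW_RETENTION_S = 7 * 24 * 3600             # 1 week
DOWNSAMPLED_INTERVAL_S = 10 * 60            # one sample per 10 minutes
DOWNSAMPLED_RETENTION_S = 182 * 24 * 3600   # assumption: 6 months is about 182 days

# Points kept per series at each resolution.
raw_points = RAW_RETENTION_S // SAMPLE_INTERVAL_S                        # 60480
downsampled_points = DOWNSAMPLED_RETENTION_S // DOWNSAMPLED_INTERVAL_S   # 26208
samples_per_bucket = DOWNSAMPLED_INTERVAL_S // SAMPLE_INTERVAL_S         # 60


def downsample(samples: List[Tuple[int, float]],
               bucket_s: int = DOWNSAMPLED_INTERVAL_S) -> List[Tuple[int, float]]:
    """Average (timestamp, value) samples into fixed-size time buckets."""
    buckets: Dict[int, List[float]] = {}
    for timestamp, value in samples:
        bucket_start = timestamp - timestamp % bucket_s
        buckets.setdefault(bucket_start, []).append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


# Twenty minutes of 10-second samples collapse into two 10-minute samples.
raw = [(t, float(t % 600 // 10)) for t in range(0, 1200, SAMPLE_INTERVAL_S)]
coarse = downsample(raw)
```

Each downsampled point replaces 60 raw samples, so short spikes visible during the first week are smoothed out in older data.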

