Grafana, in conjunction with Prometheus and Alertmanager, is a commonly used solution for monitoring Kubernetes clusters. The stack is universally applicable and can be used in both cloud and bare-metal clusters. It is functional, easily integrable and free, which accounts for its popularity.
In this article I will show how to integrate Grafana with Alertmanager, manage silences by means of Grafana, configure Alertmanager to inhibit alerts, and keep this configuration in code for future reuse. Following the steps described below you will learn how to:
- add an Alertmanager data source to Grafana as code
- configure Alertmanager to visualize alerts properly
- suppress some alerts via Alertmanager configuration
Requirements
You will need a Kubernetes cluster with the `kube-prometheus-stack` Helm chart (version 39.5.0) installed. You can use your existing cluster or deploy a testing environment. For an example, see our article Deploying Prometheus Monitoring Stack with Cluster.dev.
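If you are starting from scratch, the chart can be installed roughly as follows. This is a minimal sketch: the release name and namespace `monitoring` are assumptions, chosen so that the Alertmanager service name matches the URL used later in this article.

```bash
# Add the community Helm repository and install kube-prometheus-stack.
# The release name "monitoring" is an assumption; it determines the
# Alertmanager service name (monitoring-kube-prometheus-alertmanager).
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --version 39.5.0 \
  --namespace monitoring \
  --create-namespace
```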
Introduction
Starting from v8.0, Grafana ships with an integrated alerting system for acting on metrics and logs from a variety of external sources. At the same time, Grafana is compatible with Alertmanager and Prometheus by default – a combination that much of the industry relies on when monitoring Kubernetes clusters.
One of the reasons we prefer Alertmanager over native Grafana alerting is that it is easier to automate when the configuration is kept in code. For example, while you can define Grafana-managed visualization panels in code and reuse them afterward, they are much harder to manage that way. Alertmanager also comes together with Prometheus in the `kube-prometheus-stack` Helm chart – a resource we use to monitor Kubernetes clusters.
Grafana integration with Alertmanager
The first thing we do is configure Grafana integration with Alertmanager.
To make it automatic, add the following code to the `kube-prometheus-stack` values:
```yaml
grafana:
  additionalDataSources:
    - name: Alertmanager
      type: alertmanager
      url: http://monitoring-kube-prometheus-alertmanager:9093
      editable: true
      access: proxy
      version: 2
      jsonData:
        implementation: prometheus
```
Customize the value of the `url:` key if it is different in your case. Deploy the code to your cluster and check that the new data source appears in Grafana.
Then check active alerts – you should see at least one default alert.
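One way to deploy the updated values is a Helm upgrade of the existing release. This is a sketch that assumes the values above are saved in `values.yaml` and the release is named `monitoring`:

```bash
# Apply the updated values to the existing kube-prometheus-stack release.
# The release name, namespace and values file name are assumptions.
helm upgrade monitoring prometheus-community/kube-prometheus-stack \
  --version 39.5.0 \
  --namespace monitoring \
  -f values.yaml
```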
Add Alertmanager configuration
Sometimes you can’t avoid alert duplication with this integration, but I believe in most cases it is possible to prevent it. To see alerts without duplication you need to configure Alertmanager properly. This means having one receiver per alert.
In our case, to keep things simple, we will add two receivers:
- `blackhole` – for zero-priority alerts that do not need to be sent anywhere
- `default` – for alerts with severity levels `info`, `warning`, and `critical`
The `default` receiver should have all the needed notification channels. In our case we have two example channels – `telegram` and `slack`.
To automate the setup of the Alertmanager configuration, add the following code to the `kube-prometheus-stack` values file:
```yaml
alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: [...]
      group_wait: 9s
      group_interval: 9s
      repeat_interval: 120h
      receiver: blackhole
      routes:
        - receiver: default
          group_by: [...]
          match_re:
            severity: "info|warning|critical"
          continue: false
          repeat_interval: 120h
    receivers:
      - name: blackhole
      - name: default
        telegram_configs:
          - chat_id: -000000000
            bot_token: 0000000000:00000000000000000000000000000000000
            message: |
              'Status: <a href="https://127.0.0.1">{{ .Status }}</a>'
              '{{ .CommonAnnotations.message }}'
            api_url: https://127.0.0.1
            parse_mode: HTML
            send_resolved: true
        slack_configs:
          - api_url: https://127.0.0.1/services/00000000000/00000000000/000000000000000000000000
            username: alertmanager
            title: "Status: {{ .Status }}"
            text: "{{ .CommonAnnotations.message }}"
            title_link: "https://127.0.0.1"
            send_resolved: true
```
Deploy the code to your cluster and check for active alerts – they should not be duplicated.
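To double-check that every alert ends up in exactly one receiver, you can inspect the routing tree with `amtool` against a port-forwarded Alertmanager. This is a sketch: the service name comes from the chart installation above, while the local port and test label are assumptions.

```bash
# Forward the Alertmanager service installed by kube-prometheus-stack.
kubectl -n monitoring port-forward svc/monitoring-kube-prometheus-alertmanager 9093:9093 &

# Print the routing tree that Alertmanager has loaded.
amtool config routes --alertmanager.url=http://127.0.0.1:9093

# Check which receiver an alert with severity=warning would be routed to.
amtool config routes test --alertmanager.url=http://127.0.0.1:9093 severity=warning
```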
Add example inhibition rules
In some cases we want to disable alerts via silences, and sometimes it is better to do it in code. A silence is good as a temporary measure. It is, however, impermanent and has to be recreated if you deploy to an empty cluster. Disabling alerts via code, on the other hand, is a sustainable solution that can be used for repeated deployments.
Disabling alerts via a silence is simple – just open the Silences tab and create one with the desired duration, for example `99999d`. If you have persistent storage enabled for Alertmanager, such a silence is effectively permanent.
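A silence can also be created from the command line with `amtool` instead of the Grafana UI. This is a sketch under the same port-forward assumption as above; the alert name is hypothetical.

```bash
# Create a long-lived silence for a hypothetical alert called SomeNoisyAlert.
amtool silence add alertname="SomeNoisyAlert" \
  --alertmanager.url=http://127.0.0.1:9093 \
  --duration=99999d \
  --author=ops \
  --comment="example silence created from the CLI"

# List existing silences to confirm it was created.
amtool silence query --alertmanager.url=http://127.0.0.1:9093
```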
This section mostly covers the second case, because adding a silence as code is not an easy task. Instead, we will suppress two test alerts with the `Watchdog` alert, which is always firing by default.
Add this code to the `kube-prometheus-stack` values file:
```yaml
inhibit_rules:
  - target_matchers:
      - alertname =~ "ExampleAlertToInhibitOne|ExampleAlertToInhibitTwo"
    source_matchers:
      - alertname = Watchdog
```
The resulting code should look like this:
```yaml
alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: [...]
      group_wait: 9s
      group_interval: 9s
      repeat_interval: 120h
      receiver: blackhole
      routes:
        - receiver: default
          group_by: [...]
          match_re:
            severity: "info|warning|critical"
          continue: false
          repeat_interval: 120h
    inhibit_rules:
      - target_matchers:
          - alertname =~ "ExampleAlertToInhibitOne|ExampleAlertToInhibitTwo"
        source_matchers:
          - alertname = Watchdog
    receivers:
      - name: blackhole
      - name: default
        telegram_configs:
          - chat_id: -000000000
            bot_token: 0000000000:00000000000000000000000000000000000
            message: |
              'Status: <a href="https://127.0.0.1">{{ .Status }}</a>'
              '{{ .CommonAnnotations.message }}'
            api_url: https://127.0.0.1
            parse_mode: HTML
            send_resolved: true
        slack_configs:
          - api_url: https://127.0.0.1/services/00000000000/00000000000/000000000000000000000000
            username: alertmanager
            title: "Status: {{ .Status }}"
            text: "{{ .CommonAnnotations.message }}"
            title_link: "https://127.0.0.1"
            send_resolved: true
```
Deploy the code to your cluster. Add test alerts with the following code:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: test-rules
  namespace: monitoring
spec:
  groups:
    - name: "test alerts"
      rules:
        - alert: ExampleAlertToInhibitOne
          expr: vector(1)
        - alert: ExampleAlertToInhibitTwo
          expr: vector(1)
```
Deploy the code with the test alerts to your cluster and check that the test rules appear in the rules list. Wait 1-3 minutes for the test alerts to start firing; they should be suppressed by the inhibition rule.
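A quick way to confirm the suppression is to apply the rule and query the Alertmanager API for the alerts' status. This is a sketch that reuses the earlier port-forward and assumes the rule above is saved as `test-rules.yaml` and `jq` is installed.

```bash
# Apply the test PrometheusRule (file name is an assumption).
kubectl apply -f test-rules.yaml

# After the alerts start firing, check their state and what inhibits them.
curl -s http://127.0.0.1:9093/api/v2/alerts \
  | jq '.[] | {alert: .labels.alertname, state: .status.state, inhibitedBy: .status.inhibitedBy}'
```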
Conclusion
In this article we have reviewed a generic case of integrating Grafana with Alertmanager, learnt how to manage silences in Grafana, and how to inhibit alerts via Alertmanager configuration kept in code. Now you can manage your alerts in an easy and reproducible way with minimal code. The basic code examples are ready to be used in your projects and can be adapted to any configuration.