[Kubernetes tip] Multi-Cluster Configurations with Prometheus
This tip is for those who are using Prometheus federation to monitor multiple clusters.
How should Alertmanager be configured for multiple clusters? For example, if there is an issue in Cluster A, it should send an alert only for Cluster A.
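A common approach is to give every cluster's Prometheus a distinct external label such as `cluster`. This is sketched here as an assumption, not taken from the original setup. Prometheus attaches its external labels to the alerts it sends to Alertmanager, so Alertmanager can tell which cluster an alert came from. The label name `cluster`, the value `cluster-a` and the Alertmanager address below are placeholders.

```yaml
# prometheus.yml for the Prometheus server running in Cluster A (sketch).
# "cluster" / "cluster-a" are placeholder names: use a distinct value
# on every cluster's Prometheus (cluster-b, cluster-c, ...).
global:
  external_labels:
    cluster: cluster-a

# Where this Prometheus sends its alerts; the address is a placeholder.
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager.example.internal:9093']
```

In a federation setup, the `/federate` endpoint also adds the source server's external labels to the series it exposes. Keeping `honor_labels: true` on the federating scrape job typically preserves them, so rules evaluated centrally still see which cluster a series belongs to.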
Consider the following alerting rule in `alerting_rules.yml`:

```yaml
groups:
  - name: Instances
    rules:
      - alert: TEST ALERT FROM PROMETHEUS PLEASE ACKNOWLEDGE
        expr: prometheus_build_info{instance="localhost:9090"} == 1
        for: 10s
        labels:
          severity: page
        annotations:
          description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.'
          summary: 'Instance {{ $labels.instance }} down'
          action: TESTING PLEASE ACKNOWLEDGE, NO FURTHER ACTION REQUIRED ONLY A TEST
```
![]({{ 'assets/images/prometheus-alert-1.png' | relative_url }})
In such cases, every alert should be routed to the proper team based on its labels: if there is a problem with application A on cluster B, the team responsible for it should be notified. In the case above, the same rule triggers two alerts, which you will have to deduplicate. And if you don't want to be notified every time very similar alerts fire, you can treat them as a group.
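Label-based routing like this can be sketched as an Alertmanager route tree. The sketch below assumes that each alert carries the `cluster` external label from its Prometheus and an `app` label. The receiver names, label values and webhook URLs are hypothetical.

```yaml
# alertmanager.yml (sketch): route alerts to teams based on their labels.
# All receiver names, label values and URLs are hypothetical placeholders.
route:
  receiver: default-team              # used when no child route matches
  group_by: ['alertname', 'cluster']
  routes:
    # Everything coming from Cluster A goes to that cluster's on-call.
    - matchers:
        - cluster="cluster-a"
      receiver: cluster-a-oncall
    # Application A on Cluster B goes to the team that owns application A.
    - matchers:
        - cluster="cluster-b"
        - app="application-a"
      receiver: app-a-team

receivers:
  - name: default-team
    webhook_configs:
      - url: 'http://alerts.example.internal/default'
  - name: cluster-a-oncall
    webhook_configs:
      - url: 'http://alerts.example.internal/cluster-a'
  - name: app-a-team
    webhook_configs:
      - url: 'http://alerts.example.internal/app-a'
```

Child routes are evaluated in order, and the first match wins unless that route sets `continue: true`. Alertmanager also deduplicates alerts with identical label sets, for example when two Prometheus replicas send the same alert.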
For example, if an app on node A has disk issues and all the other apps on that node have the same issue (with the same cause), you probably don't want to receive 10 alerts. You would rather receive a single notification when the alerts are related, for instance because they were triggered by similar rules, in the same place, and within a given time interval.
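This kind of grouping maps onto a route's `group_by` and timing options. The sketch below assumes that the alerts carry `cluster` and `instance` labels; if your metrics identify nodes with a different label, group on that instead. The receiver name is a placeholder, and the timings shown are Alertmanager's defaults.

```yaml
# alertmanager.yml (sketch): batch related alerts into one notification.
# Alerts with the same cluster and instance values form a single group.
route:
  receiver: default-team
  group_by: ['cluster', 'instance']
  group_wait: 30s      # buffer alerts of a new group before the first notification
  group_interval: 5m   # wait before notifying about new alerts added to an already-notified group
  repeat_interval: 4h  # wait before re-sending a notification for alerts still firing

receivers:
  - name: default-team   # no notifier configured here; add e.g. webhook_configs
```

With this route, ten disk alerts from the same node that fire within the `group_wait` window arrive as one notification instead of ten.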
Read the Alertmanager documentation for more information on alert grouping.
Looking for an end-to-end incident alerting, on-call scheduling and response orchestration platform?
Sign up for a 14-day free trial of Zenduty. No credit card required. Implement modern incident response and SRE best practices within your production operations and provide industry-leading SLAs to your customers.
Deepak Kumar
Hybrid Cloud Engineering.