Configuring Email Alerts with Prometheus + Alertmanager
Architecture diagram
1. Download the official prebuilt release and install it
https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz
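The download and unpack step might look like the sketch below. It assumes wget and tar are available and that /apps is the install prefix (taken from the paths used in the next step). The function and variable names are illustrative, and the symlink is only one way to arrive at the /apps/alertmanager path, not necessarily how it was done here.

```shell
# Release to fetch; the URL is the one given above.
AM_VERSION="0.23.0"
AM_TARBALL="alertmanager-${AM_VERSION}.linux-amd64.tar.gz"
AM_DOWNLOAD_URL="https://github.com/prometheus/alertmanager/releases/download/v${AM_VERSION}/${AM_TARBALL}"
AM_PREFIX="/apps"   # assumption: install prefix

# Download the tarball, unpack it under $AM_PREFIX, and expose it as
# $AM_PREFIX/alertmanager. The body runs in a subshell, so the caller's
# working directory and shell options are left untouched.
install_alertmanager() (
  set -e
  mkdir -p "$AM_PREFIX"
  cd "$AM_PREFIX"
  wget -q "$AM_DOWNLOAD_URL"
  tar -xzf "$AM_TARBALL"
  # The tarball unpacks into a versioned directory; the symlink gives a stable path.
  ln -sfn "${AM_TARBALL%.tar.gz}" alertmanager
)
```

Running install_alertmanager as root performs the installation; nothing is downloaded until the function is called.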
Configure start on boot with a systemd unit:
root@ceph-teamplate:/apps# cat /lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
After=network.target

[Service]
ExecStart=/apps/alertmanager/alertmanager
WorkingDirectory=/apps/alertmanager
Restart=on-failure

[Install]
WantedBy=multi-user.target
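Once the unit file is in place, systemd has to reload its unit definitions before the service can be started and enabled. A minimal sketch, assuming it is run as root; the function name is illustrative.

```shell
# Register the new unit, start Alertmanager now, and enable it at boot.
enable_alertmanager() {
  systemctl daemon-reload &&
  systemctl enable --now alertmanager.service &&
  systemctl --no-pager status alertmanager.service
}
```

Because the unit sets WorkingDirectory=/apps/alertmanager and passes no flags, Alertmanager looks for its default config file name, alertmanager.yml, in that directory.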
2. Enter the extracted alertmanager directory and edit alertmanager.yml to configure alerting. The contents of alertmanager.yml are as follows:
root@blackbox_exporter-189:/apps/alertmanager# cat alertmanager.yml
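global:
  resolve_timeout: 5m
  smtp_from: '8147x@qq.com'           # mailbox used to send the alert emails
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: '8147x@qq.com'  # mailbox address
  smtp_auth_password: 'xxxx'          # mailbox authorization code
  smtp_require_tls: false
  smtp_hello: 'qq.com'
route:                      # alert routing policy
  group_by: ['alertname']   # labels used to group alerts
  group_wait: 8s            # wait 8s after an alert fires so that alerts in the same group are sent together
  group_interval: 3s        # wait before notifying about new alerts added to a group that was already notified
  repeat_interval: 2m       # interval before a still-firing alert is sent again; lowers mail volume, set to 2 minutes here for testing
  receiver: 'email'         # default receiver
  #routes:                  # child routes: which receivers get which alerts
  #- receiver: mail
receivers:
- name: 'email'
  email_configs:
  - to: '8147x@qq.com'      # mailbox that receives the alerts
    send_resolved: true
#inhibit_rules:
#  - source_match:
#      severity: 'critical'
#    target_match:
#      severity: 'warning'
#    equal: ['alertname', 'dev', 'instance']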
Check that the alertmanager.yml configuration is valid:
root@blackbox_exporter-189:/apps/alertmanager# ./amtool check-config alertmanager.yml
Checking 'alertmanager.yml' SUCCESS
Found:
- global config
- route
- 0 inhibit rules
- 1 receivers
- 0 templates
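Beyond the syntax check, amtool can also report which receiver a given set of labels would be routed to, which is a quick way to confirm the route section behaves as intended. A sketch, assuming it is run from /apps/alertmanager; the function name and the sample labels are illustrative.

```shell
# Print the receiver that the routing tree in alertmanager.yml selects
# for the labels passed as arguments (name=value pairs).
route_for() {
  ./amtool config routes test --config.file=alertmanager.yml "$@"
}

# Example call (commented out so the snippet has no side effects):
# route_for alertname=demo severity=warning
```

With the configuration above there is only the default route, so any label set should resolve to the email receiver.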
3. Open the Alertmanager web UI in a browser: http://192.168.192.189:9093 (that is, IP:9093)
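The same check can be made from the command line, and a hand-made alert is a convenient way to test the email path before Prometheus is wired in. A sketch, assuming Alertmanager listens on 192.168.192.189:9093 (the address used elsewhere in this post) and that amtool sits in /apps/alertmanager; the function names and the TestEmail alert are made up for illustration.

```shell
ALERTMANAGER_URL="http://192.168.192.189:9093"

# Succeeds when Alertmanager answers on its health endpoint.
alertmanager_healthy() {
  curl -fsS "${ALERTMANAGER_URL}/-/healthy" >/dev/null
}

# Push a synthetic alert; it goes through the same grouping and routing
# as a real one, so an email should arrive after group_wait has elapsed.
send_test_alert() {
  /apps/alertmanager/amtool alert add alertname=TestEmail severity=warning \
    --annotation=summary="manual test alert" \
    --alertmanager.url="$ALERTMANAGER_URL"
}
```

Calling alertmanager_healthy && send_test_alert sends the test alert only if the service is reachable.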
4. In the Prometheus installation directory, edit the Prometheus configuration and uncomment the Alertmanager-related settings
root@pro182:/apps/prometheus# cat prometheus.yml
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - 192.168.192.189:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/apps/prometheus/*.yaml"   # alerting rule files
  # - "first_rules.yml"
  # - "second_rules.yml"
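After any change to prometheus.yml, or to a rule file it loads, the configuration can be validated and Prometheus told to re-read it. A sketch, assuming promtool ships alongside the prometheus binary in /apps/prometheus and that the server process is named prometheus; the function names are illustrative.

```shell
PROM_DIR="/apps/prometheus"

# Validate prometheus.yml; promtool also checks the rule files it references.
check_prometheus_config() {
  "$PROM_DIR/promtool" check config "$PROM_DIR/prometheus.yml"
}

# Ask the running server to reload its configuration and rules via SIGHUP.
reload_prometheus() {
  pkill -HUP -x prometheus
}

# Reload only when the check passes.
validate_and_reload() {
  check_prometheus_config && reload_prometheus
}
```

Restarting the service achieves the same result; SIGHUP simply avoids the downtime.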
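Write the alerting rule file (neicun.yaml in this example; its .yaml extension is what the rule_files glob above matches)
(For testing, the rule is set to fire as soon as memory usage exceeds 1%)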
root@pro182:/apps/prometheus# cat neicun.yaml
groups:
  - name: mem-rule
    rules:
    - alert: "内存报警"   # alert name, "memory alert"
      expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes )) / node_memory_MemTotal_bytes * 100 > 1
      for: 5s
      labels:
        severity: warning
      annotations:
        summary: "Service: {{$labels.alertname}} memory alert"
        description: "{{ $labels.alertname }} memory utilization is above 1%"
        value: "{{ $value }}"
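The arithmetic in expr can be checked by hand: used memory is the total minus free, buffers, and cached, expressed as a percentage of the total. The helper below repeats that calculation with awk; the function name and the sample figures are invented for illustration.

```shell
# mem_used_percent TOTAL FREE BUFFERS CACHED
# Same formula as the rule: (total - (free + buffers + cached)) / total * 100
mem_used_percent() {
  awk -v t="$1" -v f="$2" -v b="$3" -v c="$4" \
    'BEGIN { printf "%.1f\n", (t - (f + b + c)) / t * 100 }'
}

# Round numbers in MiB: 8192 total, 2048 free, 512 buffers, 1536 cached.
mem_used_percent 8192 2048 512 1536   # prints 50.0
```

Any host running normally sits well above the 1% threshold, which is why this rule fires almost immediately and works as a test.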
Open http://192.168.192.189:9093/#/alerts in a browser; the alert is shown there as well.
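The alert state can also be read without a browser. A sketch, assuming Prometheus listens on 192.168.192.182:9090 (inferred from the host name pro182) and that amtool sits in /apps/alertmanager; the function names are illustrative.

```shell
PROMETHEUS_URL="http://192.168.192.182:9090"      # assumption: Prometheus address
ALERTMANAGER_URL="http://192.168.192.189:9093"

# Alerts as Prometheus sees them (pending and firing), returned as JSON.
prometheus_alerts() {
  curl -fsS "${PROMETHEUS_URL}/api/v1/alerts"
}

# Alerts currently held by Alertmanager, printed as a table.
alertmanager_alerts() {
  /apps/alertmanager/amtool alert query --alertmanager.url="$ALERTMANAGER_URL"
}
```

If an alert shows up in Prometheus but not in Alertmanager, the alerting section of prometheus.yml is the first place to look.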