Configuring email alerts with Prometheus + Alertmanager


Architecture diagram

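1. Download the official release and install it (the tarball contains prebuilt binaries, so nothing has to be compiled)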

 https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz

Configure start on boot

root@ceph-teamplate:/apps# cat /lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
After=network.target

[Service]
ExecStart=/apps/alertmanager/alertmanager
WorkingDirectory=/apps/alertmanager
Restart=on-failure

[Install]
WantedBy=multi-user.target
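
The unit file alone does not start anything; systemd still has to load and enable it. A minimal sketch of that step follows, assuming the unit is stored at the path shown above. The function name enable_alertmanager is only illustrative.

```shell
#!/usr/bin/env bash
# Register the unit with systemd, start it now, and start it on every boot.
enable_alertmanager() {
    systemctl daemon-reload                    # re-read unit files from disk
    systemctl enable --now alertmanager        # enable at boot and start immediately
    systemctl --no-pager status alertmanager   # show whether the service is active
}

# Run only on a host that has the unit file from this post installed.
if [ -f /lib/systemd/system/alertmanager.service ] && command -v systemctl >/dev/null 2>&1; then
    enable_alertmanager
fi
```

Because the unit sets WorkingDirectory=/apps/alertmanager and passes no flags, Alertmanager should pick up the alertmanager.yml in that directory, which is its default config file name.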

2. Go into the extracted alertmanager directory and edit alertmanager.yml to configure the alert notifications. The contents of alertmanager.yml are as follows:

root@blackbox_exporter-189:/apps/alertmanager# cat alertmanager.yml 
global:
  resolve_timeout: 5m
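  smtp_from: '8147x@qq.com'   # mailbox used to send the alert emails
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: '8147x@qq.com'   # mailbox address
  smtp_auth_password: 'xxxx'    # authorization code for the mailbox
  smtp_require_tls: false
  smtp_hello: 'qq.com'
route:   # alert dispatch policy
  group_by: ['alertname']  # labels used to group alerts
  group_wait: 8s    # wait 8s after an alert fires so that alerts in the same group are sent together
  group_interval: 3s  # interval between two notification batches for a group
  repeat_interval: 2m  # interval before a still-firing alert is sent again; reduces duplicate emails (set to 2 minutes here for testing)
  receiver: 'email'    # default receiver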
#routes:   # specifies which groups can receive messages
#- receiver: mail
receivers:
- name: 'email'
  email_configs:
  - to: '8147x@qq.com'   # address that receives the alert emails
    send_resolved: true
#inhibit_rules:
#  - source_match:
#      severity: 'critical'
#    target_match:
#      severity: 'warning'
#    equal: ['alertname', 'dev', 'instance']

Check that alertmanager.yml is valid

root@blackbox_exporter-189:/apps/alertmanager# ./amtool check-config alertmanager.yml
Checking 'alertmanager.yml'  SUCCESS
Found:
 - global config
 - route
 - 0 inhibit rules
 - 1 receivers
 - 0 templates

3. Open Alertmanager in a browser: http://192.168.192.189:9093  (IP:9093)
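
Besides opening the UI, the email path can be exercised before Prometheus is wired up by posting a hand-made alert to Alertmanager's v2 HTTP API. The sketch below assumes Alertmanager listens on 192.168.192.189:9093, the address used as the Prometheus target in this post. The alert name TestAlert, the label values, and both function names are made up for the test.

```shell
#!/usr/bin/env bash
# Assumed Alertmanager address; override by exporting AM_URL.
AM_URL="${AM_URL:-http://192.168.192.189:9093}"

# Print a one-element JSON array in the shape the v2 alerts endpoint accepts.
# The alert name and labels are made-up test values.
build_test_alert() {
    cat <<'EOF'
[
  {
    "labels": {"alertname": "TestAlert", "severity": "warning", "instance": "manual-test"},
    "annotations": {"summary": "Manual test alert sent with curl"}
  }
]
EOF
}

# POST the alert to Alertmanager.
send_test_alert() {
    build_test_alert | curl -sS -X POST -H 'Content-Type: application/json' \
        --data-binary @- "$AM_URL/api/v2/alerts"
}

# Usage: send_test_alert
```

With the route settings above, an email should go out roughly group_wait (8s) after send_test_alert is called. Since the payload carries no end time, the alert is expected to resolve on its own once resolve_timeout has passed.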

4. In the Prometheus installation directory, edit the Prometheus configuration and uncomment the Alertmanager-related lines

root@pro182:/apps/prometheus# cat prometheus.yml 
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - 192.168.192.189:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/apps/prometheus/*.yaml"  # alert rule files
  # - "first_rules.yml"
  # - "second_rules.yml"
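
Prometheus only applies the edited file after a restart or reload, so it is worth validating it first. The sketch below uses promtool, which ships in the Prometheus release tarball, and assumes the /apps/prometheus install path seen in the prompt above. The function name check_and_reload is illustrative.

```shell
#!/usr/bin/env bash
# Validate prometheus.yml (and the rule files it references), then ask a
# running Prometheus to reload. The directory argument defaults to the
# install path used in this post.
check_and_reload() {
    prom_dir="${1:-/apps/prometheus}"
    "$prom_dir/promtool" check config "$prom_dir/prometheus.yml" || return 1
    # SIGHUP makes Prometheus re-read its configuration and rule files.
    pkill -HUP -x prometheus
}

# Usage: check_and_reload            # uses /apps/prometheus
#        check_and_reload /opt/prom  # hypothetical alternative path
```

The reload signal is sent only when the check succeeds, so a broken file never reaches the running server. Running the function again after adding or changing a rule file validates the rules matched by rule_files as well.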

Write the alert rule file (neicun.yaml here, so that it matches the *.yaml pattern in rule_files)

(For testing, the rule is set to fire as soon as memory usage exceeds 1%)

root@pro182:/apps/prometheus# cat neicun.yaml 
groups:
- name: mem-rule
  rules:
  - alert: "Memory alert"
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes )) / node_memory_MemTotal_bytes * 100 > 1
    for: 5s
    labels:
      severity: warning
    annotations:
      summary: "Service: {{$labels.alertname}} memory alert"
      description: "{{ $labels.alertname }} memory utilization is above 1%"
      value: "{{ $value }}"
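
The node_memory_* metrics in the expression are exported by node_exporter from /proc/meminfo, so the same percentage can be computed locally to see what value the rule will compare against its threshold. This is a rough sketch, not part of the original setup. The function name mem_used_percent is illustrative, and the optional argument exists only so a different meminfo-style file can be passed in.

```shell
#!/usr/bin/env bash
# Compute (MemTotal - (MemFree + Buffers + Cached)) / MemTotal * 100,
# mirroring the PromQL expression in the rule. /proc/meminfo reports kB
# while the metrics are in bytes, but the ratio is the same.
mem_used_percent() {
    awk '
        /^MemTotal:/ { total = $2 }
        /^MemFree:/  { free = $2 }
        /^Buffers:/  { buffers = $2 }
        /^Cached:/   { cached = $2 }   # anchored, so SwapCached is not counted
        END { printf "%.2f\n", (total - (free + buffers + cached)) / total * 100 }
    ' "${1:-/proc/meminfo}"
}

# Usage: mem_used_percent
```

On almost any running host the result is well above 1, which is why the test threshold fires right away. A production rule would use a much higher threshold and a longer "for" duration.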

The firing alert can also be seen in a browser at http://192.168.192.189:9093/#/alerts

 
