Configuring Prometheus Alerts in Grafana


Email configuration

Here we use email alerts. First, edit the Grafana configuration file /etc/grafana/grafana.ini, find the SMTP section, and change it as follows:

[smtp]
;enabled = false
enabled = true
;host = localhost:25
host = smtp.exmail.qq.com:25
;user =
user = notice@wzlinux.com
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
;password =
password = Q7P1hsdfsenzzyM
;cert_file =
;key_file =
;skip_verify = false
;from_address = admin@grafana.localhost
from_address = notice@wzlinux.com
from_name = Grafana
# EHLO identity in SMTP dialog (defaults to instance_name)
;ehlo_identity = dashboard.example.com
 

After making these changes, restart Grafana.
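Before restarting, it can help to confirm which SMTP values are actually active, since the file mixes commented defaults with the overrides. The following is a minimal sketch using Python's standard configparser, which by default treats lines starting with ; or # as comments. The function name read_smtp_settings and the inline sample are illustrative assumptions, and the sketch assumes host is written as host:port.

```python
import configparser

# Trimmed copy of the [smtp] section shown above (password left out on purpose).
SAMPLE_INI = """
[smtp]
;enabled = false
enabled = true
;host = localhost:25
host = smtp.exmail.qq.com:25
user = notice@wzlinux.com
from_address = notice@wzlinux.com
from_name = Grafana
"""


def read_smtp_settings(ini_text):
    """Return the active (uncommented) SMTP settings found in grafana.ini text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    smtp = parser["smtp"]
    # Assumption: host is always given as "hostname:port".
    host, _, port = smtp.get("host", fallback="localhost:25").rpartition(":")
    return {
        "enabled": smtp.getboolean("enabled", fallback=False),
        "host": host,
        "port": int(port),
        "user": smtp.get("user", fallback=""),
        "from_address": smtp.get("from_address", fallback=""),
    }


settings = read_smtp_settings(SAMPLE_INI)
print(settings)
```

To check the real file, pass its contents to read_smtp_settings instead of SAMPLE_INI. Note that configparser raises an error if uncommented keys appear before the first section header, so this may need adjusting for a heavily customized grafana.ini.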

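Next, add a notification channel of type email. See the list of supported notifiers for details on each channel type. The following options are available: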
  • Default (send on all alerts) - When selected, this option sends a notification on this channel for all alert rules.
  • Include Image - See Enable images in notifications for details.
  • Disable Resolve Message - When selected, this option disables the resolve message [OK] that is sent when the alerting state returns to false.
  • Send reminders - When this option is checked, additional notifications (reminders) are sent for triggered alerts. You can specify how often reminders should be sent using a number of seconds (s), minutes (m), or hours (h), for example 30s, 3m, 5m, or 1h.
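The same options can also be set when a channel is created through Grafana's legacy notification channel HTTP API (POST /api/alert-notifications) instead of the UI. The sketch below only builds the JSON body. The field names (isDefault, disableResolveMessage, sendReminder, frequency, and settings.addresses / settings.uploadImage) reflect my reading of that legacy API and should be verified against your Grafana version. The helper name email_channel_payload, the channel name, and the addresses are hypothetical.

```python
import json


def email_channel_payload(name, addresses, is_default=False, upload_image=False,
                          disable_resolve_message=False, reminder_every=None):
    """Build a request body for an email notification channel.

    reminder_every is a duration string such as "30s", "5m" or "1h";
    None means no reminders are sent.
    """
    payload = {
        "name": name,
        "type": "email",
        "isDefault": is_default,                           # "Default (send on all alerts)"
        "disableResolveMessage": disable_resolve_message,  # "Disable Resolve Message"
        "sendReminder": reminder_every is not None,        # "Send reminders"
        "settings": {
            # Assumption: multiple recipients are separated by ";".
            "addresses": ";".join(addresses),
            "uploadImage": upload_image,                   # "Include Image"
        },
    }
    if reminder_every is not None:
        payload["frequency"] = reminder_every
    return payload


body = email_channel_payload(
    "ops-email",                      # hypothetical channel name
    ["ops@example.com"],              # hypothetical recipient
    is_default=True,
    reminder_every="30m",
)
print(json.dumps(body, indent=2))
```

Sending the body requires an authenticated POST to your Grafana instance. That part is left out here because it depends on how your instance handles API keys or basic auth.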
Create alerts

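    Grafana does not support alert rules on queries that use template variables, so we need a dashboard without variables. One can be found among the official dashboards; I chose the one with ID 5984, which you can import.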

    [Screenshot: image-20200524211617663]

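    I made a few small changes to its formatting and data source, mainly adjusting the CPU load panel. The panels for the other resources can be used as they are.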

    [Screenshot: image-20200524211731861]

    I have tested this setup and the alerts fire as expected. Here are the PromQL queries:

    CPU:

    100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
     

    Set the legend to {{instance}}
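To make the arithmetic in the CPU query concrete: irate on the idle counter gives, per core, the fraction of each second spent idle, avg by (instance) averages those fractions across cores, and the result is subtracted from 100. A small sketch with made-up per-core values (the function name and the numbers are assumptions for illustration):

```python
def cpu_usage_percent(idle_fractions):
    """Mirror of: 100 - avg(idle fraction per core) * 100."""
    avg_idle = sum(idle_fractions) / len(idle_fractions)
    return 100 - avg_idle * 100


# Hypothetical 4-core host: one core fully idle, the others partly busy.
usage = cpu_usage_percent([1.0, 0.5, 0.75, 0.75])
print(usage)  # 25.0
```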

    Memory:

    100*(node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Buffers_bytes - node_memory_Cached_bytes) / node_memory_MemTotal_bytes
       

    Set the legend to {{instance}}
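The memory query counts free memory, buffers, and page cache as available, and reports everything else as a percentage of the total. A short worked example with invented numbers (the function name and values are assumptions):

```python
GIB = 1024 ** 3


def memory_used_percent(total, free, buffers, cached):
    """Mirror of: 100 * (MemTotal - MemFree - Buffers - Cached) / MemTotal."""
    return 100 * (total - free - buffers - cached) / total


# Hypothetical host: 8 GiB total, 2 GiB free, 1 GiB buffers, 1 GiB cached.
used = memory_used_percent(total=8 * GIB, free=2 * GIB, buffers=1 * GIB, cached=1 * GIB)
print(used)  # 50.0
```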

    Storage:

    100 - 100 * (node_filesystem_avail_bytes / node_filesystem_size_bytes)
     

    Set the legend to {{instance}} - {{mountpoint}}
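The storage percentage is a ratio of two byte counts, so it is only meaningful when both sides are in the same unit. Dividing one side by 1000 and the other by 1024 skews the result. A minimal sketch of the intended calculation, with invented values (the function name and numbers are assumptions):

```python
GIB = 1024 ** 3


def disk_used_percent(avail_bytes, size_bytes):
    """Percentage of the filesystem in use; both arguments must share a unit."""
    return 100.0 - 100 * (avail_bytes / size_bytes)


# Hypothetical 100 GiB filesystem with 25 GiB still available.
disk_used = disk_used_percent(25 * GIB, 100 * GIB)
print(disk_used)  # 75.0
```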

    Example

    The screenshots below show the configuration for the CPU panel:

    Queries:

    [Screenshot: image-20200525142217564]

    Visualization:

    [Screenshot: image-20200525142329210]

    Alert:

    [Screenshot: image-20200525142405887]
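Since the Alert tab settings are only visible in the screenshot, here is a rough sketch of what such a rule amounts to: evaluate the CPU query and flag every instance whose value is above a threshold. It works on the JSON shape returned by the Prometheus instant query endpoint (/api/v1/query). The threshold of 80, the function name, and the sample response are assumptions for illustration, not the values from the original rule.

```python
def instances_over_threshold(response, threshold):
    """Return {instance: value} for every series whose value exceeds threshold.

    response is the decoded JSON of a Prometheus instant query
    (resultType "vector").
    """
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    firing = {}
    for sample in response["data"]["result"]:
        instance = sample["metric"].get("instance", "unknown")
        _timestamp, raw_value = sample["value"]  # value arrives as a string
        value = float(raw_value)
        if value > threshold:
            firing[instance] = value
    return firing


# Hypothetical response for the CPU query above.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "node1:9100"}, "value": [1590387725.0, "91.5"]},
            {"metric": {"instance": "node2:9100"}, "value": [1590387725.0, "12.25"]},
        ],
    },
}

firing = instances_over_threshold(SAMPLE_RESPONSE, threshold=80)
print(firing)  # {'node1:9100': 91.5}
```

Grafana's own rule additionally applies a reducer such as avg() over a time window and an evaluation interval, so treat this as an approximation of the condition rather than a replacement for it.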