Introduction

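The school is currently applying for Level 2 certification under China's Multi-Level Protection Scheme (二级等保), and one of the requirements is log recording. For historical reasons the school runs a sprawling mix of services on many different systems. I have consolidated some of them since I arrived, but for budget reasons there is still no dedicated logging system, which makes collecting logs for the certification extremely tedious.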

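To solve this, the idea of building a dedicated log collection system came up again. I had earlier converted one server to Proxmox, and a few small services were already running on it, so I created a virtual machine on that server specifically for log collection.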

Environment

The environment for this build is as follows:

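No. | IP address   | OS     | Installed software / purpose
1   | 192.168.10.1 | Alpine | Full stack (Grafana, Loki, Prometheus, AlertManager, DingTalk webhook, Promtail)
2   | 192.168.10.2 | Ubuntu | Promtail + Node Exporter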

Main server setup

System preparation

To keep resource usage to a minimum, I went with Alpine Linux: the base system is only a few megabytes and its resource footprint is tiny.

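All of the components are combined on this one system: Grafana serves as the dashboard for viewing the data, Loki collects the logs, Prometheus monitors the hosts, AlertManager handles alerting, and the third-party prometheus-webhook-dingtalk bridge forwards Prometheus alerts to DingTalk.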

Once the system is up and updated, install the required packages:

apk add curl wget libc6-compat

Create the data directories:

mkdir -p /data/grafana
mkdir -p /data/loki/chunks
mkdir -p /data/loki/rules
mkdir -p /data/prometheus
mkdir -p /data/alertmanager
mkdir -p /data/dingtalk
mkdir -p /data/promtail
mkdir -p /opt/loki
mkdir -p /opt/promtail

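Create a user for each service. No user is created for Promtail because it has to read system logs when shipping them, and running it as a separate unprivileged user would fail on file permissions: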

addgroup -S grafana
addgroup -S loki
addgroup -S prometheus
addgroup -S alertmanager
addgroup -S dingtalk
adduser -S -D -H -G grafana grafana
adduser -S -D -H -G loki loki
adduser -S -D -H -G prometheus prometheus
adduser -S -D -H -G alertmanager alertmanager
adduser -S -D -H -G dingtalk dingtalk

Download all the required software:

cd /tmp
wget -c https://dl.grafana.com/enterprise/release/grafana-enterprise-11.4.0.linux-amd64.tar.gz
wget -c https://github.com/grafana/loki/releases/download/v3.3.1/loki-linux-amd64.zip
wget -c https://github.com/grafana/loki/releases/download/v3.3.1/promtail-linux-amd64.zip
wget -c https://github.com/prometheus/prometheus/releases/download/v2.53.3/prometheus-2.53.3.linux-amd64.tar.gz
wget -c https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
wget -c https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v2.1.0/prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz
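Before unpacking, it is worth checking each archive against the SHA-256 checksum published on its release page (most of these projects publish a checksum file next to the release archives). The sketch below is a minimal helper under that assumption; the function name `verify_sha256` is hypothetical, and the expected hash has to be copied from the official release page, it is not reproduced here.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Hypothetical helper: compares the SHA-256 of FILE with EXPECTED_HEX.
# Works with both coreutils and BusyBox sha256sum.
verify_sha256() {
  file="$1"
  expected="$2"
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 2
  fi
  # sha256sum prints "<hash>  <file>"; keep only the hash field
  actual=$(sha256sum "$file" | cut -d ' ' -f 1)
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}
```

For example, `verify_sha256 prometheus-2.53.3.linux-amd64.tar.gz <hash from the release page>` prints `OK` only when the archive matches.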

Grafana setup

tar zxf grafana-enterprise-11.4.0.linux-amd64.tar.gz
mv grafana-v11.4.0 /opt/grafana

cat > /opt/grafana/conf/config.ini << EOF
[paths]
data = /data/grafana/data
logs = /data/grafana/log
plugins = /opt/grafana/plugins
provisioning = /opt/grafana/conf/provisioning
EOF

cat > /etc/init.d/grafana << 'EOF'
#!/sbin/openrc-run

name="grafana"
description="Grafana Server"

command="/opt/grafana/bin/grafana -- server --config /opt/grafana/conf/config.ini"
pidfile=/data/grafana/run.pid
logfile=/data/grafana/$name.log
workdir=/opt/grafana
user=grafana

depend() {
  need net localmount
  after firewall
}

start_pre() {
  ebegin "Preparing to start $name"
  eend $?
}

start() {
  ebegin "Starting $name"
  start-stop-daemon --start --quiet --background --name grafana --make-pidfile --pidfile $pidfile --stdout $logfile --chdir $workdir --user $user --exec $command
  eend $?
}

stop() {
  ebegin "Stopping $name"
  start-stop-daemon --stop --quiet --pidfile $pidfile
  if [ -e $pidfile ]
    then rm $pidfile
  fi
  eend $?
}

restart() {
  ebegin "Restarting $name"
  start-stop-daemon --stop --quiet --pidfile $pidfile
  sleep 1
  start-stop-daemon --start --quiet --background --name grafana --make-pidfile --pidfile $pidfile --stdout $logfile --chdir $workdir --user $user --exec $command
  eend $?
}
EOF

chown -R grafana:grafana /opt/grafana
chown -R grafana:grafana /data/grafana
chmod +x /etc/init.d/grafana
rc-update add grafana default
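The hand-written start/stop/restart functions above largely re-implement what openrc-run already does by default. As a more compact alternative (a sketch, not the script used in this article), the same service can be described with openrc-run's built-in variables `command`, `command_args`, `command_user`, `command_background`, `directory`, `pidfile`, `output_log` and `error_log`. The generator function `write_openrc_service` is hypothetical; it writes to a path you pass in so the result can be reviewed before copying it to /etc/init.d, and it assumes the arguments contain no double quotes.

```shell
# write_openrc_service OUT NAME DESC BINARY ARGS USER WORKDIR
# Hypothetical generator for a minimal openrc-run script that relies on
# OpenRC's default start/stop logic instead of custom functions.
write_openrc_service() {
  out="$1"; name="$2"; desc="$3"; binary="$4"; args="$5"; user="$6"; workdir="$7"
  # EOF is unquoted on purpose: the generator's own variables are expanded
  # now, producing a script with fixed values.
  cat > "$out" << EOF
#!/sbin/openrc-run

name="$name"
description="$desc"

command="$binary"
command_args="$args"
command_user="$user:$user"
command_background=true
directory="$workdir"
pidfile="/data/$name/run.pid"
output_log="/data/$name/$name.log"
error_log="/data/$name/$name.log"

depend() {
  need net localmount
  after firewall
}
EOF
  chmod +x "$out"
}
```

For Grafana this would be called as `write_openrc_service /tmp/grafana grafana "Grafana Server" /opt/grafana/bin/grafana "server --config /opt/grafana/conf/config.ini" grafana /opt/grafana`; the other services differ only in these arguments.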

Loki setup

cd /tmp
unzip loki-linux-amd64.zip
mv loki-linux-amd64 /opt/loki/loki

cat > /opt/loki/config.yaml << EOF
auth_enabled: false

server:
  http_listen_port: 3100

common:
  instance_addr: 0.0.0.0
  path_prefix: /opt/loki
  storage:
    filesystem:
      chunks_directory: /data/loki/chunks
      rules_directory: /data/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093
EOF

cat > /etc/init.d/loki << 'EOF'
#!/sbin/openrc-run

name="loki"
description="Grafana Loki"

command="/opt/loki/loki -- -config.file /opt/loki/config.yaml"
pidfile=/data/loki/run.pid
logfile=/data/loki/$name.log
user=loki

depend() {
  need net localmount
  after firewall
}

start_pre() {
  ebegin "Preparing to start $name"
  eend $?
}

start() {
  ebegin "Starting $name"
  start-stop-daemon --start --quiet --background --name loki --make-pidfile --pidfile $pidfile --stderr $logfile --user $user --exec $command
  eend $?
}

stop() {
  ebegin "Stopping $name"
  start-stop-daemon --stop --quiet --pidfile $pidfile
  if [ -e $pidfile ]
    then rm $pidfile
  fi
  eend $?
}

restart() {
  ebegin "Restarting $name"
  start-stop-daemon --stop --quiet --pidfile $pidfile
  sleep 1
  start-stop-daemon --start --quiet --background --name loki --make-pidfile --pidfile $pidfile --stderr $logfile --user $user --exec $command
  eend $?
}
EOF

chown -R loki:loki /opt/loki
chown -R loki:loki /data/loki
chmod +x /etc/init.d/loki
rc-update add loki default
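Once Loki is running, a single test line can be pushed to its HTTP push endpoint (`/loki/api/v1/push`) to confirm ingestion works before any Promtail is configured. The helper `build_loki_payload` below is hypothetical; it prints the JSON body the push API expects (a stream label set plus `[timestamp_ns, line]` pairs) and assumes a single-line message without control characters.

```shell
# build_loki_payload JOB MESSAGE
# Hypothetical helper that prints a JSON body for Loki's push API.
# Timestamps are nanoseconds since the epoch, passed as strings.
# Only backslashes and double quotes are escaped.
build_loki_payload() {
  job=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  msg=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  # date +%s gives seconds; append nine zeros for nanoseconds
  ts="$(date +%s)000000000"
  printf '{"streams":[{"stream":{"job":"%s"},"values":[["%s","%s"]]}]}\n' "$job" "$ts" "$msg"
}
```

Usage: `curl -s -H "Content-Type: application/json" -X POST --data "$(build_loki_payload test 'hello from curl')" http://localhost:3100/loki/api/v1/push`. An empty response with HTTP 204 typically indicates the line was accepted.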

Prometheus setup

cd /tmp
tar zxf prometheus-2.53.3.linux-amd64.tar.gz
mv prometheus-2.53.3.linux-amd64 /opt/prometheus

cat > /opt/prometheus/config.yml << EOF
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: "EKS"
    static_configs:
      - targets: ["localhost:9090"]
        labels:
          instance: EKS

  - job_name: "JW"
    static_configs:
      - targets: ['192.168.10.2:9104']
        labels:
          instance: JW
EOF
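Each monitored host gets its own job whose `instance` label matches the job name, as with "JW" above. When more clients are added, a small generator keeps the entries consistent. The helper `scrape_job`, the host name `DB` and the address `192.168.10.3:9104` in the usage line are hypothetical.

```shell
# scrape_job NAME TARGET
# Hypothetical helper that prints a scrape_configs entry in the same
# shape as the "JW" job, for adding another Node Exporter host.
scrape_job() {
  printf '  - job_name: "%s"\n' "$1"
  printf '    static_configs:\n'
  printf "      - targets: ['%s']\n" "$2"
  printf '        labels:\n'
  printf '          instance: %s\n' "$1"
}
```

Because `scrape_configs` is the last section of the file, `scrape_job DB 192.168.10.3:9104 >> /opt/prometheus/config.yml` appends a valid job; the result can then be checked with `/opt/prometheus/promtool check config /opt/prometheus/config.yml` before restarting Prometheus.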

cat > /opt/prometheus/alert_rules.yml << 'EOF'
groups:
  - name: alert_rules
    rules:
      - alert: CPU使用告警
        expr: sum(avg(irate(node_cpu_seconds_total{mode!='idle'}[5m])) without (cpu)) by (instance) > 0.60
        for: 2m
        labels:
          level: warning
        annotations:
          summary: "主机 {{ $labels.instance }} CPU 使用过高"
          description: "{{ $labels.instance }} CPU 使用率超过 60% (current value: {{ $value }})"
      - alert: CPU使用严重告警
        expr: (100 - (avg by (instance) (irate(node_cpu_seconds_total{job=~".*",mode="idle"}[5m])) * 100)) > 85
        for: 3m
        labels:
          level: serious
        annotations:
          summary: "主机 {{ $labels.instance }} CPU 使用高"
          description: "{{ $labels.instance }} CPU 使用率超过 85% (current value: {{ $value }})"
      - alert: 内存使用告警
        expr: avg by(instance) ((1 - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes) / node_memory_MemTotal_bytes) * 100) > 70
        for: 2m
        labels:
          level: warning
        annotations:
          summary: "主机 {{ $labels.instance }} 内存使用过高"
          description: "{{$labels.instance}}: 内存使用率超过 70% (current value is: {{ $value }})"
      - alert: 内存使用严重告警
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)/node_memory_MemTotal_bytes > 0.90
        for: 3m
        labels:
          level: serious
        annotations:
          summary: "主机 {{ $labels.instance }} 内存使用高"
          description: "{{ $labels.instance }} 内存使用率超过 90% (current value: {{ $value }})"
      - alert: 硬盘使用告警
        expr: (1 - node_filesystem_free_bytes{fstype!="rootfs",mountpoint!="",mountpoint!~"/(run|var|sys|dev).*"} / node_filesystem_size_bytes) * 100 > 80
        for: 2m
        labels:
          level: warning
        annotations:
          summary: "主机 {{ $labels.instance }} 硬盘使用过高"
          description: "{{$labels.instance}}: 硬盘使用率超过 80% (current value is: {{ $value }})"
      - alert: 硬盘使用严重告警
        expr: (1 - node_filesystem_free_bytes{fstype!="rootfs",mountpoint!="",mountpoint!~"/(run|var|sys|dev).*"} / node_filesystem_size_bytes) * 100 > 90
        for: 3m
        labels:
          level: serious
        annotations:
          summary: "主机 {{ $labels.instance }} 硬盘使用高"
          description: "{{$labels.instance}}: 硬盘使用率超过 90% (current value is: {{ $value }})"
      - alert: 节点文件描述符告警
        expr: avg by (instance) (node_filefd_allocated{} / node_filefd_maximum{}) * 100 > 60
        for: 2m
        labels:
          level: warning
        annotations:
          summary: "主机 {{ $labels.instance }} 文件描述符使用高"
          description: "{{$labels.instance}}: 文件描述符使用超过 60% (current value is: {{ $value }})"
      - alert: 节点平均负载告警
        expr: avg by (instance) (node_load15{}) > 80
        for: 2m
        labels:
          level: warning
        annotations:
          summary: "主机 {{ $labels.instance }} 15 分钟平均负载高"
          description: "{{$labels.instance}}: 15 分钟平均负载超过 80 (current value is: {{ $value }})"
      - alert: 节点离线告警
        expr: avg by (instance) (up{}) == 0
        for: 2m
        labels:
          level: warning
        annotations:
          summary: "主机 {{$labels.instance}} 当前离线"
          description: "{{$labels.instance}}: Node_Exporter 代理已断开 (current value is: {{ $value }})"
      - alert: 节点进程阻塞告警
        expr: avg by (instance) (node_procs_blocked{}) > 10
        for: 2m
        labels:
          level: warning
        annotations:
          summary: "主机 {{ $labels.instance }}  进程阻塞过高"
          description: "{{$labels.instance}}: 检测到进程阻塞超过 10 (current value is: {{ $value }})"
      - alert: 网络上传告警
        expr:  avg by (instance) (floor(irate(node_network_transmit_bytes_total{}[2m]) / 1024 / 1024 * 8 )) > 40
        for: 1m
        labels:
          level: warning
        annotations:
          summary: "主机 {{ $labels.instance }} 网络上传过高"
          description: "{{$labels.instance}}: 节点上传速率超过 40Mbps (current value is: {{ $value }}Mbps)"
      - alert: 网络下载告警
        expr:  avg by (instance) (floor(irate(node_network_receive_bytes_total{}[2m]) / 1024 / 1024 * 8 )) > 40
        for: 1m
        labels:
          level: warning
        annotations:
          summary: "主机 {{ $labels.instance }} 网络下载过高"
          description: "{{$labels.instance}}: 节点下载速率超过 40Mbps (current value is: {{ $value }}Mbps)"
      - alert: 磁盘读取告警
        expr: avg by (instance) (floor(irate(node_disk_read_bytes_total{}[2m]) / 1024 )) > 5000
        for: 10m
        labels:
          level: warning
        annotations:
          summary: "主机 {{ $labels.instance }} 磁盘读取速率过高"
          description: "{{$labels.instance}}: 节点磁盘读取速率超过 5000KB/s (current value is: {{ $value }}KB/s)"
      - alert: 磁盘写入告警
        expr: avg by (instance) (floor(irate(node_disk_written_bytes_total{}[2m]) / 1024 / 1024 )) > 200
        for: 2m
        labels:
          level: warning
        annotations:
          summary: "主机 {{ $labels.instance }} 磁盘写入速率过高"
          description: "{{$labels.instance}}: 节点磁盘写入速率超过 200MB/s (current value is: {{ $value }}MB/s)"
EOF

cat > /etc/init.d/prometheus << 'EOF'
#!/sbin/openrc-run

name="prometheus"
description="Prometheus Server"

command="/opt/prometheus/prometheus -- --config.file=/opt/prometheus/config.yml --web.enable-remote-write-receiver --storage.tsdb.path=/opt/prometheus/data"
pidfile=/data/prometheus/run.pid
logfile=/data/prometheus/$name.log
user=prometheus

depend() {
  need net localmount
  after firewall
}

start_pre() {
  ebegin "Preparing to start $name"
  eend $?
}

start() {
  ebegin "Starting $name"
  start-stop-daemon --start --quiet --background --name prometheus --make-pidfile --pidfile $pidfile --stderr $logfile --user $user --exec $command
  eend $?
}

stop() {
  ebegin "Stopping $name"
  start-stop-daemon --stop --quiet --pidfile $pidfile
  if [ -e $pidfile ]
    then rm $pidfile
  fi
  eend $?
}

restart() {
  ebegin "Restarting $name"
  start-stop-daemon --stop --quiet --pidfile $pidfile
  sleep 1
  start-stop-daemon --start --quiet --background --name prometheus --make-pidfile --pidfile $pidfile --stderr $logfile --user $user --exec $command
  eend $?
}
EOF

chown -R prometheus:prometheus /opt/prometheus
chown -R prometheus:prometheus /data/prometheus
chmod +x /etc/init.d/prometheus
rc-update add prometheus default
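The network and disk alerts in alert_rules.yml convert raw byte counters before comparing them with a threshold: `irate(...) / 1024 / 1024 * 8` turns bytes per second into mebibits per second, so "Mbps" here is 1024-based. A sustained 5 MiB/s (5242880 bytes/s) therefore evaluates to exactly 40 and does not yet satisfy `> 40`. The hypothetical helpers below mirror that arithmetic so thresholds can be sanity-checked without a running Prometheus.

```shell
# Hypothetical helpers mirroring the arithmetic in alert_rules.yml.
# POSIX shell arithmetic is integer-only and division truncates, which
# matches floor() for the non-negative rates involved here.

# bytes/s -> Mbit/s (1024*1024 bits), as in the network alerts
bytes_to_mbit() { echo $(( $1 * 8 / 1024 / 1024 )); }

# bytes/s -> MiB/s, as in the disk-write alert
bytes_to_mib() { echo $(( $1 / 1024 / 1024 )); }

# bytes/s -> KiB/s, as in the disk-read alert
bytes_to_kib() { echo $(( $1 / 1024 )); }

# byte rate that corresponds to a given Mbit/s value
mbit_threshold_bytes() { echo $(( $1 * 1024 * 1024 / 8 )); }
```

For instance, `bytes_to_mbit 5373952` gives 41, the first whole value that fires the network alert, while `bytes_to_mib 209715200` gives 200, the disk-write threshold.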

AlertManager setup

tar zxf alertmanager-0.27.0.linux-amd64.tar.gz
mv alertmanager-0.27.0.linux-amd64 /opt/alertmanager

cat > /opt/alertmanager/config.yml << EOF
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
  receiver: 'dingtalk'

receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'
  - name: 'dingtalk'
    webhook_configs:
      - url: http://localhost:8060/dingtalk/webhook_robot/send
        send_resolved: true

inhibit_rules:
  - source_match:
      level: 'serious'
    target_match:
      level: 'warning'
    equal: ['instance']
EOF

cat > /etc/init.d/alertmanager << 'EOF'
#!/sbin/openrc-run

name="alertmanager"
description="Prometheus AlertManager"

command="/opt/alertmanager/alertmanager -- --config.file=/opt/alertmanager/config.yml --storage.path=/opt/alertmanager/data"
pidfile=/data/alertmanager/run.pid
logfile=/data/alertmanager/$name.log
user=alertmanager

depend() {
  need net localmount
  after firewall
}

start_pre() {
  ebegin "Preparing to start $name"
  eend $?
}

start() {
  ebegin "Starting $name"
  start-stop-daemon --start --quiet --background --name alertmanager --make-pidfile --pidfile $pidfile --stderr $logfile --user $user --exec $command
  eend $?
}

stop() {
  ebegin "Stopping $name"
  start-stop-daemon --stop --quiet --pidfile $pidfile
  if [ -e $pidfile ]
    then rm $pidfile
  fi
  eend $?
}

restart() {
  ebegin "Restarting $name"
  start-stop-daemon --stop --quiet --pidfile $pidfile
  sleep 1
  start-stop-daemon --start --quiet --background --name alertmanager --make-pidfile --pidfile $pidfile --stderr $logfile --user $user --exec $command
  eend $?
}
EOF

chown -R alertmanager:alertmanager /opt/alertmanager
chown -R alertmanager:alertmanager /data/alertmanager
chmod +x /etc/init.d/alertmanager
rc-update add alertmanager default
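To exercise the alert path without waiting for a real fault, a synthetic alert can be posted to AlertManager's v2 API (`POST /api/v2/alerts`), which accepts a JSON array of alerts with `labels` and `annotations`. The helper `build_test_alert` is hypothetical; it uses the same label names as the rules above (`instance`, `level`) and assumes the values contain no double quotes.

```shell
# build_test_alert ALERTNAME INSTANCE LEVEL SUMMARY
# Hypothetical helper printing a JSON array for AlertManager's
# POST /api/v2/alerts endpoint. SUMMARY is reused as the description.
build_test_alert() {
  printf '[{"labels":{"alertname":"%s","instance":"%s","level":"%s"},' "$1" "$2" "$3"
  printf '"annotations":{"summary":"%s","description":"%s"}}]\n' "$4" "$4"
}
```

Usage: `curl -s -H "Content-Type: application/json" -X POST --data "$(build_test_alert TestAlert EKS warning 'manual test')" http://localhost:9093/api/v2/alerts`. Once the DingTalk bridge below is running, the alert should arrive after `group_wait`; without an `endsAt`, AlertManager resolves it on its own after its resolve timeout (5 minutes by default).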

DingTalk alerting setup

tar zxf prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz
mv prometheus-webhook-dingtalk-2.1.0.linux-amd64 /opt/dingtalk
mv /opt/dingtalk/prometheus-webhook-dingtalk /opt/dingtalk/dingtalk

cat > /opt/dingtalk/config.yml << EOF
timeout: 5s

templates:
  - /opt/dingtalk/template.tmpl

targets:
  webhook_robot:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxx # replace with your own DingTalk robot webhook URL
EOF

cat > /opt/dingtalk/template.tmpl << EOF
{{ define "__subject" }}
[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}]
{{ end }}

{{ define "__alert_list" }}{{ range . }}
---
{{ if .Labels.owner }}@{{ .Labels.owner }}{{ end }}

**告警主题**: {{ .Annotations.summary }}

**告警类型**: {{ .Labels.alertname }}

**告警级别**: {{ .Labels.level }}

**告警主机**: {{ .Labels.instance }} 

**告警信息**: {{ index .Annotations "description" }}

**告警时间**: {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}

{{ .Labels.Users }}
{{ end }}{{ end }}

{{ define "__resolved_list" }}{{ range . }}
---
{{ if .Labels.owner }}@{{ .Labels.owner }}{{ end }}

**告警主题**: {{ .Annotations.summary }}

**告警类型**: {{ .Labels.alertname }} 

**告警级别**: {{ .Labels.level }}

**告警主机**: {{ .Labels.instance }}

**告警信息**: {{ index .Annotations "description" }}

**告警时间**: {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}

**恢复时间**: {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}

{{ .Labels.Users }}
{{ end }}{{ end }}


{{ define "default.title" }}
{{ template "__subject" . }}
{{ end }}

{{ define "default.content" }}
{{ if gt (len .Alerts.Firing) 0 }}
<font color="#FF0000" size="8" face="黑体">**====侦测到{{ .Alerts.Firing | len  }}个故障====**</font>
{{ template "__alert_list" .Alerts.Firing }}
---
{{ end }}

{{ if gt (len .Alerts.Resolved) 0 }}
<font color="#00FF00" size="8" face="黑体">**====恢复{{ .Alerts.Resolved | len  }}个故障====**</font>
{{ template "__resolved_list" .Alerts.Resolved }}
{{ end }}
{{ end }}


{{ define "ding.link.title" }}{{ template "default.title" . }}{{ end }}
{{ define "ding.link.content" }}{{ template "default.content" . }}{{ end }}
{{ template "default.title" . }}
{{ template "default.content" . }}
EOF

cat > /etc/init.d/dingtalk << 'EOF'
#!/sbin/openrc-run

name="dingtalk"
description="Prometheus Webhook DingTalk"

command="/opt/dingtalk/dingtalk -- --config.file=/opt/dingtalk/config.yml --web.listen-address=:8060 --web.enable-ui --web.enable-lifecycle --log.format=json"
pidfile=/data/dingtalk/run.pid
logfile=/data/dingtalk/$name.log
user=dingtalk

depend() {
  need net localmount
  after firewall
}

start_pre() {
  ebegin "Preparing to start $name"
  eend $?
}

start() {
  ebegin "Starting $name"
  start-stop-daemon --start --quiet --background --name dingtalk --make-pidfile --pidfile $pidfile --stderr $logfile --user $user --exec $command
  eend $?
}

stop() {
  ebegin "Stopping $name"
  start-stop-daemon --stop --quiet --pidfile $pidfile
  if [ -e $pidfile ]
    then rm $pidfile
  fi
  eend $?
}

restart() {
  ebegin "Restarting $name"
  start-stop-daemon --stop --quiet --pidfile $pidfile
  sleep 1
  start-stop-daemon --start --quiet --background --name dingtalk --make-pidfile --pidfile $pidfile --stderr $logfile --user $user --exec $command
  eend $?
}
EOF

chown -R dingtalk:dingtalk /opt/dingtalk
chown -R dingtalk:dingtalk /data/dingtalk
chmod +x /etc/init.d/dingtalk
rc-update add dingtalk default

Promtail setup

unzip promtail-linux-amd64.zip
mv promtail-linux-amd64 /opt/promtail/promtail

cat > /opt/promtail/config.yaml << EOF
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /data/promtail/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
- job_name: EKS
  static_configs:
  - targets:
      - localhost
    labels:
      job: "System Logs|EKS"
      __path__: /var/log/messages
  - targets:
      - localhost
    labels:
      job: "Soft Logs|EKS"
      __path__: /data/*/*log
EOF

cat > /etc/init.d/promtail << 'EOF'
#!/sbin/openrc-run

name="promtail"
description="Promtail"

command="/opt/promtail/promtail -- --config.file=/opt/promtail/config.yaml"
pidfile=/data/promtail/run.pid
logfile=/data/promtail/$name.log

depend() {
  need net localmount
  after firewall
}

start_pre() {
  ebegin "Preparing to start $name"
  eend $?
}

start() {
  ebegin "Starting $name"
  start-stop-daemon --start --quiet --background --name promtail --make-pidfile --pidfile $pidfile --stderr $logfile --exec $command
  eend $?
}

stop() {
  ebegin "Stopping $name"
  start-stop-daemon --stop --quiet --pidfile $pidfile
  if [ -e $pidfile ]
    then rm $pidfile
  fi
  eend $?
}

restart() {
  ebegin "Restarting $name"
  start-stop-daemon --stop --quiet --pidfile $pidfile
  sleep 1
  start-stop-daemon --start --quiet --background --name promtail --make-pidfile --pidfile $pidfile --stderr $logfile --exec $command
  eend $?
}
EOF

chmod +x /etc/init.d/promtail
rc-update add promtail default
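Both `__path__` entries in the Promtail configuration are globs, so before restarting Promtail it can help to see which files a pattern actually picks up. The helper `preview_glob` below is hypothetical, and shell globbing only approximates Promtail's own path matching, so treat it as a quick sanity check rather than an exact reproduction.

```shell
# preview_glob PATTERN
# Hypothetical helper listing the files a simple glob such as
# /data/*/*log currently matches. Returns non-zero if nothing matches.
preview_glob() {
  found=0
  # $1 is left unquoted on purpose so the shell expands the glob;
  # this assumes the pattern contains no spaces.
  for f in $1; do
    # An unmatched glob stays as the literal pattern; skip it
    [ -e "$f" ] || continue
    printf '%s\n' "$f"
    found=$((found + 1))
  done
  [ "$found" -gt 0 ]
}
```

For example, `preview_glob '/data/*/*log'` lists the service logs the "Soft Logs|EKS" job will ship.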

Start all services

service grafana start
service loki start
service prometheus start
service alertmanager start
service dingtalk start
service promtail start
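After starting the services, each one can be checked through its health or readiness endpoint. The sketch below assumes `curl` (installed at the beginning) and the endpoints `/api/health` (Grafana), `/ready` (Loki and Promtail) and `/-/ready` (Prometheus and AlertManager) on the ports configured above; the function names `check_ready` and `check_all` are hypothetical. Loki may report "not ready" for a short while right after it starts.

```shell
# check_ready NAME URL
# Hypothetical helper: succeeds if URL answers with an HTTP 2xx/3xx
# status within 5 seconds (curl -f fails on 4xx/5xx).
check_ready() {
  if curl -fsS -o /dev/null --max-time 5 "$2" 2>/dev/null; then
    echo "$1: ready"
  else
    echo "$1: NOT ready ($2)" >&2
    return 1
  fi
}

# check_all: runs check_ready for every local service; returns 1 if any fails
check_all() {
  rc=0
  check_ready grafana      http://localhost:3000/api/health || rc=1
  check_ready loki         http://localhost:3100/ready      || rc=1
  check_ready prometheus   http://localhost:9090/-/ready    || rc=1
  check_ready alertmanager http://localhost:9093/-/ready    || rc=1
  check_ready promtail     http://localhost:9080/ready      || rc=1
  return $rc
}
```

Running `check_all` on the main server prints one line per service and exits non-zero if any of them is not answering yet.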

Client setup

Promtail setup

cd /tmp
wget -c https://github.com/grafana/loki/releases/download/v3.3.1/promtail-linux-amd64.zip
apt-get install -y unzip
unzip promtail-linux-amd64.zip
mkdir -p /opt/promtail
mkdir -p /data/promtail
mv promtail-linux-amd64 /opt/promtail/promtail

cat > /opt/promtail/config.yaml << 'EOF'
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /data/promtail/positions.yaml

clients:
  - url: http://192.168.10.1:3100/loki/api/v1/push

scrape_configs:
- job_name: MongoDB
  static_configs:
  - targets:
      - localhost
    labels:
      job: "System Logs|MongoDB"
      __path__: /var/log/*log
  - targets:
      - localhost
    labels:
      job: "MongoDB Logs|MongoDB"
      __path__: /var/log/mongodb/*log
EOF

cat > /etc/systemd/system/promtail.service << EOF
[Unit]
Description=Promtail
Documentation=https://github.com/grafana/loki
After=network.target

[Service]
Type=simple
ExecStart=/opt/promtail/promtail --config.file=/opt/promtail/config.yaml
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable promtail.service
systemctl start promtail.service

Node Exporter setup

cd /tmp
wget -c https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
tar zxf node_exporter-1.8.2.linux-amd64.tar.gz

mkdir -p /data/exporter
mv node_exporter-1.8.2.linux-amd64 /opt/exporter
mv /opt/exporter/node_exporter /opt/exporter/exporter

useradd -r -M exporter
chown -R exporter:exporter /data/exporter
chown -R exporter:exporter /opt/exporter

cat > /etc/systemd/system/exporter.service << EOF
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
User=exporter
Restart=on-failure
ExecStart=/opt/exporter/exporter --web.listen-address=:9104

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable exporter.service
systemctl start exporter.service
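To confirm the exporter is serving data, including metrics the alert rules rely on such as `node_load15`, its `/metrics` page can be fetched and a single value extracted. The helper `metric_value` is hypothetical and only handles samples without labels.

```shell
# metric_value NAME
# Hypothetical helper: reads Prometheus text exposition format on stdin
# and prints the value of the first unlabelled sample called NAME.
# Exits non-zero if no such sample is found.
metric_value() {
  awk -v m="$1" '$1 == m { print $2; found = 1; exit } END { exit !found }'
}
```

Usage from the main server: `curl -s http://192.168.10.2:9104/metrics | metric_value node_load15`.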

Using the dashboard

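Open Grafana at http://192.168.10.1:3000 and log in. The default username and password are admin/admin, and Grafana asks you to change the password on first login.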

In your profile, change the interface language to Chinese.

Open Data sources in the left-hand menu and add two data sources, Loki and Prometheus.

The Loki connection URL is http://localhost:3100

The Prometheus connection URL is http://localhost:9090
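As an alternative to clicking through the UI, Grafana can load data sources from provisioning files in the `provisioning` directory set under `[paths]` in config.ini. The sketch below writes such a file declaring the same two data sources; the function name `write_datasources`, the file name `datasources.yaml` and the `isDefault` choice are assumptions for illustration.

```shell
# write_datasources DIR
# Hypothetical helper: writes a Grafana provisioning file declaring the
# Loki and Prometheus data sources used in this article.
write_datasources() {
  mkdir -p "$1"
  cat > "$1/datasources.yaml" << 'EOF'
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://localhost:3100
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF
}
```

Usage: `write_datasources /opt/grafana/conf/provisioning/datasources`, then `chown -R grafana:grafana /opt/grafana/conf/provisioning` and `service grafana restart`.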

Open Dashboards and add a new dashboard.

Click "Import dashboard", then "Discard".

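Enter the dashboard ID in the input box and click Load; once it has loaded, select your Prometheus data source. I used dashboard 8919 here; more dashboards are available at https://grafana.com/grafana/dashboards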

The dashboard then shows detailed metrics for each monitored host.

Log data can be viewed under Logs in the left-hand menu, which also offers log querying, searching and other features worth exploring on your own.
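The same logs can also be queried outside Grafana with LogQL through Loki's HTTP API. The helper `logql_query` is hypothetical; it builds a stream selector for one of the `job` labels set in the Promtail configs, optionally followed by a `|=` line filter, and assumes the arguments contain no double quotes.

```shell
# logql_query JOB [TEXT]
# Hypothetical helper building a LogQL query: a {job="..."} stream
# selector plus an optional |= "TEXT" line filter.
logql_query() {
  q="{job=\"$1\"}"
  if [ -n "$2" ]; then
    q="$q |= \"$2\""
  fi
  printf '%s\n' "$q"
}
```

Usage: `curl -G -s http://192.168.10.1:3100/loki/api/v1/query_range --data-urlencode "query=$(logql_query 'MongoDB Logs|MongoDB' error)" --data-urlencode limit=20` returns the most recent matching lines from the client's MongoDB logs.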
