K8S error: no metrics known for node


After deploying metrics-server today, I checked the pod logs and found a flood of errors:

# kubectl logs -f -n kube-system metrics-server-d8669575f-xl6mw
I1202 09:09:31.217954       1 serving.go:312] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
I1202 09:09:37.725863       1 secure_serving.go:116] Serving securely on [::]:443
E1202 09:09:49.807117       1 reststorage.go:135] unable to fetch node metrics for node "master": no metrics known for node
E1202 09:09:49.807185       1 reststorage.go:135] unable to fetch node metrics for node "node1": no metrics known for node
E1202 09:09:49.807202       1 reststorage.go:135] unable to fetch node metrics for node "node2": no metrics known for node
E1202 09:09:50.940606       1 reststorage.go:160] unable to fetch pod metrics for pod linux40/nginx-deployment-7d8599fbc9-68pf8: no metrics known for pod
E1202 09:09:53.825493       1 reststorage.go:135] unable to fetch node metrics for node "node1": no metrics known for node
E1202 09:09:53.825540       1 reststorage.go:135] unable to fetch node metrics for node "node2": no metrics known for node
E1202 09:09:53.825551       1 reststorage.go:135] unable to fetch node metrics for node "master": no metrics known for node
E1202 09:10:05.976306       1 reststorage.go:160] unable to fetch pod metrics for pod linux40/nginx-deployment-7d8599fbc9-68pf8: no metrics known for pod
E1202 09:10:21.291923       1 reststorage.go:160] unable to fetch pod metrics for pod linux40/nginx-deployment-7d8599fbc9-68pf8: no metrics known for pod
E1202 09:10:31.601208       1 reststorage.go:135] unable to fetch node metrics for node "master": no metrics known for node
E1202 09:10:31.601330       1 reststorage.go:135] unable to fetch node metrics for node "node1": no metrics known for node
E1202 09:10:31.601353       1 reststorage.go:135] unable to fetch node metrics for node "node2": no metrics known for node
E1202 09:10:31.610963       1 reststorage.go:160] unable to fetch pod metrics for pod kube-system/kube-flannel-ds-64qdh: no metrics known for pod
E1202 09:10:31.611032       1 reststorage.go:160] unable to fetch pod metrics for pod linux40/magedu-tomcat-app1-deployment-6cd664c5bd-wprjb: no metrics known for pod
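
Because the same "no metrics known" line repeats for every node and pod, any other error in the log is easy to miss. Below is a small sketch of a filter that keeps only the remaining error lines. The function name `other_errors` is made up for illustration, and it assumes the klog format shown above, where error lines start with `E`:

```shell
# Hypothetical helper: reads metrics-server log lines on stdin and prints
# only error lines (klog prefix "E") other than the repeated
# "no metrics known for ..." message.
other_errors() {
  grep '^E' | grep -v 'no metrics known for' || true
}
```

It could be used as `kubectl logs -n kube-system metrics-server-d8669575f-xl6mw | other_errors`. An empty result means the log holds nothing but the repeated message.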

Describing the pod did not reveal a useful error message either; it only shows the container in CrashLoopBackOff:

# kubectl describe pod metrics-server-6c97c89fd5-j2rql -n kube-system
Name:                 metrics-server-6c97c89fd5-j2rql
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 node2/192.168.64.112
Start Time:           Thu, 02 Dec 2021 16:50:50 +0800
Labels:               k8s-app=metrics-server
                      pod-template-hash=6c97c89fd5
Annotations:          <none>
Status:               Running
IP:                   10.244.2.61
IPs:
  IP:           10.244.2.61
Controlled By:  ReplicaSet/metrics-server-6c97c89fd5
Containers:
  metrics-server:
    Container ID:  docker://eac4a2db02ca75315047eb778b7d3e1d7543d10ed6d33b4b1eddb006f824e34e
    Image:         mirrorgooglecontainers/metrics-server-amd64:v0.3.6
    Image ID:      docker://sha256:9dd718864ce61b4c0805eaf75f87b95302960e65d4857cb8b6591864394be55b
    Port:          4443/TCP
    Host Port:     0/TCP
    Args:
      --cert-dir=/tmp
      --secure-port=4443
      --kubelet-preferred-address-types=InternalIP
      --kubelet-use-node-status-port
      --kubelet-insecure-tls
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Thu, 02 Dec 2021 16:51:10 +0800
      Finished:     Thu, 02 Dec 2021 16:51:11 +0800
    Ready:          False
    Restart Count:  2
    Liveness:       http-get https://:https/livez delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get https://:https/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /tmp from tmp-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from metrics-server-token-4xrbc (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  tmp-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  metrics-server-token-4xrbc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  metrics-server-token-4xrbc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  <unknown>          default-scheduler  Successfully assigned kube-system/metrics-server-6c97c89fd5-j2rql to node2
  Normal   Pulled     16s (x3 over 34s)  kubelet, node2     Container image "mirrorgooglecontainers/metrics-server-amd64:v0.3.6" already present on machine
  Normal   Created    16s (x3 over 34s)  kubelet, node2     Created container metrics-server
  Normal   Started    16s (x3 over 34s)  kubelet, node2     Started container metrics-server
  Warning  BackOff    8s (x5 over 32s)   kubelet, node2     Back-off restarting failed container
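
With the container in CrashLoopBackOff (exit code 2, terminating about a second after start), the log of the previous container instance is usually where the actual failure reason appears. A minimal sketch, wrapped in a function whose name `crash_logs` is only illustrative; `--previous` and `--tail` are standard `kubectl logs` options:

```shell
# Hypothetical helper: print the last lines logged by the previous
# (crashed) instance of a pod's container.
#   $1 = namespace, $2 = pod name, $3 = number of lines (default 50)
crash_logs() {
  kubectl -n "$1" logs "$2" --previous --tail="${3:-50}"
}
```

For the pod above this would be `crash_logs kube-system metrics-server-6c97c89fd5-j2rql`.
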
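
The cause is that when the cluster was deployed, the kubelet certificates were signed without the node IPs in their SANs. When metrics-server contacts the kubelets by IP, certificate verification therefore fails (error: x509: cannot validate certificate for 192.168.33.11 because it doesn't contain any IP SANs). Adding the --kubelet-insecure-tls flag makes metrics-server skip that verification:
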
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: mirrorgooglecontainers/metrics-server-amd64:v0.3.6
        imagePullPolicy: IfNotPresent
        command:
        - /metrics-server
        - --kubelet-insecure-tls  # skip TLS verification of the kubelet certificates
        - --kubelet-preferred-address-types=InternalIP  # reach the kubelets via their InternalIP
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
        resources:
          limits:
            cpu: 300m
            memory: 200Mi
          requests:
            cpu: 200m
            memory: 100Mi
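
To roll out the changed Deployment and confirm that metrics arrive, something along these lines could be used. It is a sketch, not part of the original steps: the function name `apply_and_verify` and the 120s timeout are arbitrary choices, and the manifest path is whatever file holds the YAML above:

```shell
# Hypothetical helper: apply the manifest, wait for the rollout, then query
# the metrics API. Stops at the first step that fails.
#   $1 = path to the metrics-server Deployment manifest
apply_and_verify() {
  kubectl apply -f "$1" &&
    kubectl -n kube-system rollout status deployment/metrics-server --timeout=120s &&
    kubectl top nodes
}
```

`kubectl top nodes` may still fail for a short while after the rollout, until metrics-server has completed its first scrape. Since --kubelet-insecure-tls disables verification of the kubelet certificates, it suits test clusters; the stricter alternative is to issue kubelet serving certificates that include the node IPs.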