2024-02-09

Fixing a K8s Cluster Failure (The connection to the server <host>:<port> was refused - did you specify the right host or port)

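Don't dwell too much on the present, and don't worry too much about the future. Once you have been through certain things, the scenery before you is no longer what it used to be. (Haruki Murakami)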

Preface

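While going through my cluster notes at home over the Chinese New Year holiday, I found the cluster no longer worked.
This is a short record of the fix. In the end the certificates had simply expired, but the error message was different from the one I had seen before.
If anything here reflects a gap in my understanding, corrections are welcome.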


What was the problem?

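I run a highly available k8s cluster on local VMs and had not used it for a long time. After booting the VMs, kubectl commands failed: the API server port behind the VIP (virtual IP) could not be reached.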


┌──[root@vms100.liruilongs.github.io]-[~]
└─$kubectl  get nodes
The connection to the server 192.168.26.99:30033 was refused - did you specify the right host or port?

How did I troubleshoot it?

A port test confirms the port is unreachable, so the failure already shows up at the transport layer (TCP):

┌──[root@vms100.liruilongs.github.io]-[~]
└─$</dev/tcp/192.168.26.99/30033
-bash: connect: Connection refused
-bash: /dev/tcp/192.168.26.99/30033: Connection refused
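The /dev/tcp redirection above can hang for a long time when packets are silently dropped rather than rejected. As a minimal sketch, assuming bash (for /dev/tcp) and GNU coreutils timeout, the check can be wrapped in a reusable function. The name port_open and the 3-second default are my own illustrative choices, not part of any tool.

```shell
# port_open HOST PORT [SECONDS]
# Succeeds (status 0) if a TCP connection to HOST:PORT is established within
# SECONDS (default 3); fails if it is refused, unreachable, or times out.
# Example: port_open 192.168.26.99 30033 && echo open || echo "refused/unreachable"
port_open() {
  local host=$1 port=$2 secs=${3:-3}
  # The /dev/tcp redirection runs in a child bash so that timeout can kill it.
  # Host and port are passed as $0/$1 instead of being pasted into the script text.
  timeout "$secs" bash -c ': < "/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

Unlike ping, this exercises the same TCP path kubectl uses, which is why it failed here even though the VIP answered ICMP.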

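Checking with ip a whether the configured VIP is present shows that it is not bound on this node (ens32 only carries 192.168.26.100). At first this suggests that keepalived, which manages the VIP, has a problem on this node, although with keepalived only the current MASTER holds the VIP, so its absence on one node is not an error by itself: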

┌──[root@vms100.liruilongs.github.io]-[~]
└─$ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:0e:5d:5f brd ff:ff:ff:ff:ff:ff
    inet 192.168.26.100/24 brd 192.168.26.255 scope global ens32
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe0e:5d5f/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:68:f8:90:26 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
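Reading ip a output by eye works on one node; across several masters a small filter is handier. This is a sketch, assuming the usual ip a / ip -o addr layout in which an address appears as "inet ADDRESS/PREFIX"; has_vip is a hypothetical helper name.

```shell
# has_vip VIP
# Reads `ip a` (or `ip -o addr`) output on stdin; succeeds only if VIP is bound
# as an IPv4 address on some interface, whatever its prefix length (e.g. /32).
# Example: ip a | has_vip 192.168.26.99 && echo "VIP is on this node"
has_vip() {
  awk -v vip="$1" '
    {
      for (i = 1; i < NF; i++)
        if ($i == "inet") {
          split($(i + 1), addr, "/")      # "192.168.26.99/32" -> "192.168.26.99"
          if (addr[1] == vip) found = 1   # exact match, so .199 never counts as .99
        }
    }
    END { exit !found }'
}
```

Comparing the address field exactly, rather than grepping the raw text, avoids false positives on addresses that merely contain the VIP as a substring.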

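Testing the network layer, ping to the VIP succeeds, which looks odd at first. It means another keepalived node is currently holding the VIP: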

┌──[root@vms100.liruilongs.github.io]-[~]
└─$ping 192.168.26.99
PING 192.168.26.99 (192.168.26.99) 56(84) bytes of data.
64 bytes from 192.168.26.99: icmp_seq=1 ttl=64 time=0.784 ms
64 bytes from 192.168.26.99: icmp_seq=2 ttl=64 time=0.411 ms
^C
--- 192.168.26.99 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1012ms
rtt min/avg/max/mdev = 0.411/0.597/0.784/0.188 ms

SSH into the VIP to take a look:

┌──[root@vms100.liruilongs.github.io]-[~]
└─$ssh root@192.168.26.99
The authenticity of host '192.168.26.99 (192.168.26.99)' can't be established.
ECDSA key fingerprint is SHA256:BmaDR4pX6G1WgStkR7Lcl7Yg4fhP2d8idUBxW3HEzsA.
ECDSA key fingerprint is MD5:2e:49:16:97:30:90:e3:28:b2:43:2d:64:9d:f2:d4:6d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.26.99' (ECDSA) to the list of known hosts.
Last login: Wed Nov 15 11:12:11 2023 from 192.168.26.100

This lands on another k8s master node, vms102. Checking its addresses, the VIP 192.168.26.99/32 is bound here as expected:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:eb:fa:00 brd ff:ff:ff:ff:ff:ff
    inet 192.168.26.102/24 brd 192.168.26.255 scope global ens32
       valid_lft forever preferred_lft forever
    inet 192.168.26.99/32 scope global ens32
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:feeb:fa00/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:ed:cf:c0:d1 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever

A TCP port test from this node succeeds:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$</dev/tcp/192.168.26.99/30033

Run a kubectl client command to confirm:


┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubectl  get nodes
Unable to connect to the server: EOF

The connection fails. Let's print the details of the API calls:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubectl  get nodes -vv
error: invalid argument "v" for "-v, --v" flag: strconv.ParseInt: parsing "v": invalid syntax
See 'kubectl get --help' for usage.

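Note that kubectl's verbosity flag takes a numeric log level such as -v=1 or -v=6; stacked flags like -vv are rejected.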

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubectl  get nodes -v=1
I0209 13:52:56.780335   72398 shortcut.go:100] Error loading discovery information: Get "https://192.168.26.99:30033/api?timeout=32s": dial tcp 192.168.26.99:30033: connect: connection refused
The connection to the server 192.168.26.99:30033 was refused - did you specify the right host or port?
┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubectl  get nodes -v=2
I0209 13:53:16.963102   72533 shortcut.go:100] Error loading discovery information: Get "https://192.168.26.99:30033/api?timeout=32s": dial tcp 192.168.26.99:30033: connect: connection refused
The connection to the server 192.168.26.99:30033 was refused - did you specify the right host or port?

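It is the same error as before, so the problem is not specific to one node but affects all of them. (The earlier successful port test was most likely caught while kube-apiserver was briefly up between restarts; the container list below shows it exiting repeatedly.) Use the container tool to check whether the HA component keepalived is running: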

┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker ps | grep keep
f2a9b9f187a6   0cde578847cc                                        "/container/tool/run"    12 hours ago     Up 12 hours               k8s_keepalived_keepalived-vms102.liruilongs.github.io_kube-system_f0ae51f10833bbd4d70ccb8690f2429c_55
822eec55d6af   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 12 hours ago     Up 12 hours               k8s_POD_keepalived-vms102.liruilongs.github.io_kube-system_f0ae51f10833bbd4d70ccb8690f2429c_55

Next, check whether the API server is healthy, since every kubectl command is ultimately a request to kube-apiserver:


┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker ps |grep api
56807ccad104   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 12 hours ago         Up 12 hours                   k8s_POD_kube-apiserver-vms102.liruilongs.github.io_kube-system_88f80934116e8f989883c8eba6636201_41

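Sure enough, it is down: only its pause (sandbox) container is running. docker ps -a also lists exited containers, so use it to find the last kube-apiserver container and read its logs: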

┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker ps -a | grep api
c9bd413b176f   b09a3dc327be                                        "kube-apiserver --ad…"   2 minutes ago        Exited (1) 2 minutes ago             k8s_kube-apiserver_kube-apiserver-vms102.liruilongs.github.io_kube-system_88f80934116e8f989883c8eba6636201_225
56807ccad104   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 12 hours ago         Up 12 hours                          k8s_POD_kube-apiserver-vms102.liruilongs.github.io_kube-system_88f80934116e8f989883c8eba6636201_41
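This is also why docker ps | grep api was misleading at first glance: the only running match was k8s_POD_..., the pause sandbox, which stays up while the real kube-apiserver container keeps exiting. Below is a sketch of a check that skips sandboxes, assuming the k8s_CONTAINER_POD_... container naming seen above; component_status is a hypothetical name.

```shell
# component_status COMPONENT
# Reads `docker ps -a --format '{{.Names}}<TAB>{{.Status}}'` output on stdin and
# prints the status of the newest workload container for COMPONENT, or
# "not found". k8s_POD_* sandbox containers never match the k8s_COMPONENT_ prefix.
# Example:
#   docker ps -a --format $'{{.Names}}\t{{.Status}}' | component_status kube-apiserver
component_status() {
  awk -F '\t' -v prefix="k8s_$1_" '
    index($1, prefix) == 1 { print $2; found = 1; exit }   # docker ps lists newest first
    END { if (!found) print "not found" }'
}
```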

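The log shows the process failing right after loading the admission controllers, with nothing more than "context deadline exceeded" to go on: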

┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker logs --tail -5 c9bd413b176f
I0209 05:51:43.041043       1 server.go:563] external host was not specified, using 192.168.26.102
I0209 05:51:43.042642       1 server.go:161] Version: v1.25.1
I0209 05:51:43.042693       1 server.go:163] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0209 05:51:43.362808       1 shared_informer.go:255] Waiting for caches to sync for node_authorizer
I0209 05:51:43.363544       1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0209 05:51:43.363560       1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
I0209 05:51:43.364480       1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0209 05:51:43.364499       1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
E0209 05:52:03.366417       1 run.go:74] "command failed" err="context deadline exceeded"

kube-apiserver constantly talks to etcd to read and update cluster state, so check etcd next:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker ps | grep etcd
43dccee957e0   a8a176a5d5d6                                        "etcd --advertise-cl…"   About a minute ago   Up About a minute             k8s_etcd_etcd-vms102.liruilongs.github.io_kube-system_bb9615ff1be73c1b0c1f420f3da9806a_156
523a83b11288   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 12 hours ago         Up 12 hours                   k8s_POD_etcd-vms102.liruilongs.github.io_kube-system_bb9615ff1be73c1b0c1f420f3da9806a_41

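The etcd logs show certificate-related warnings (tls: bad certificate) and no Raft leader, which strongly suggests the certificates have expired: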

┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker logs 43dccee957e0 | tail -5
...................
{"level":"warn","ts":"2024-02-09T05:59:17.452Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.26.101:51158","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-02-09T05:59:17.452Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.26.101:51148","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-02-09T05:59:17.553Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.26.101:51166","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-02-09T05:59:17.553Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.26.101:51164","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-02-09T05:59:17.588Z","caller":"etcdhttp/metrics.go:173","msg":"serving /health false; no leader"}
{"level":"warn","ts":"2024-02-09T05:59:17.588Z","caller":"etcdhttp/metrics.go:86","msg":"/health error","output":"{\"health\":\"false\",\"reason\":\"RAFT NO LEADER\"}","status-code":503}
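To see which peers are being rejected and how often, the JSON log lines can be summarized. This is a sketch using grep and sed only (no jq), assuming the one-JSON-object-per-line layout shown above; tls_rejects_by_peer is a hypothetical name. etcd normally writes its log to stderr, hence the 2>&1 in the example.

```shell
# tls_rejects_by_peer
# Reads etcd JSON log lines on stdin and prints "COUNT REMOTE_IP" for every
# "rejected connection" entry whose error mentions "bad certificate",
# most frequent peer first.
# Example: docker logs 43dccee957e0 2>&1 | tls_rejects_by_peer
tls_rejects_by_peer() {
  grep '"msg":"rejected connection"' |
    grep 'bad certificate' |
    sed -n 's/.*"remote-addr":"\([^"]*\):[0-9]*".*/\1/p' |   # strip the source port
    sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```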

Inspecting the certificate confirms it: it expired on January 26, and today is February 9:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep Not
            Not Before: Jan 26 11:27:49 2023 GMT
            Not After : Jan 26 11:30:26 2024 GMT
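Instead of reading Not After by eye, the remaining lifetime can be computed. This is a sketch assuming GNU date, which parses the date format printed by openssl x509 -enddate; days_left is a hypothetical helper. For a plain yes/no answer, openssl x509 -checkend 0 -noout -in CERT exits non-zero if the certificate has already expired.

```shell
# days_left NOT_AFTER [NOW_EPOCH]
# NOT_AFTER is a date such as the one printed by
#   openssl x509 -enddate -noout -in CERT   (notAfter=Jan 26 11:30:26 2024 GMT)
# Prints whole days until that date (negative once expired), counted from
# NOW_EPOCH (default: now). Requires GNU date for -d.
# Example over kubeadm's PKI directory:
#   for c in /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt; do
#     echo "$c $(days_left "$(openssl x509 -enddate -noout -in "$c" | cut -d= -f2)")"
#   done
days_left() {
  local end now
  end=$(date -u -d "$1" +%s) || return 1
  now=${2:-$(date -u +%s)}
  echo $(( (end - now) / 86400 ))
}
```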

Double-check with kubeadm:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Jan 26, 2024 11:30 UTC   <invalid>       ca                      no
apiserver                  Jan 26, 2024 11:30 UTC   <invalid>       ca                      no
apiserver-etcd-client      Jan 26, 2024 11:30 UTC   <invalid>       etcd-ca                 no
apiserver-kubelet-client   Jan 26, 2024 11:30 UTC   <invalid>       ca                      no
controller-manager.conf    Jan 26, 2024 11:30 UTC   <invalid>       ca                      no
etcd-healthcheck-client    Jan 26, 2024 11:30 UTC   <invalid>       etcd-ca                 no
etcd-peer                  Jan 26, 2024 11:30 UTC   <invalid>       etcd-ca                 no
etcd-server                Jan 26, 2024 11:30 UTC   <invalid>       etcd-ca                 no
front-proxy-client         Jan 26, 2024 11:30 UTC   <invalid>       front-proxy-ca          no
scheduler.conf             Jan 26, 2024 11:30 UTC   <invalid>       ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Jan 23, 2033 11:27 UTC   8y              no
etcd-ca                 Jan 23, 2033 11:27 UTC   8y              no
front-proxy-ca          Jan 23, 2033 11:27 UTC   8y              no
┌──[root@vms102.liruilongs.github.io]-[~]
└─$
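The check-expiration table marks expired entries with <invalid> in the RESIDUAL TIME column, so it is easy to turn into a scripted check, for example a cron job that warns before the one-year leaf certificates run out again. This is a sketch that assumes the output format stays as shown above; list_expired is a hypothetical name.

```shell
# list_expired
# Reads `kubeadm certs check-expiration` output on stdin and prints the name of
# every certificate or CA whose RESIDUAL TIME reads "<invalid>" (already expired).
# Exit status: 0 if nothing has expired, 1 otherwise.
# Example: kubeadm certs check-expiration | list_expired || echo "renew now"
list_expired() {
  awk '/<invalid>/ { print $1; n++ } END { exit (n > 0) }'
}
```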

How was it fixed?

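With the cause identified, the fix is straightforward: renew the certificates. Since this is an HA cluster with 3 master nodes, the certificates must be renewed on every master.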

Start with one node.

Back up first:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$cp -r /etc/kubernetes /etc/kubernetes.20240209.bak
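The backup step can be wrapped so the naming stays consistent and an existing backup is never overwritten (cp -r into an already existing directory would silently nest the copy inside it). This is a sketch; backup_dir is a hypothetical name, and cp -a is used instead of cp -r to also keep ownership, modes, and timestamps.

```shell
# backup_dir DIR [SUFFIX]
# Copies DIR to DIR.YYYYMMDD.bak (or DIR.SUFFIX) and prints the backup path.
# Refuses to run if the backup path already exists.
# Example: backup_dir /etc/kubernetes   (creates e.g. /etc/kubernetes.20240209.bak)
backup_dir() {
  local src=${1%/} dest
  dest="$src.${2:-$(date +%Y%m%d).bak}"
  if [ -e "$dest" ]; then
    echo "backup_dir: $dest already exists, not overwriting" >&2
    return 1
  fi
  cp -a -- "$src" "$dest" && echo "$dest"
}
```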

kubeadm certs renew all renews all of these certificates in one go:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubeadm certs renew all
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.

Check that the renewal succeeded:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Feb 08, 2025 06:18 UTC   364d            ca                      no
apiserver                  Feb 08, 2025 06:18 UTC   364d            ca                      no
apiserver-etcd-client      Feb 08, 2025 06:18 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Feb 08, 2025 06:18 UTC   364d            ca                      no
controller-manager.conf    Feb 08, 2025 06:18 UTC   364d            ca                      no
etcd-healthcheck-client    Feb 08, 2025 06:18 UTC   364d            etcd-ca                 no
etcd-peer                  Feb 08, 2025 06:18 UTC   364d            etcd-ca                 no
etcd-server                Feb 08, 2025 06:18 UTC   364d            etcd-ca                 no
front-proxy-client         Feb 08, 2025 06:18 UTC   364d            front-proxy-ca          no
scheduler.conf             Feb 08, 2025 06:18 UTC   364d            ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Jan 23, 2033 11:27 UTC   8y              no
etcd-ca                 Jan 23, 2033 11:27 UTC   8y              no
front-proxy-ca          Jan 23, 2033 11:27 UTC   8y              no
┌──[root@vms102.liruilongs.github.io]-[~]
└─$

Once that checks out, handle the remaining nodes in bulk with ansible.

Here is the inventory file:

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$cat host.yaml
ansible:
  children:
    ansible_master:
      hosts:
        192.168.26.100:
    ansible_node:
      hosts:
        192.168.26.[101:103]:
        192.168.26.[105:106]:
k8s:
  children:
    k8s_master:
      hosts:
        192.168.26.[100:102]:
    k8s_node:
      hosts:
        192.168.26.103:
        192.168.26.[105:106]:

Renew on all master nodes in one batch, excluding the node that was already done:

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master  -m shell -a "kubeadm certs renew all" -i host.yaml  --limit !192.168.26.102
ansible k8s_master  -m shell -a "kubeadm certs renew all" -i host.yaml  --limit kubectl get all -A  -o wide | grep tidb-cluster  | awk '{print $2}' | awk -F'/' '{ print "kubectl delete "$1" "$2 " -n tidb-cluster --force" }' | xargs  -n1 -I{} bash -c "{}".168.26.102
usage: ansible [-h] [--version] [-v] [-b] [--become-method BECOME_METHOD]
               [--become-user BECOME_USER] [-K] [-i INVENTORY] [--list-hosts]
               [-l SUBSET] [-P POLL_INTERVAL] [-B SECONDS] [-o] [-t TREE] [-k]
               [--private-key PRIVATE_KEY_FILE] [-u REMOTE_USER]
               [-c CONNECTION] [-T TIMEOUT]
               [--ssh-common-args SSH_COMMON_ARGS]
               [--sftp-extra-args SFTP_EXTRA_ARGS]
               [--scp-extra-args SCP_EXTRA_ARGS]
               [--ssh-extra-args SSH_EXTRA_ARGS] [-C] [--syntax-check] [-D]
               [-e EXTRA_VARS] [--vault-id VAULT_IDS]
               [--ask-vault-pass | --vault-password-file VAULT_PASSWORD_FILES]
               [-f FORKS] [-M MODULE_PATH] [--playbook-dir BASEDIR]
               [-a MODULE_ARGS] [-m MODULE_NAME]
               pattern
ansible: error: unrecognized arguments: get all -A wide
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master  -m shell -a "kubeadm certs renew all" -i host.yaml  --limit "!192.168.26.102"
ansible k8s_master  -m shell -a "kubeadm certs renew all" -i host.yaml  --limit "kubectl get all -A  -o wide | grep tidb-cluster  | awk '{print $2}' | awk -F'/' '{ print "kubectl delete "$1" "$2 " -n tidb-cluster --force" }' | xargs  -n1 -I{} bash -c "{}".168.26.102"
usage: ansible [-h] [--version] [-v] [-b] [--become-method BECOME_METHOD]
               [--become-user BECOME_USER] [-K] [-i INVENTORY] [--list-hosts]
               [-l SUBSET] [-P POLL_INTERVAL] [-B SECONDS] [-o] [-t TREE] [-k]
               [--private-key PRIVATE_KEY_FILE] [-u REMOTE_USER]
               [-c CONNECTION] [-T TIMEOUT]
               [--ssh-common-args SSH_COMMON_ARGS]
               [--sftp-extra-args SFTP_EXTRA_ARGS]
               [--scp-extra-args SCP_EXTRA_ARGS]
               [--ssh-extra-args SSH_EXTRA_ARGS] [-C] [--syntax-check] [-D]
               [-e EXTRA_VARS] [--vault-id VAULT_IDS]
               [--ask-vault-pass | --vault-password-file VAULT_PASSWORD_FILES]
               [-f FORKS] [-M MODULE_PATH] [--playbook-dir BASEDIR]
               [-a MODULE_ARGS] [-m MODULE_NAME]
               pattern
ansible: error: unrecognized arguments: delete    -n tidb-cluster --force }' | xargs  -n1 -I{} bash -c {}.168.26.102

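Both attempts fail. In an interactive bash, ! triggers history expansion, and double quotes do not stop it: !192 was read as history event number 192 (an old kubectl command), whose text was spliced into the command line in front of the remaining .168.26.102. Single quotes do suppress history expansion, and with them the command runs and the other nodes are renewed: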

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master  -m shell -a "kubeadm certs renew all" -i host.yaml  --limit '!192.168.26.102'
192.168.26.101 | CHANGED | rc=0 >>
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.
192.168.26.100 | CHANGED | rc=0 >>
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$
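Another way to sidestep the quoting problem is to put the exclusion into the host pattern itself, which Ansible supports in the form group:!host. The sketch below builds such a pattern; ansible_pattern is a hypothetical helper. The ! only ever appears inside single quotes in the function body, so interactive history expansion never touches it, and the output of a command substitution is not history-expanded.

```shell
# ansible_pattern GROUP [EXCLUDED_HOST...]
# Prints GROUP followed by one ':!HOST' exclusion per excluded host.
# Example (result: 'k8s_master:!192.168.26.102'):
#   ansible "$(ansible_pattern k8s_master 192.168.26.102)" -m shell \
#     -a "kubeadm certs renew all" -i host.yaml
ansible_pattern() {
  local pattern=$1 host
  shift
  for host in "$@"; do
    pattern="$pattern"':!'"$host"   # the '!' is single-quoted on purpose
  done
  printf '%s\n' "$pattern"
}
```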

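Here I restart docker; restarting only the static Pods would normally be enough. On a production cluster, consider restarting the container runtime off-hours, or instead move the static Pods' YAML manifests out of their directory and back, since by default the kubelet periodically rescans that directory and stops or recreates the Pods accordingly.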

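When restarting docker, note the --forks 1: ansible then works on one host at a time, so the masters are restarted one after another rather than all at once: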

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master  -m shell -a "systemctl restart docker" -i host.yaml  --forks 1
192.168.26.100 | CHANGED | rc=0 >>

192.168.26.101 | CHANGED | rc=0 >>

192.168.26.102 | CHANGED | rc=0 >>
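For the gentler option mentioned above, bouncing a single static Pod instead of restarting the whole container runtime, here is a sketch. It assumes kubeadm's default static Pod directory /etc/kubernetes/manifests and a wait longer than the kubelet's manifest re-scan interval (fileCheckFrequency, 20s by default); bounce_static_pod is a hypothetical name.

```shell
# bounce_static_pod NAME [MANIFEST_DIR] [WAIT_SECONDS]
# Moves MANIFEST_DIR/NAME.yaml out of the kubelet's static Pod directory so the
# kubelet stops the Pod, waits WAIT_SECONDS (default 30), then moves it back
# so the kubelet recreates the Pod.
# Example, one component at a time: bounce_static_pod kube-apiserver
bounce_static_pod() {
  local name=$1 dir=${2:-/etc/kubernetes/manifests} wait=${3:-30}
  local manifest="$dir/$name.yaml" hold
  [ -f "$manifest" ] || { echo "no such manifest: $manifest" >&2; return 1; }
  # Park the file outside the watched directory so the kubelet cannot load it.
  hold=$(mktemp -d) || return 1
  mv -- "$manifest" "$hold/" || return 1
  sleep "$wait"
  mv -- "$hold/$name.yaml" "$manifest" && rmdir -- "$hold"
}
```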

Test with kubectl:

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master  -m shell -a "kubectl get nodes --kubeconfig /etc/kubernetes/admin.conf" -i host.yaml
192.168.26.100 | CHANGED | rc=0 >>
NAME                          STATUS   ROLES           AGE    VERSION
vms100.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms101.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms102.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms103.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms105.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms106.liruilongs.github.io   Ready    <none>          378d   v1.25.1
192.168.26.102 | CHANGED | rc=0 >>
NAME                          STATUS   ROLES           AGE    VERSION
vms100.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms101.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms102.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms103.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms105.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms106.liruilongs.github.io   Ready    <none>          378d   v1.25.1
192.168.26.101 | CHANGED | rc=0 >>
NAME                          STATUS   ROLES           AGE    VERSION
vms100.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms101.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms102.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms103.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms105.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms106.liruilongs.github.io   Ready    <none>          378d   v1.25.1
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$
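Keep in mind that --forks 1 only limits ansible to one host at a time; it moves on to the next master as soon as systemctl restart docker returns, not when that node's control plane is serving again. A polling helper can fill that gap between steps. This is a sketch assuming bash and coreutils timeout; wait_for_port is a hypothetical name, and an open port is a weaker signal than the API server actually being ready.

```shell
# wait_for_port HOST PORT [TRIES] [INTERVAL]
# Polls HOST:PORT until a TCP connection succeeds, up to TRIES attempts
# (default 60) spaced INTERVAL seconds apart (default 5).
# Returns 0 as soon as the port accepts a connection, 1 if it never does.
# Example: wait_for_port 192.168.26.99 30033 && kubectl get nodes
wait_for_port() {
  local host=$1 port=$2 tries=${3:-60} interval=${4:-5} i=1
  while [ "$i" -le "$tries" ]; do
    # Cap each attempt at 3 s so a silently dropped SYN cannot stall the loop.
    if timeout 3 bash -c ': < "/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
      return 0
    fi
    [ "$i" -lt "$tries" ] && sleep "$interval"
    i=$((i + 1))
  done
  return 1
}
```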

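Once everything checks out, copy the renewed kubeconfig (admin.conf) to the default location kubectl loads, ~/.kube/config, or point the KUBECONFIG environment variable at it instead: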

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master  -m copy -a "src=/etc/kubernetes/admin.conf dest=/root/.kube/config" -i host.yaml
192.168.26.101 | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": true,
    "checksum": "c58460352ef70350a39a4fc6b01645ed68cf56dc",
    "dest": "/root/.kube/config",
    "gid": 0,
    "group": "root",
    "md5sum": "470ad5691e98e2dd5682186c64cc5d33",
    "mode": "0600",
    "owner": "root",
    "size": 5674,
    "src": "/root/.ansible/tmp/ansible-tmp-1707464341.43-44557-35016830998762/source",
    "state": "file",
    "uid": 0
}
192.168.26.100 | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": true,
    "checksum": "c58460352ef70350a39a4fc6b01645ed68cf56dc",
    "dest": "/root/.kube/config",
    "gid": 0,
    "group": "root",
    "md5sum": "470ad5691e98e2dd5682186c64cc5d33",
    "mode": "0600",
    "owner": "root",
    "size": 5674,
    "src": "/root/.ansible/tmp/ansible-tmp-1707464341.41-44555-140261297562614/source",
    "state": "file",
    "uid": 0
}
192.168.26.102 | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": true,
    "checksum": "c58460352ef70350a39a4fc6b01645ed68cf56dc",
    "dest": "/root/.kube/config",
    "gid": 0,
    "group": "root",
    "md5sum": "470ad5691e98e2dd5682186c64cc5d33",
    "mode": "0600",
    "owner": "root",
    "size": 5674,
    "src": "/root/.ansible/tmp/ansible-tmp-1707464341.39-44559-184122506106441/source",
    "state": "file",
    "uid": 0
}
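A slightly more careful version of the copy step, as a sketch: it keeps the previous kubeconfig as a .old file and sets owner-only permissions, since admin.conf carries cluster-admin level credentials. install_kubeconfig is a hypothetical name; the environment-variable alternative mentioned above is simply export KUBECONFIG=/etc/kubernetes/admin.conf.

```shell
# install_kubeconfig [SRC] [DEST]
# Copies SRC (default /etc/kubernetes/admin.conf) to DEST (default ~/.kube/config)
# with mode 600, creating missing directories and keeping any old DEST as DEST.old.
install_kubeconfig() {
  local src=${1:-/etc/kubernetes/admin.conf} dest=${2:-$HOME/.kube/config}
  [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
  [ -e "$dest" ] && cp -p -- "$dest" "$dest.old"
  install -D -m 600 -- "$src" "$dest"
}
```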

Test again: the cluster is back to normal.

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master  -m shell -a "kubectl get nodes " -i host.yaml
192.168.26.101 | CHANGED | rc=0 >>
NAME                          STATUS   ROLES           AGE    VERSION
vms100.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms101.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms102.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms103.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms105.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms106.liruilongs.github.io   Ready    <none>          378d   v1.25.1
192.168.26.100 | CHANGED | rc=0 >>
NAME                          STATUS   ROLES           AGE    VERSION
vms100.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms101.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms102.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms103.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms105.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms106.liruilongs.github.io   Ready    <none>          378d   v1.25.1
192.168.26.102 | CHANGED | rc=0 >>
NAME                          STATUS   ROLES           AGE    VERSION
vms100.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms101.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms102.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms103.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms105.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms106.liruilongs.github.io   Ready    <none>          378d   v1.25.1
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$

References for parts of this post

https://blog.csdn.net/sanhewuyang/article/details/128436670


Permalink: https://liruilongs.github.io/2024/02/09/K8s/环境部署-运维/K8s集群故障(The connection to the server was refused - did you specify the right host or port)解决/

Author: 山河已无恙

Published: 2024-02-09

Updated: 2024-02-15
