Resolving a K8s cluster failure (The connection to the server <host>:<port> was refused - did you specify the right host or port)

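Don't dwell too much on the present, and don't worry too much about the future. Once you have been through certain things, the scenery before you is no longer what it used to be. (Haruki Murakami)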

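Preface


  • While tidying up my cluster notes at home over the Lunar New Year holiday, I found that the cluster no longer worked.
  • This is a brief record of the fix. The root cause was simply expired certificates, but the error message differed from the one I had seen before.
  • My understanding is limited, so corrections are welcome.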


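What was the problem?

I run a highly available k8s cluster on local virtual machines. After a long period of disuse I booted it up, and kubectl commands no longer worked: the connection to the kube-apiserver port on the VIP address was refused.
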
┌──[root@vms100.liruilongs.github.io]-[~]
└─$kubectl get nodes
The connection to the server 192.168.26.99:30033 was refused - did you specify the right host or port?

How was it diagnosed?

A direct test confirms the port is unreachable, so the failure is already at the transport layer:

┌──[root@vms100.liruilongs.github.io]-[~]
└─$</dev/tcp/192.168.26.99/30033
-bash: connect: Connection refused
-bash: /dev/tcp/192.168.26.99/30033: Connection refused
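
The /dev/tcp test above can be wrapped in a small helper so that a filtered or unreachable port does not hang the shell. This is a minimal sketch and not part of the original session: the function name check_port and the 3-second default are assumptions, and it relies on bash's /dev/tcp redirection plus the coreutils timeout command.

```shell
#!/usr/bin/env bash
# check_port HOST PORT [WAIT_SECONDS]
# Hypothetical helper: returns 0 if a TCP connection to HOST:PORT succeeds
# within WAIT_SECONDS (default 3, an assumed value), non-zero otherwise.
check_port() {
  local host="$1" port="$2" wait_seconds="${3:-3}"
  # A child bash opens the connection; timeout kills it if it stalls.
  timeout "$wait_seconds" bash -c ': </dev/tcp/"$0"/"$1"' "$host" "$port" 2>/dev/null
}
```

For example, `check_port 192.168.26.99 30033 && echo open || echo closed` would report the state of the apiserver port used in this article.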

Use ip a to check whether the configured VIP is bound on this node. It is not, so either keepalived on this node has a problem or another node currently holds the VIP:

┌──[root@vms100.liruilongs.github.io]-[~]
└─$ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:0e:5d:5f brd ff:ff:ff:ff:ff:ff
inet 192.168.26.100/24 brd 192.168.26.255 scope global ens32
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe0e:5d5f/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
link/ether 02:42:68:f8:90:26 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever

At the network layer, however, ping succeeds. That looks odd at first, but it means another node is holding the VIP:

┌──[root@vms100.liruilongs.github.io]-[~]
└─$ping 192.168.26.99
PING 192.168.26.99 (192.168.26.99) 56(84) bytes of data.
64 bytes from 192.168.26.99: icmp_seq=1 ttl=64 time=0.784 ms
64 bytes from 192.168.26.99: icmp_seq=2 ttl=64 time=0.411 ms
^C
--- 192.168.26.99 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1012ms
rtt min/avg/max/mdev = 0.411/0.597/0.784/0.188 ms

SSH to the VIP to see which node answers:

┌──[root@vms100.liruilongs.github.io]-[~]
└─$ssh root@192.168.26.99
The authenticity of host '192.168.26.99 (192.168.26.99)' can't be established.
ECDSA key fingerprint is SHA256:BmaDR4pX6G1WgStkR7Lcl7Yg4fhP2d8idUBxW3HEzsA.
ECDSA key fingerprint is MD5:2e:49:16:97:30:90:e3:28:b2:43:2d:64:9d:f2:d4:6d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.26.99' (ECDSA) to the list of known hosts.
Last login: Wed Nov 15 11:12:11 2023 from 192.168.26.100

This lands on another k8s master node (vms102). Its address list shows the VIP bound as expected:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:eb:fa:00 brd ff:ff:ff:ff:ff:ff
inet 192.168.26.102/24 brd 192.168.26.255 scope global ens32
valid_lft forever preferred_lft forever
inet 192.168.26.99/32 scope global ens32
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:feeb:fa00/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
link/ether 02:42:ed:cf:c0:d1 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever

The same port test from this node succeeds:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$</dev/tcp/192.168.26.99/30033

Run a kubectl command to confirm:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubectl get nodes
Unable to connect to the server: EOF

The connection still fails, so print the details of the API calls:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubectl get nodes -vv
error: invalid argument "v" for "-v, --v" flag: strconv.ParseInt: parsing "v": invalid syntax
See 'kubectl get --help' for usage.

As the error shows, kubectl's -v flag expects a numeric log level rather than repeated letters, so use the -v=N form:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubectl get nodes -v=1
I0209 13:52:56.780335 72398 shortcut.go:100] Error loading discovery information: Get "https://192.168.26.99:30033/api?timeout=32s": dial tcp 192.168.26.99:30033: connect: connection refused
The connection to the server 192.168.26.99:30033 was refused - did you specify the right host or port?
┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubectl get nodes -v=2
I0209 13:53:16.963102 72533 shortcut.go:100] Error loading discovery information: Get "https://192.168.26.99:30033/api?timeout=32s": dial tcp 192.168.26.99:30033: connect: connection refused
The connection to the server 192.168.26.99:30033 was refused - did you specify the right host or port?

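This is the same error as before, so the problem affects the whole cluster rather than one particular node. Use the container runtime CLI to check whether the high-availability component (keepalived) is running:
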
┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker ps | grep keep
f2a9b9f187a6 0cde578847cc "/container/tool/run" 12 hours ago Up 12 hours k8s_keepalived_keepalived-vms102.liruilongs.github.io_kube-system_f0ae51f10833bbd4d70ccb8690f2429c_55
822eec55d6af registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 12 hours ago Up 12 hours k8s_POD_keepalived-vms102.liruilongs.github.io_kube-system_f0ae51f10833bbd4d70ccb8690f2429c_55

keepalived is up. Next check kube-apiserver, since every kubectl command is ultimately a call to it:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker ps |grep api
56807ccad104 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 12 hours ago Up 12 hours k8s_POD_kube-apiserver-vms102.liruilongs.github.io_kube-system_88f80934116e8f989883c8eba6636201_41

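Sure enough, only the pause container is running and kube-apiserver itself is down. List the exited containers to find it and read its last log lines:
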
┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker ps -a | grep api
c9bd413b176f b09a3dc327be "kube-apiserver --ad…" 2 minutes ago Exited (1) 2 minutes ago k8s_kube-apiserver_kube-apiserver-vms102.liruilongs.github.io_kube-system_88f80934116e8f989883c8eba6636201_225
56807ccad104 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 12 hours ago Up 12 hours k8s_POD_kube-apiserver-vms102.liruilongs.github.io_kube-system_88f80934116e8f989883c8eba6636201_41

The log shows the process failing right after it loads the admission controllers, with no detail beyond "context deadline exceeded":

┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker logs --tail -5 c9bd413b176f
I0209 05:51:43.041043 1 server.go:563] external host was not specified, using 192.168.26.102
I0209 05:51:43.042642 1 server.go:161] Version: v1.25.1
I0209 05:51:43.042693 1 server.go:163] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0209 05:51:43.362808 1 shared_informer.go:255] Waiting for caches to sync for node_authorizer
I0209 05:51:43.363544 1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0209 05:51:43.363560 1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
I0209 05:51:43.364480 1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0209 05:51:43.364499 1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
E0209 05:52:03.366417 1 run.go:74] "command failed" err="context deadline exceeded"

kube-apiserver constantly talks to etcd to read and update cluster state, so a timeout during startup points at etcd. Check it next:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker ps | grep etcd
43dccee957e0 a8a176a5d5d6 "etcd --advertise-cl…" About a minute ago Up About a minute k8s_etcd_etcd-vms102.liruilongs.github.io_kube-system_bb9615ff1be73c1b0c1f420f3da9806a_156
523a83b11288 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 12 hours ago Up 12 hours k8s_POD_etcd-vms102.liruilongs.github.io_kube-system_bb9615ff1be73c1b0c1f420f3da9806a_41

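The etcd log is full of certificate warnings: connections from a peer are rejected with "tls: bad certificate" and there is no Raft leader. Expired certificates are the most likely cause:
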
┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker logs 43dccee957e0 | tail -5
...................
{"level":"warn","ts":"2024-02-09T05:59:17.452Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.26.101:51158","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-02-09T05:59:17.452Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.26.101:51148","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-02-09T05:59:17.553Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.26.101:51166","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-02-09T05:59:17.553Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.26.101:51164","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-02-09T05:59:17.588Z","caller":"etcdhttp/metrics.go:173","msg":"serving /health false; no leader"}
{"level":"warn","ts":"2024-02-09T05:59:17.588Z","caller":"etcdhttp/metrics.go:86","msg":"/health error","output":"{\"health\":\"false\",\"reason\":\"RAFT NO LEADER\"}","status-code":503}
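
To see at a glance which peers are being rejected, the log lines above can be summarized per remote address. The following is a rough sketch under stated assumptions: the function name count_tls_rejects is made up for illustration, and the parsing assumes the JSON log layout shown above (a "remote-addr":"IP:PORT" field on each rejected connection).

```shell
#!/usr/bin/env bash
# count_tls_rejects: read etcd log lines on stdin and print
# "<remote-ip> <count>" for every peer rejected with "bad certificate".
# Hypothetical helper; it assumes the JSON log format shown in this article.
count_tls_rejects() {
  grep 'bad certificate' \
    | sed -n 's/.*"remote-addr":"\([0-9.]*\):[0-9]*".*/\1/p' \
    | sort | uniq -c | awk '{ print $2, $1 }'
}
```

A possible invocation is `docker logs 43dccee957e0 2>&1 | count_tls_rejects`. The `2>&1` is there on the assumption that the container writes its log to stderr, in which case a plain pipe would not see it.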

Inspecting the certificate confirms it: it expired on January 26, 2024, and it is now February 9:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep Not
Not Before: Jan 26 11:27:49 2023 GMT
Not After : Jan 26 11:30:26 2024 GMT
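
The same check can be turned into a number of days remaining, which is easier to alert on. This is a sketch only: the helper names days_left and report_cert are invented here, and the date parsing assumes GNU date (the -d option), which is an assumption about the host.

```shell
#!/usr/bin/env bash
# days_left NOT_AFTER [NOW_EPOCH]
# Print the whole days (truncated toward zero) between NOW_EPOCH
# (default: now) and the given notAfter timestamp; negative once expired.
# Hypothetical helper, assumes GNU date.
days_left() {
  local not_after="$1" now="${2:-$(date +%s)}" expiry
  expiry=$(date -u -d "$not_after" +%s) || return 1
  echo $(( (expiry - now) / 86400 ))
}

# report_cert CERT_FILE
# Read the notAfter field with openssl and print the days left.
report_cert() {
  local cert="$1" end
  end=$(openssl x509 -in "$cert" -noout -enddate | cut -d= -f2) || return 1
  printf '%s: %s days left\n' "$cert" "$(days_left "$end")"
}
```

For instance, `report_cert /etc/kubernetes/pki/apiserver.crt` would have printed a negative number on this cluster. Because of the truncation, a certificate that expired less than a day ago reports 0, so `openssl x509 -checkend 0` is the stricter yes/no test.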

Confirm with kubeadm, which reports on every certificate it manages:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Jan 26, 2024 11:30 UTC   <invalid>       ca                      no
apiserver                  Jan 26, 2024 11:30 UTC   <invalid>       ca                      no
apiserver-etcd-client      Jan 26, 2024 11:30 UTC   <invalid>       etcd-ca                 no
apiserver-kubelet-client   Jan 26, 2024 11:30 UTC   <invalid>       ca                      no
controller-manager.conf    Jan 26, 2024 11:30 UTC   <invalid>       ca                      no
etcd-healthcheck-client    Jan 26, 2024 11:30 UTC   <invalid>       etcd-ca                 no
etcd-peer                  Jan 26, 2024 11:30 UTC   <invalid>       etcd-ca                 no
etcd-server                Jan 26, 2024 11:30 UTC   <invalid>       etcd-ca                 no
front-proxy-client         Jan 26, 2024 11:30 UTC   <invalid>       front-proxy-ca          no
scheduler.conf             Jan 26, 2024 11:30 UTC   <invalid>       ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Jan 23, 2033 11:27 UTC   8y              no
etcd-ca                 Jan 23, 2033 11:27 UTC   8y              no
front-proxy-ca          Jan 23, 2033 11:27 UTC   8y              no
┌──[root@vms102.liruilongs.github.io]-[~]
└─$
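
In the report above, expired entries are the ones whose RESIDUAL TIME column reads <invalid>. A small filter can pull out just those names, for example for a monitoring script. This is a sketch that assumes the output format shown in this article; the function name list_expired is hypothetical.

```shell
#!/usr/bin/env bash
# list_expired: read `kubeadm certs check-expiration` output on stdin and
# print the name (first column) of every row marked <invalid>.
# Hypothetical helper; it assumes the report format shown in this article.
list_expired() {
  awk '/<invalid>/ { print $1 }'
}
```

A possible invocation is `kubeadm certs check-expiration 2>/dev/null | list_expired`, which on this cluster would have listed all ten leaf certificates and none of the CAs.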

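How was it fixed?

With the cause identified, the fix is straightforward: renew the certificates. Note that this is a highly available cluster with three master nodes, so the certificates must be renewed on every master.

Start with a single node.

Back up the current configuration first:
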
┌──[root@vms102.liruilongs.github.io]-[~]
└─$cp -r /etc/kubernetes /etc/kubernetes.20240209.bak
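
If the backup is taken more than once, a dated name that refuses to overwrite an existing copy is a little safer. The following is only a sketch: the function name backup_dir and the ".YYYYMMDD.bak" naming (modeled on the command above) are assumptions.

```shell
#!/usr/bin/env bash
# backup_dir SRC_DIR
# Copy SRC_DIR to SRC_DIR.<YYYYMMDD>.bak, preserving ownership and modes,
# and print the backup path. Refuses to overwrite an existing backup.
# Hypothetical helper for illustration.
backup_dir() {
  local src="${1%/}" dest
  dest="${src}.$(date +%Y%m%d).bak"
  if [ -e "$dest" ]; then
    echo "refusing to overwrite $dest" >&2
    return 1
  fi
  cp -a "$src" "$dest" && echo "$dest"
}
```

Here `backup_dir /etc/kubernetes` would do the same job as the cp command above, with the date filled in automatically.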

The kubeadm certs renew all command renews all kubeadm-managed certificates in one go:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubeadm certs renew all
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.

Verify that the renewal succeeded:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Feb 08, 2025 06:18 UTC   364d            ca                      no
apiserver                  Feb 08, 2025 06:18 UTC   364d            ca                      no
apiserver-etcd-client      Feb 08, 2025 06:18 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Feb 08, 2025 06:18 UTC   364d            ca                      no
controller-manager.conf    Feb 08, 2025 06:18 UTC   364d            ca                      no
etcd-healthcheck-client    Feb 08, 2025 06:18 UTC   364d            etcd-ca                 no
etcd-peer                  Feb 08, 2025 06:18 UTC   364d            etcd-ca                 no
etcd-server                Feb 08, 2025 06:18 UTC   364d            etcd-ca                 no
front-proxy-client         Feb 08, 2025 06:18 UTC   364d            front-proxy-ca          no
scheduler.conf             Feb 08, 2025 06:18 UTC   364d            ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Jan 23, 2033 11:27 UTC   8y              no
etcd-ca                 Jan 23, 2033 11:27 UTC   8y              no
front-proxy-ca          Jan 23, 2033 11:27 UTC   8y              no
┌──[root@vms102.liruilongs.github.io]-[~]
└─$

With that confirmed, handle the remaining nodes in bulk with ansible.

This is the inventory file:

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$cat host.yaml
ansible:
  children:
    ansible_master:
      hosts:
        192.168.26.100:
    ansible_node:
      hosts:
        192.168.26.[101:103]:
        192.168.26.[105:106]:
k8s:
  children:
    k8s_master:
      hosts:
        192.168.26.[100:102]:
    k8s_node:
      hosts:
        192.168.26.103:
        192.168.26.[105:106]:

Renew the certificates on all master nodes in bulk, excluding the node that was already handled:

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m shell -a "kubeadm certs renew all" -i host.yaml --limit !192.168.26.102
ansible k8s_master -m shell -a "kubeadm certs renew all" -i host.yaml --limit kubectl get all -A -o wide | grep tidb-cluster | awk '{print $2}' | awk -F'/' '{ print "kubectl delete "$1" "$2 " -n tidb-cluster --force" }' | xargs -n1 -I{} bash -c "{}".168.26.102
usage: ansible [-h] [--version] [-v] [-b] [--become-method BECOME_METHOD]
[--become-user BECOME_USER] [-K] [-i INVENTORY] [--list-hosts]
[-l SUBSET] [-P POLL_INTERVAL] [-B SECONDS] [-o] [-t TREE] [-k]
[--private-key PRIVATE_KEY_FILE] [-u REMOTE_USER]
[-c CONNECTION] [-T TIMEOUT]
[--ssh-common-args SSH_COMMON_ARGS]
[--sftp-extra-args SFTP_EXTRA_ARGS]
[--scp-extra-args SCP_EXTRA_ARGS]
[--ssh-extra-args SSH_EXTRA_ARGS] [-C] [--syntax-check] [-D]
[-e EXTRA_VARS] [--vault-id VAULT_IDS]
[--ask-vault-pass | --vault-password-file VAULT_PASSWORD_FILES]
[-f FORKS] [-M MODULE_PATH] [--playbook-dir BASEDIR]
[-a MODULE_ARGS] [-m MODULE_NAME]
pattern
ansible: error: unrecognized arguments: get all -A wide
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m shell -a "kubeadm certs renew all" -i host.yaml --limit "!192.168.26.102"
ansible k8s_master -m shell -a "kubeadm certs renew all" -i host.yaml --limit "kubectl get all -A -o wide | grep tidb-cluster | awk '{print $2}' | awk -F'/' '{ print "kubectl delete "$1" "$2 " -n tidb-cluster --force" }' | xargs -n1 -I{} bash -c "{}".168.26.102"
usage: ansible [-h] [--version] [-v] [-b] [--become-method BECOME_METHOD]
[--become-user BECOME_USER] [-K] [-i INVENTORY] [--list-hosts]
[-l SUBSET] [-P POLL_INTERVAL] [-B SECONDS] [-o] [-t TREE] [-k]
[--private-key PRIVATE_KEY_FILE] [-u REMOTE_USER]
[-c CONNECTION] [-T TIMEOUT]
[--ssh-common-args SSH_COMMON_ARGS]
[--sftp-extra-args SFTP_EXTRA_ARGS]
[--scp-extra-args SCP_EXTRA_ARGS]
[--ssh-extra-args SSH_EXTRA_ARGS] [-C] [--syntax-check] [-D]
[-e EXTRA_VARS] [--vault-id VAULT_IDS]
[--ask-vault-pass | --vault-password-file VAULT_PASSWORD_FILES]
[-f FORKS] [-M MODULE_PATH] [--playbook-dir BASEDIR]
[-a MODULE_ARGS] [-m MODULE_NAME]
pattern
ansible: error: unrecognized arguments: delete -n tidb-cluster --force }' | xargs -n1 -I{} bash -c {}.168.26.102

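Both attempts fail. In an interactive bash session, !192 triggers history expansion: the shell substitutes history entry 192 (an unrelated kubectl command here) before running anything, as the echoed line shows, and double quotes do not prevent this. Single quotes do. With them the command runs and the remaining nodes are renewed:
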
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m shell -a "kubeadm certs renew all" -i host.yaml --limit '!192.168.26.102'
192.168.26.101 | CHANGED | rc=0 >>
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.
192.168.26.100 | CHANGED | rc=0 >>
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$
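
When the same command goes into a script, the exclusion pattern can be built once from a variable and passed in double quotes: history expansion acts on the text of the command line as typed, so a "!" that only appears in a variable's value is not affected. A minimal sketch, with the variable names exclude_host and limit_pattern chosen here purely for illustration:

```shell
#!/usr/bin/env bash
# Build an ansible --limit pattern that excludes one host.
# The leading "!" is kept inside single quotes so that an interactive
# bash never treats it as a history reference.
exclude_host='192.168.26.102'
limit_pattern='!'"${exclude_host}"

# Show the pattern that would be passed to: ansible ... --limit "$limit_pattern"
echo "$limit_pattern"
```

The ansible call would then read `ansible k8s_master -m shell -a "kubeadm certs renew all" -i host.yaml --limit "$limit_pattern"`.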

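The control-plane components must be restarted to pick up the new certificates. Here I restart docker, although restarting only the static Pods would be enough. On a production cluster, consider restarting the container runtime at night, or moving the static Pod YAML manifests out of their directory and back again: by default the kubelet rescans that directory at regular intervals.

Restart docker. Note the --forks 1 option, which serializes the run so that only one node is restarted at a time:
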
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m shell -a "systemctl restart docker" -i host.yaml --forks 1
192.168.26.100 | CHANGED | rc=0 >>

192.168.26.101 | CHANGED | rc=0 >>

192.168.26.102 | CHANGED | rc=0 >>
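
As an alternative to restarting docker, the manifest-moving approach mentioned above could look roughly like this. It is a sketch under assumptions, not what was run in this article: the function name bounce_static_pods is invented, the 30-second default wait is a guess meant to exceed the kubelet's rescan interval, and the directory to pass in (commonly /etc/kubernetes/manifests on kubeadm clusters) should be checked against the kubelet's staticPodPath setting.

```shell
#!/usr/bin/env bash
# bounce_static_pods MANIFEST_DIR [WAIT_SECONDS]
# Hypothetical helper: move every *.yaml manifest out of MANIFEST_DIR,
# wait, then move them back so the kubelet recreates the static Pods.
bounce_static_pods() {
  local manifest_dir="$1" wait_seconds="${2:-30}" stash
  stash=$(mktemp -d) || return 1
  # Step 1: move the manifests away; the kubelet then stops the Pods.
  if ! mv "$manifest_dir"/*.yaml "$stash"/; then
    rmdir "$stash"
    return 1
  fi
  # Step 2: give the kubelet time to notice the empty directory.
  sleep "$wait_seconds"
  # Step 3: move the manifests back; the kubelet starts the Pods again.
  mv "$stash"/*.yaml "$manifest_dir"/ && rmdir "$stash"
}
```

As with the docker restart, this would be run on one master at a time so that etcd keeps its quorum.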

Test with kubectl, pointing it explicitly at the renewed admin.conf:

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m shell -a "kubectl get nodes --kubeconfig /etc/kubernetes/admin.conf" -i host.yaml
192.168.26.100 | CHANGED | rc=0 >>
NAME STATUS ROLES AGE VERSION
vms100.liruilongs.github.io Ready control-plane 378d v1.25.1
vms101.liruilongs.github.io Ready control-plane 378d v1.25.1
vms102.liruilongs.github.io Ready control-plane 378d v1.25.1
vms103.liruilongs.github.io Ready <none> 378d v1.25.1
vms105.liruilongs.github.io Ready <none> 378d v1.25.1
vms106.liruilongs.github.io Ready <none> 378d v1.25.1
192.168.26.102 | CHANGED | rc=0 >>
NAME STATUS ROLES AGE VERSION
vms100.liruilongs.github.io Ready control-plane 378d v1.25.1
vms101.liruilongs.github.io Ready control-plane 378d v1.25.1
vms102.liruilongs.github.io Ready control-plane 378d v1.25.1
vms103.liruilongs.github.io Ready <none> 378d v1.25.1
vms105.liruilongs.github.io Ready <none> 378d v1.25.1
vms106.liruilongs.github.io Ready <none> 378d v1.25.1
192.168.26.101 | CHANGED | rc=0 >>
NAME STATUS ROLES AGE VERSION
vms100.liruilongs.github.io Ready control-plane 378d v1.25.1
vms101.liruilongs.github.io Ready control-plane 378d v1.25.1
vms102.liruilongs.github.io Ready control-plane 378d v1.25.1
vms103.liruilongs.github.io Ready <none> 378d v1.25.1
vms105.liruilongs.github.io Ready <none> 378d v1.25.1
vms106.liruilongs.github.io Ready <none> 378d v1.25.1
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$

Once that works, copy the renewed admin.conf (which embeds the new client certificate) to kubectl's default kubeconfig location, or point the KUBECONFIG environment variable at it:

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m copy -a "src=/etc/kubernetes/admin.conf dest=/root/.kube/config" -i host.yaml
192.168.26.101 | CHANGED => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python"
},
"changed": true,
"checksum": "c58460352ef70350a39a4fc6b01645ed68cf56dc",
"dest": "/root/.kube/config",
"gid": 0,
"group": "root",
"md5sum": "470ad5691e98e2dd5682186c64cc5d33",
"mode": "0600",
"owner": "root",
"size": 5674,
"src": "/root/.ansible/tmp/ansible-tmp-1707464341.43-44557-35016830998762/source",
"state": "file",
"uid": 0
}
192.168.26.100 | CHANGED => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python"
},
"changed": true,
"checksum": "c58460352ef70350a39a4fc6b01645ed68cf56dc",
"dest": "/root/.kube/config",
"gid": 0,
"group": "root",
"md5sum": "470ad5691e98e2dd5682186c64cc5d33",
"mode": "0600",
"owner": "root",
"size": 5674,
"src": "/root/.ansible/tmp/ansible-tmp-1707464341.41-44555-140261297562614/source",
"state": "file",
"uid": 0
}
192.168.26.102 | CHANGED => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python"
},
"changed": true,
"checksum": "c58460352ef70350a39a4fc6b01645ed68cf56dc",
"dest": "/root/.kube/config",
"gid": 0,
"group": "root",
"md5sum": "470ad5691e98e2dd5682186c64cc5d33",
"mode": "0600",
"owner": "root",
"size": 5674,
"src": "/root/.ansible/tmp/ansible-tmp-1707464341.39-44559-184122506106441/source",
"state": "file",
"uid": 0
}

Test once more. The cluster is back to normal:

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m shell -a "kubectl get nodes " -i host.yaml
192.168.26.101 | CHANGED | rc=0 >>
NAME STATUS ROLES AGE VERSION
vms100.liruilongs.github.io Ready control-plane 378d v1.25.1
vms101.liruilongs.github.io Ready control-plane 378d v1.25.1
vms102.liruilongs.github.io Ready control-plane 378d v1.25.1
vms103.liruilongs.github.io Ready <none> 378d v1.25.1
vms105.liruilongs.github.io Ready <none> 378d v1.25.1
vms106.liruilongs.github.io Ready <none> 378d v1.25.1
192.168.26.100 | CHANGED | rc=0 >>
NAME STATUS ROLES AGE VERSION
vms100.liruilongs.github.io Ready control-plane 378d v1.25.1
vms101.liruilongs.github.io Ready control-plane 378d v1.25.1
vms102.liruilongs.github.io Ready control-plane 378d v1.25.1
vms103.liruilongs.github.io Ready <none> 378d v1.25.1
vms105.liruilongs.github.io Ready <none> 378d v1.25.1
vms106.liruilongs.github.io Ready <none> 378d v1.25.1
192.168.26.102 | CHANGED | rc=0 >>
NAME STATUS ROLES AGE VERSION
vms100.liruilongs.github.io Ready control-plane 378d v1.25.1
vms101.liruilongs.github.io Ready control-plane 378d v1.25.1
vms102.liruilongs.github.io Ready control-plane 378d v1.25.1
vms103.liruilongs.github.io Ready <none> 378d v1.25.1
vms105.liruilongs.github.io Ready <none> 378d v1.25.1
vms106.liruilongs.github.io Ready <none> 378d v1.25.1
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$

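References for parts of this post

© Copyright of the linked reference material belongs to its original authors. Please let me know of any infringement :)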


https://blog.csdn.net/sanhewuyang/article/details/128436670


© 2018-2024 liruilonger@gmail.com, All rights reserved. Attribution-NonCommercial-ShareAlike (CC BY-NC-SA 4.0)

https://liruilongs.github.io/2024/02/09/K8s/环境部署-运维/K8s集群故障(The connection to the server was refused - did you specify the right host or port)解决/

Published: 2024-02-09

Updated: 2024-02-15
