Some Notes on Node Scaling, Cordoning, and Recovery in Kubernetes

In the evening you sit under the eaves, watching the sky slowly darken, lonely and desolate, feeling that your life has been taken from you. I was young then, but I was afraid of living on like that, of growing old like that. To me it seemed more frightening than death. — Wang Xiaobo

Preface


  • Sharing some notes on Node scaling, cordoning, and recovery in K8s
  • The post is mainly a demo of adding nodes to a cluster with kubeadm
  • My understanding is limited, so corrections from readers are welcome


Scaling Out

When using k8s, once the existing nodes can no longer support the workload (for example, port conflicts caused by running multiple instances), consider scaling out by adding worker nodes to the cluster.

When a new Node joins a Kubernetes cluster via kubeadm, the process feels basically the same as when the cluster was first built: on a worker node, the only host services actually running are docker and kubelet; everything else, such as kube-proxy and the networking components, runs as containers.
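
A quick way to confirm this from the worker node itself (a sketch; the exact output varies by environment):

# only docker and kubelet run as host-level services
systemctl list-units --type=service | grep -E 'docker|kubelet'
# the rest (kube-proxy, CNI agents, ...) run as containers
docker ps --format '{{.Names}}' | grep -E 'kube-proxy|calico'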

Below is the current environment; 192.168.26.156 is the machine that was added in the very first scaling test.

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
vms156.liruilongs.github.io Ready <none> 35m v1.22.2 192.168.26.156 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://20.10.21
vms81.liruilongs.github.io Ready control-plane,master 324d v1.22.2 192.168.26.81 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 docker://20.10.9
vms82.liruilongs.github.io Ready <none> 324d v1.22.2 192.168.26.82 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://20.10.9
vms83.liruilongs.github.io Ready <none> 324d v1.22.2 192.168.26.83 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://20.10.9
┌──[root@vms81.liruilongs.github.io]-[~]
└─$

Now we want to add the machine 192.168.26.155 to the cluster. The steps are as follows.

First, configure passwordless SSH. This is not strictly required, but Ansible is used here, so it is set up:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ssh-copy-id root@192.168.26.155

Then use Ansible to run some initialization on the node:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$cat init_k8s_node.yaml
- name: init k8s
  hosts: 192.168.26.155
  tasks:
    # Disable the firewall
    - shell: firewall-cmd --set-default-zone=trusted
    # Disable SELinux
    - shell: getenforce
      register: out
    - debug: msg="{{out}}"
    - shell: setenforce 0
      when: out.stdout != "Disabled"
    - replace:
        path: /etc/selinux/config
        regexp: "SELINUX=enforcing"
        replace: "SELINUX=disabled"
    - shell: cat /etc/selinux/config
      register: out
    - debug: msg="{{out}}"
    # Disable the swap partition
    - shell: swapoff -a
    - shell: sed -i '/swap/d' /etc/fstab
    - shell: cat /etc/fstab
      register: out
    - debug: msg="{{out}}"
    # Install docker-ce
    - yum:
        name: docker-ce
        state: present
    # Configure the Docker registry mirror
    - shell: mkdir /etc/docker
    - copy:
        src: ./daemon.json
        dest: /etc/docker/daemon.json
    - shell: systemctl daemon-reload
    - shell: systemctl restart docker
    # Kernel parameters that need to be adjusted
    - copy:
        src: ./k8s.conf
        dest: /etc/sysctl.d/k8s.conf
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$

A few of the files copied by the playbook above deserve attention.

Kernel parameter changes:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$cat k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
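
On a brand-new node these values do not necessarily take effect on their own: the bridge-nf-call sysctls only exist once the br_netfilter module is loaded, and files under /etc/sysctl.d/ are only read at boot or on an explicit reload. A minimal extra step on the node (not in the original playbook):

modprobe br_netfilter   # makes the net.bridge.bridge-nf-call-* keys available
sysctl --system         # re-reads /etc/sysctl.d/*.conf, including k8s.conf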

Registry mirror file:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$cat daemon.json
{
"registry-mirrors": ["https://2tefyfv7.mirror.aliyuncs.com"]
}
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$

The inventory file; if you are scaling out more than one node, it needs to be modified (an example follows the listing):

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$cat node-host
192.168.26.155
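
If several nodes were being scaled out at once, the inventory would simply list them all, one per line (the extra addresses below are hypothetical):

192.168.26.155
192.168.26.157
192.168.26.158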

Then just run the playbook:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible-playbook init_k8s_node.yaml -i node-host

Once the environment initialization completes, configure the Kubernetes yum repository on the node. This step could actually go into the playbook as well (a sketch follows the listing):

┌──[root@vms155.liruilongs.github.io]-[/etc/yum.repos.d]
└─$cat k8s.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
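
As noted above, distributing this file could be folded into the playbook; a minimal sketch of such a task, assuming k8s.repo is kept next to the playbook:

- copy:
    src: ./k8s.repo
    dest: /etc/yum.repos.d/k8s.repo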

Install the required packages. Be careful that the versions match the cluster version exactly; this step, too, could be moved into the playbook (see the sketch after the command):

┌──[root@vms156.liruilongs.github.io]-[/etc/yum.repos.d]
└─$yum install -y kubelet-1.22.2-0 kubeadm-1.22.2-0 kubectl-1.22.2-0 --disableexcludes=kubernetes
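
If this step were moved into the playbook as well, a sketch using Ansible's yum module might look like this (disable_excludes is the module's equivalent of the --disableexcludes flag):

- yum:
    name:
      - kubelet-1.22.2-0
      - kubeadm-1.22.2-0
      - kubectl-1.22.2-0
    state: present
    disable_excludes: kubernetes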

Docker's cgroup driver needs to be changed to systemd, otherwise kubelet reports the error below. This change could also be written into the playbook.
err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\""

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible all -m shell -a "sed -i '3i ,\"exec-opts\": [\"native.cgroupdriver=systemd\"]' /etc/docker/daemon.json" -i node-host

192.168.26.155 | CHANGED | rc=0 >>
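
Since the sed inserts before line 3 of daemon.json, the resulting file should look like this; the inserted entry starts with a comma so that the file stays valid JSON:

{
"registry-mirrors": ["https://2tefyfv7.mirror.aliyuncs.com"]
,"exec-opts": ["native.cgroupdriver=systemd"]
}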

Restart docker (a daemon-reload followed by start would also work):

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible all -m shell -a 'systemctl restart docker' -i node-host
192.168.26.155 | CHANGED | rc=0 >>

Before the node can join, create a token on the master:

┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubeadm token create --print-join-command
kubeadm join 192.168.26.81:6443 --token vmya1o.xprnhn8ub6wzzb2e --discovery-token-ca-cert-hash sha256:2e17952177d9c633254e6941849885fc8e0e16dde805425effa22ed04415e7d4
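
Note that a token created this way expires after 24 hours by default; kubeadm token list on the master shows which tokens are still valid.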

Copy the printed command and run it on the node to be added. Through kubeadm, this generates the configuration files that kubelet needs and registers the node with the master; if you skip this and try to start kubelet directly, it will not start.

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible all -m shell -a 'kubeadm join 192.168.26.81:6443 --token vmya1o.xprnhn8ub6wzzb2e --discovery-token-ca-cert-hash sha256:2e17952177d9c633254e6941849885fc8e0e16dde805425effa22ed04415e7d4' -i node-host

192.168.26.155 | CHANGED | rc=0 >>
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

[WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'

Following the hints, deal with the two warnings:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible all -m shell -a 'systemctl enable docker.service --now;systemctl enable kubelet.service --now' -i node-host
192.168.26.155 | CHANGED | rc=0 >>
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /usr/lib/systemd/system/kubelet.service.
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$

Check the node status from the master; vms155.liruilongs.github.io is Ready:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
vms155.liruilongs.github.io Ready <none> 7m19s v1.22.2 192.168.26.155 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://20.10.21
vms156.liruilongs.github.io Ready <none> 66m v1.22.2 192.168.26.156 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://20.10.21
vms81.liruilongs.github.io Ready control-plane,master 324d v1.22.2 192.168.26.81 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 docker://20.10.9
vms82.liruilongs.github.io Ready <none> 324d v1.22.2 192.168.26.82 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://20.10.9
vms83.liruilongs.github.io Ready <none> 324d v1.22.2 192.168.26.83 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://20.10.9
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$

You can compare with an existing node to check whether the DaemonSet pods (calico-node, kube-proxy, metallb speaker) are consistent:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$kubectl get pods -A -owide | grep '192.168.26.83'
kube-system calico-node-fv458 1/1 Running 91 (2d23h ago) 324d 192.168.26.83 vms83.liruilongs.github.io <none> <none>
kube-system kube-proxy-xccmp 1/1 Running 23 (2d23h ago) 324d 192.168.26.83 vms83.liruilongs.github.io <none> <none>
metallb-system speaker-bbl94 1/1 Running 66 (2d23h ago) 315d 192.168.26.83 vms83.liruilongs.github.io <none> <none>
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$kubectl get pods -A -owide | grep '192.168.26.155'
kube-system calico-node-vxpxt 1/1 Running 0 117m 192.168.26.155 vms155.liruilongs.github.io <none> <none>
kube-system kube-proxy-htg7t 1/1 Running 0 117m 192.168.26.155 vms155.liruilongs.github.io <none> <none>
metallb-system speaker-6mwfj 0/1 CrashLoopBackOff 27 (3m10s ago) 117m 192.168.26.155 vms155.liruilongs.github.io <none> <none>
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$
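
One thing worth flagging in the output above: the metallb speaker pod on the new node is in CrashLoopBackOff, so although the same set of DaemonSet pods is present, that pod still needs troubleshooting before the node is fully on par with the others.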

Cordoning and Recovery

Sometimes a node needs to be taken offline, for example for hardware maintenance; in that case the node has to be isolated first.

Isolation in k8s is done by draining the node: once a node is drained, no new pods are scheduled to it, and the pods already running on it are evicted to other nodes. DaemonSet pods are the exception; evicting them would be pointless, because the DaemonSet controller would simply recreate them on the same node.

drain therefore combines two actions: cordon, which marks the node unschedulable, and evict, which evicts all pods currently on the node.
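
If you only want to stop new scheduling without evicting anything, cordon can also be run on its own:

kubectl cordon vms155.liruilongs.github.io    # mark unschedulable; running pods are left in place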

The --ignore-daemonsets flag here tells drain to skip DaemonSet-managed pods:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$kubectl drain vms155.liruilongs.github.io --ignore-daemonsets
node/vms155.liruilongs.github.io cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-vxpxt, kube-system/kube-proxy-htg7t, metallb-system/speaker-6mwfj
node/vms155.liruilongs.github.io drained

Check the node status: it shows SchedulingDisabled, which means it has been drained.

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
vms155.liruilongs.github.io Ready,SchedulingDisabled <none> 19m v1.22.2 192.168.26.155 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://20.10.21
vms156.liruilongs.github.io Ready <none> 78m v1.22.2 192.168.26.156 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://20.10.21
vms81.liruilongs.github.io Ready control-plane,master 324d v1.22.2 192.168.26.81 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 docker://20.10.9
vms82.liruilongs.github.io Ready <none> 324d v1.22.2 192.168.26.82 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://20.10.9
vms83.liruilongs.github.io Ready <none> 324d v1.22.2 192.168.26.83 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://20.10.9

Once the maintenance work is done, the node can be restored with uncordon.

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$kubectl uncordon vms155.liruilongs.github.io
node/vms155.liruilongs.github.io uncordoned
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
vms155.liruilongs.github.io Ready <none> 20m v1.22.2 192.168.26.155 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://20.10.21
vms156.liruilongs.github.io Ready <none> 79m v1.22.2 192.168.26.156 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://20.10.21
vms81.liruilongs.github.io Ready control-plane,master 324d v1.22.2 192.168.26.81 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 docker://20.10.9
vms82.liruilongs.github.io Ready <none> 324d v1.22.2 192.168.26.82 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://20.10.9
vms83.liruilongs.github.io Ready <none> 324d v1.22.2 192.168.26.83 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://20.10.9
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$
Published on 2022-11-01 · Updated on 2023-06-21
