Kubernetes Cluster Monitoring: Installing kube-prometheus-stack (prometheus-operator) with Helm

Life is not all beautiful, and pursuing your true self means being alone; that is the meaning of life. (Hermann Hesse, Peter Camenzind)

Before we start


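  • This came up while I was learning k8s monitoring.
  • Most tutorials online are incomplete or somewhat outdated, so I put this together to share.
  • This post is a tutorial on deploying the kube-prometheus-stack monitoring platform to a k8s cluster with helm.
  • It took me a whole evening to get it working. At first I kept trying to install the prometheus-operator chart and chased one error after another. Eventually I realized my cluster (version 1.22) was too new: the old chart creates its CRDs with apiextensions.k8s.io/v1beta1, which Kubernetes 1.22 no longer serves. The chart was also renamed to kube-prometheus-stack in later releases, and the old versions may not be compatible.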



Environment

My k8s cluster version

┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create]
└─$kubectl get nodes
NAME STATUS ROLES AGE VERSION
vms81.liruilongs.github.io Ready control-plane,master 34d v1.22.2
vms82.liruilongs.github.io Ready <none> 34d v1.22.2
vms83.liruilongs.github.io Ready <none> 34d v1.22.2

Helm version

┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create]
└─$helm version
version.BuildInfo{Version:"v3.2.1", GitCommit:"fe51cd1e31e6a202cba7dead9552a6d418ded79a", GitTreeState:"clean", GoVersion:"go1.13.10"}

Problems installing prometheus-operator (the old chart name)

┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create]
└─$helm search repo prometheus-operator
NAME CHART VERSION APP VERSION DESCRIPTION
ali/prometheus-operator 8.7.0 0.35.0 Provides easy monitoring definitions for Kubern...
azure/prometheus-operator 9.3.2 0.38.1 DEPRECATED Provides easy monitoring definitions...
┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create]
└─$helm install liruilong ali/prometheus-operator
Error: failed to install CRD crds/crd-alertmanager.yaml: unable to recognize "": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1"
┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create]
└─$helm pull ali/prometheus-operator
┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create]
└─$
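The error shows that the old chart installs its CRDs through apiextensions.k8s.io/v1beta1. Kubernetes 1.22 removed that API version and serves only apiextensions.k8s.io/v1, so the chart cannot install on this cluster. Below is a minimal sketch for checking this before installing. It assumes you pipe in the output of kubectl api-versions, and serves_crd_v1beta1 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the API server still serves the CRD API
# version used by the old prometheus-operator chart. Kubernetes 1.22 removed it.
# Expects the output of `kubectl api-versions` (one group/version per line) on stdin.
serves_crd_v1beta1() {
  grep -qx 'apiextensions.k8s.io/v1beta1'
}

# Usage on a live cluster (not run here):
#   if kubectl api-versions | serves_crd_v1beta1; then
#     echo "old prometheus-operator chart can create its CRDs"
#   else
#     echo "apiextensions.k8s.io/v1beta1 is gone; use kube-prometheus-stack"
#   fi
```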

Solution: install the new chart

Download the kube-prometheus-stack (new name) chart package directly and install it from the command line:

https://github.com/prometheus-community/helm-charts/releases/download/kube-prometheus-stack-30.0.1/kube-prometheus-stack-30.0.1.tgz
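If you need a different chart version, you can build the download URL instead of copying it. This is a sketch that assumes other releases follow the same URL pattern as the 30.0.1 link above; chart_url is a hypothetical helper. The commented alternative uses the project's public Helm repository instead of the GitHub release asset.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the GitHub release URL for a kube-prometheus-stack
# chart version, following the pattern of the 30.0.1 link in this post.
chart_url() {
  local version="$1"
  printf 'https://github.com/prometheus-community/helm-charts/releases/download/kube-prometheus-stack-%s/kube-prometheus-stack-%s.tgz\n' \
    "$version" "$version"
}

# Download and unpack (needs network access; not run here):
#   curl -fsSLO "$(chart_url 30.0.1)"
#   tar -zxf kube-prometheus-stack-30.0.1.tgz
#
# Alternative through the chart repository:
#   helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
#   helm pull prometheus-community/kube-prometheus-stack --version 30.0.1
```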

┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create]
└─$ls
index.yaml kube-prometheus-stack-30.0.1.tgz liruilonghelm liruilonghelm-0.1.0.tgz mysql mysql-1.6.4.tgz
┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create]
└─$helm list
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION

Unpack the chart package kube-prometheus-stack-30.0.1.tgz

┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create]
└─$tar -zxf kube-prometheus-stack-30.0.1.tgz

Create a new namespace and make it the default namespace of the current context

┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create]
└─$cd kube-prometheus-stack/
┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack]
└─$kubectl create ns monitoring
namespace/monitoring created
┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack]
└─$kubectl config set-context $(kubectl config current-context) --namespace=monitoring
Context "kubernetes-admin@kubernetes" modified.

Enter the chart directory and install it directly with helm install liruilong .

┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack]
└─$ls
Chart.lock charts Chart.yaml CONTRIBUTING.md crds README.md templates values.yaml
┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack]
└─$helm install liruilong .
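Instead of changing the context's default namespace, you can pass the namespace to helm directly. This sketch relies on Helm's --namespace and --create-namespace flags. The latter has been available since Helm 3.2, and this cluster runs v3.2.1. install_stack is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: install the unpacked chart into an explicit namespace,
# creating the namespace if it does not exist yet.
install_stack() {
  local release="${1:-liruilong}"   # release name used in this post
  local chart_dir="${2:-.}"         # run from inside kube-prometheus-stack/
  local ns="${3:-monitoring}"
  helm install "$release" "$chart_dir" --namespace "$ns" --create-namespace
}

# Usage (not run here):
#   cd kube-prometheus-stack && install_stack liruilong . monitoring
```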

The image for the kube-prometheus-admission-create Pod cannot be pulled

┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create]
└─$kubectl get pods
NAME READY STATUS RESTARTS AGE
liruilong-kube-prometheus-admission-create--1-bn7x2 0/1 ImagePullBackOff 0 33s

The Pod details show that the image is hosted on Google's registry (k8s.gcr.io), which cannot be reached from mainland China

┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create]
└─$kubectl describe pod liruilong-kube-prometheus-admission-create--1-bn7x2
Name: liruilong-kube-prometheus-admission-create--1-bn7x2
Namespace: monitoring
Priority: 0
Node: vms83.liruilongs.github.io/192.168.26.83
Start Time: Sun, 16 Jan 2022 02:43:07 +0800
Labels: app=kube-prometheus-stack-admission-create
app.kubernetes.io/instance=liruilong
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/part-of=kube-prometheus-stack
app.kubernetes.io/version=30.0.1
chart=kube-prometheus-stack-30.0.1
controller-uid=2ce48cd2-a118-4e23-a27f-0228ef6c45e7
heritage=Helm
job-name=liruilong-kube-prometheus-admission-create
release=liruilong
Annotations: cni.projectcalico.org/podIP: 10.244.70.8/32
cni.projectcalico.org/podIPs: 10.244.70.8/32
Status: Pending
IP: 10.244.70.8
IPs:
IP: 10.244.70.8
Controlled By: Job/liruilong-kube-prometheus-admission-create
Containers:
create:
Container ID:
Image: k8s.gcr.io/ingress-nginx/kube-webhook-certgen:v1.0@sha256:f3b6b39a6062328c095337b4cadcefd1612348fdd5190b1dcbcb9b9e90bd8068
Image ID:
Port: <none>
Host Port:
...
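Instead of describing Pods one by one, you can list every Pod that references a k8s.gcr.io image in one pass. This sketch assumes the monitoring namespace used here. gcr_images is a hypothetical filter over kubectl's jsonpath output, which prints multiple container images separated by spaces.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads "pod<TAB>image image ..." lines on stdin and prints
# "pod<TAB>image" for every image hosted on k8s.gcr.io.
gcr_images() {
  awk -F'\t' '{
    n = split($2, imgs, " ")
    for (i = 1; i <= n; i++)
      if (imgs[i] ~ /^k8s\.gcr\.io\//) print $1 "\t" imgs[i]
  }'
}

# Usage on a live cluster (not run here):
#   kubectl get pods -n monitoring \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}' \
#     | gcr_images
```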

I found a similar image on Docker Hub and swapped it in with kubectl edit:

image: k8s.gcr.io/ingress-nginx/kube-webhook-certgen:v1.0  replace with: docker.io/liangjw/kube-webhook-certgen:v1.1.1

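Alternatively, change the image in the chart's values.yaml and reinstall. Remember to comment out the sha line. The chart appends it to the image as an @sha256 digest, and that digest belongs to the original k8s.gcr.io image: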

┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create]
└─$ls
index.yaml kube-prometheus-stack kube-prometheus-stack-30.0.1.tgz liruilonghelm liruilonghelm-0.1.0.tgz mysql mysql-1.6.4.tgz
┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create]
└─$cd kube-prometheus-stack/
┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack]
└─$ls
Chart.lock charts Chart.yaml CONTRIBUTING.md crds README.md templates values.yaml
┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack]
└─$cat values.yaml | grep -A 3 -B 2 kube-webhook-certgen
enabled: true
image:
repository: docker.io/liangjw/kube-webhook-certgen
tag: v1.1.1
#sha: "f3b6b39a6062328c095337b4cadcefd1612348fdd5190b1dcbcb9b9e90bd8068"
pullPolicy: IfNotPresent
┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack]
└─$
┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack]
└─$helm del liruilong;helm install liruilong .
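You can also keep the override in a separate values file passed with -f, which leaves the unpacked chart untouched. This sketch makes two assumptions. First, it uses the kube-prometheus-stack 30.x key path prometheusOperator.admissionWebhooks.patch.image, which matches the grep output above. Second, it assumes the template only appends the digest when sha is non-empty, which is why commenting it out works. write_mirror_values and mirror-values.yaml are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a values override that points the admission-webhook
# certgen job at the Docker Hub mirror. Key path assumed from the 30.x values.yaml.
write_mirror_values() {
  local file="${1:-mirror-values.yaml}"
  cat > "$file" <<'EOF'
prometheusOperator:
  admissionWebhooks:
    patch:
      image:
        repository: docker.io/liangjw/kube-webhook-certgen
        tag: v1.1.1
        sha: ""   # empty, so no @sha256 digest of the original image is appended
EOF
}

# Usage from inside kube-prometheus-stack/ (not run here):
#   write_mirror_values mirror-values.yaml
#   helm upgrade --install liruilong . -n monitoring -f mirror-values.yaml
```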

After that, the remaining pods start being created normally:

┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack]
└─$kubectl get pods
NAME READY STATUS RESTARTS AGE
liruilong-grafana-5955564c75-zpbjq 0/3 ContainerCreating 0 27s
liruilong-kube-prometheus-operator-5cb699b469-fbkw5 0/1 ContainerCreating 0 27s
liruilong-kube-state-metrics-5dcf758c47-bbwt4 0/1 ContainerCreating 0 27s
liruilong-prometheus-node-exporter-rfsc5 0/1 ContainerCreating 0 28s
liruilong-prometheus-node-exporter-vm7s9 0/1 ContainerCreating 0 28s
liruilong-prometheus-node-exporter-z9j8b 0/1 ContainerCreating 0 28s

The image for the kube-state-metrics pod could not be pulled either, presumably for the same reason:

┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack]
└─$kubectl get pods
NAME READY STATUS RESTARTS AGE
alertmanager-liruilong-kube-prometheus-alertmanager-0 2/2 Running 0 3m35s
liruilong-grafana-5955564c75-zpbjq 3/3 Running 0 4m46s
liruilong-kube-prometheus-operator-5cb699b469-fbkw5 1/1 Running 0 4m46s
liruilong-kube-state-metrics-5dcf758c47-bbwt4 0/1 ImagePullBackOff 0 4m46s
liruilong-prometheus-node-exporter-rfsc5 1/1 Running 0 4m47s
liruilong-prometheus-node-exporter-vm7s9 1/1 Running 0 4m47s
liruilong-prometheus-node-exporter-z9j8b 1/1 Running 0 4m47s
prometheus-liruilong-kube-prometheus-prometheus-0 2/2 Running 0 3m34s

Again, the image k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0 cannot be pulled:

┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack]
└─$kubectl describe pod liruilong-kube-state-metrics-5dcf758c47-bbwt4
Name: liruilong-kube-state-metrics-5dcf758c47-bbwt4
Namespace: monitoring
Priority: 0
Node: vms82.liruilongs.github.io/192.168.26.82
Start Time: Sun, 16 Jan 2022 02:59:53 +0800
Labels: app.kubernetes.io/component=metrics
app.kubernetes.io/instance=liruilong
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=kube-state-metrics
app.kubernetes.io/part-of=kube-state-metrics
app.kubernetes.io/version=2.3.0
helm.sh/chart=kube-state-metrics-4.3.0
pod-template-hash=5dcf758c47
release=liruilong
Annotations: cni.projectcalico.org/podIP: 10.244.171.153/32
cni.projectcalico.org/podIPs: 10.244.171.153/32
Status: Pending
IP: 10.244.171.153
IPs:
IP: 10.244.171.153
Controlled By: ReplicaSet/liruilong-kube-state-metrics-5dcf758c47
Containers:
kube-state-metrics:
Container ID:
Image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0
Image ID:
Port: 8080/TCP
...

As before, find an equivalent image on Docker Hub and change it with kubectl edit pod:

k8s.gcr.io/kube-state-metrics/kube-state-metrics  replace with: docker.io/dyrnq/kube-state-metrics:v2.3.0

You can pull the image on the worker nodes first (here with Ansible):

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ ansible node -m shell -a "docker pull dyrnq/kube-state-metrics:v2.3.0"
192.168.26.82 | CHANGED | rc=0 >>
v2.3.0: Pulling from dyrnq/kube-state-metrics
e8614d09b7be: Pulling fs layer
53ccb90bafd7: Pulling fs layer
e8614d09b7be: Verifying Checksum
e8614d09b7be: Download complete
e8614d09b7be: Pull complete
53ccb90bafd7: Verifying Checksum
53ccb90bafd7: Download complete
53ccb90bafd7: Pull complete
Digest: sha256:c9137505edaef138cc23479c73e46e9a3ef7ec6225b64789a03609c973b99030
Status: Downloaded newer image for dyrnq/kube-state-metrics:v2.3.0
docker.io/dyrnq/kube-state-metrics:v2.3.0
192.168.26.83 | CHANGED | rc=0 >>
v2.3.0: Pulling from dyrnq/kube-state-metrics
e8614d09b7be: Pulling fs layer
53ccb90bafd7: Pulling fs layer
e8614d09b7be: Verifying Checksum
e8614d09b7be: Download complete
e8614d09b7be: Pull complete
53ccb90bafd7: Verifying Checksum
53ccb90bafd7: Download complete
53ccb90bafd7: Pull complete
Digest: sha256:c9137505edaef138cc23479c73e46e9a3ef7ec6225b64789a03609c973b99030
Status: Downloaded newer image for dyrnq/kube-state-metrics:v2.3.0
docker.io/dyrnq/kube-state-metrics:v2.3.0
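When several nodes need the same images, it helps to keep the mapping from each blocked image to its mirror in one place. This sketch covers only the two substitutions used in this post, and mirror_for is a hypothetical helper. Both mirrors are third-party Docker Hub repositories, not official images. The certgen mirror is also a different tag (v1.1.1 instead of v1.0).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Docker Hub mirror used in this post for a blocked
# k8s.gcr.io image; prints nothing for images it does not know.
mirror_for() {
  case "$1" in
    # matches the tag with or without the @sha256 digest
    k8s.gcr.io/ingress-nginx/kube-webhook-certgen:*)
      echo "docker.io/liangjw/kube-webhook-certgen:v1.1.1" ;;
    k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0)
      echo "docker.io/dyrnq/kube-state-metrics:v2.3.0" ;;
  esac
}

# Pre-pull on every host in the Ansible "node" group (not run here):
#   for img in k8s.gcr.io/ingress-nginx/kube-webhook-certgen:v1.0 \
#              k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0; do
#     ansible node -m shell -a "docker pull $(mirror_for "$img")"
#   done
```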

After the change, all pods are up and running:

┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack]
└─$kubectl get pods
NAME READY STATUS RESTARTS AGE
alertmanager-liruilong-kube-prometheus-alertmanager-0 2/2 Running 0 61m
liruilong-grafana-5955564c75-zpbjq 3/3 Running 0 62m
liruilong-kube-prometheus-operator-5cb699b469-fbkw5 1/1 Running 0 62m
liruilong-kube-state-metrics-5dcf758c47-bbwt4 1/1 Running 7 (32m ago) 62m
liruilong-prometheus-node-exporter-rfsc5 1/1 Running 0 62m
liruilong-prometheus-node-exporter-vm7s9 1/1 Running 0 62m
liruilong-prometheus-node-exporter-z9j8b 1/1 Running 0 62m
prometheus-liruilong-kube-prometheus-prometheus-0 2/2 Running 0 61m
┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack]
└─$

Next, change the type of the liruilong-grafana Service to NodePort so that Grafana can be reached from the physical host:

┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack/templates]
└─$kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 33m
liruilong-grafana ClusterIP 10.99.220.121 <none> 80/TCP 34m
liruilong-kube-prometheus-alertmanager ClusterIP 10.97.193.228 <none> 9093/TCP 34m
liruilong-kube-prometheus-operator ClusterIP 10.101.106.93 <none> 443/TCP 34m
liruilong-kube-prometheus-prometheus ClusterIP 10.105.176.19 <none> 9090/TCP 34m
liruilong-kube-state-metrics ClusterIP 10.98.94.55 <none> 8080/TCP 34m
liruilong-prometheus-node-exporter ClusterIP 10.110.216.215 <none> 9100/TCP 34m
prometheus-operated ClusterIP None <none> 9090/TCP 33m
┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack]
└─$kubectl edit svc liruilong-grafana
service/liruilong-grafana edited
┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack/templates]
└─$kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 35m
liruilong-grafana NodePort 10.99.220.121 <none> 80:30443/TCP 36m
liruilong-kube-prometheus-alertmanager ClusterIP 10.97.193.228 <none> 9093/TCP 36m
liruilong-kube-prometheus-operator ClusterIP 10.101.106.93 <none> 443/TCP 36m
liruilong-kube-prometheus-prometheus ClusterIP 10.105.176.19 <none> 9090/TCP 36m
liruilong-kube-state-metrics ClusterIP 10.98.94.55 <none> 8080/TCP 36m
liruilong-prometheus-node-exporter ClusterIP 10.110.216.215 <none> 9100/TCP 36m
prometheus-operated ClusterIP None <none> 9090/TCP 35m
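kubectl edit opens an interactive editor. For scripting, you can make the same change with kubectl patch. This sketch assumes the Service name and namespace from above, and expose_grafana is a hypothetical name. As an alternative, you could set grafana.service.type=NodePort at install time and skip the manual edit. This assumes the bundled grafana chart exposes a service.type value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: switch a Service to NodePort without an editor and print
# the node port Kubernetes allocated for its first port.
expose_grafana() {
  local ns="${1:-monitoring}" svc="${2:-liruilong-grafana}"
  kubectl -n "$ns" patch svc "$svc" -p '{"spec":{"type":"NodePort"}}' >/dev/null &&
    kubectl -n "$ns" get svc "$svc" -o jsonpath='{.spec.ports[0].nodePort}'
}

# Usage (not run here):
#   port="$(expose_grafana)"; echo "open http://<node-ip>:$port"
```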
Grafana is now reachable from the physical host at http://<node-ip>:30443.

Decode the Secret to get the username and password

┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack/templates]
└─$kubectl get secrets | grep grafana
liruilong-grafana Opaque 3 38m
liruilong-grafana-test-token-q8z8j kubernetes.io/service-account-token 3 38m
liruilong-grafana-token-j94p8 kubernetes.io/service-account-token 3 38m
┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack/templates]
└─$kubectl get secrets liruilong-grafana -o yaml
apiVersion: v1
data:
admin-password: cHJvbS1vcGVyYXRvcg==
admin-user: YWRtaW4=
ldap-toml: ""
kind: Secret
metadata:
annotations:
meta.helm.sh/release-name: liruilong
meta.helm.sh/release-namespace: monitoring
creationTimestamp: "2022-01-15T18:59:40Z"
labels:
app.kubernetes.io/instance: liruilong
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: grafana
app.kubernetes.io/version: 8.3.3
helm.sh/chart: grafana-6.20.5
name: liruilong-grafana
namespace: monitoring
resourceVersion: "1105663"
uid: c03ff5f3-deb5-458c-8583-787f41034469
type: Opaque
┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack/templates]
└─$kubectl get secrets liruilong-grafana -o jsonpath='{.data.admin-user}}'| base64 -d
adminbase64: invalid input
┌──[root@vms81.liruilongs.github.io]-[~/ansible/k8s-helm-create/kube-prometheus-stack/templates]
└─$kubectl get secrets liruilong-grafana -o jsonpath='{.data.admin-password}}'| base64 -d
prom-operatorbase64: invalid input

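This gives the username/password admin/prom-operator. The "base64: invalid input" warnings come from the extra } at the end of each jsonpath expression. kubectl prints that brace literally after the encoded value, so base64 decodes the value and then rejects the brace.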
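Values under .data in a Secret are only base64-encoded, not encrypted, so you can also decode them offline. In this sketch, decode_b64 is a hypothetical helper. The commented command uses a jsonpath with a single closing brace, which avoids the warning.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode one base64-encoded Secret value.
decode_b64() {
  printf '%s' "$1" | base64 -d
}

# On the cluster (not run here); note the single closing brace in the jsonpath:
#   kubectl get secret liruilong-grafana -o jsonpath='{.data.admin-user}' | base64 -d; echo
#   kubectl get secret liruilong-grafana -o jsonpath='{.data.admin-password}' | base64 -d; echo
```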

Log in and browse the monitoring dashboards.

Dealing with images that cannot be pulled

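When images cannot be pulled, finding a replacement for each one is tedious. Another option is to download the chart, render it into plain YAML with helm template, and replace the images there. This raises a different problem: some CRDs are not installed beforehand, especially on clusters with multiple masters. Retrying a few times may work. Someone reported this on GitHub, but there does not seem to be a good solution. My workaround is to install with helm first and then uninstall it. Uninstalling does not remove the CRDs, so I can then apply the generated YAML files.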
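Here is a sketch of the render-and-replace route. It assumes sed with POSIX basic regular expressions and handles only the two images from this post; rewrite_images and stack.yaml are hypothetical names. It creates the CRDs from the chart's crds/ directory with kubectl create. That is an alternative to the install-then-uninstall trick, not the author's method. helm template leaves crds/ out of its output unless --include-crds is passed. kubectl create is used here on the assumption that kubectl apply's last-applied annotation can exceed the size limit on large CRDs.

```shell
#!/usr/bin/env bash
# Hypothetical filter: rewrite the two blocked k8s.gcr.io images in rendered YAML
# to the Docker Hub mirrors used in this post. The certgen image's optional
# @sha256 digest is dropped because it belongs to the original image.
rewrite_images() {
  sed -e 's#k8s\.gcr\.io/ingress-nginx/kube-webhook-certgen:v1\.0\(@sha256:[0-9a-f]*\)\{0,1\}#docker.io/liangjw/kube-webhook-certgen:v1.1.1#g' \
      -e 's#k8s\.gcr\.io/kube-state-metrics/kube-state-metrics:v2\.3\.0#docker.io/dyrnq/kube-state-metrics:v2.3.0#g'
}

# Usage from inside kube-prometheus-stack/ (not run here):
#   kubectl create -f crds/                  # CRDs are not in helm template output
#   helm template liruilong . -n monitoring | rewrite_images > stack.yaml
#   kubectl apply -n monitoring -f stack.yaml
```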

Published: 2022-01-16
Updated: 2023-06-21
